Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L
2011-06-02
Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. 
StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
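The residue-frequency idea in the abstract above can be illustrated with a minimal sketch. It is not the StralSV implementation: it assumes the local structure alignments have already been reduced to ungapped fragments, each paired with a start index on the reference, and the function names `residue_frequencies` and `conservation` are hypothetical.

```python
from collections import Counter


def residue_frequencies(reference, aligned_fragments):
    """Count residues observed at each reference position.

    aligned_fragments is a list of (start, fragment) pairs; fragment[k] is
    assumed to be structurally aligned to reference[start + k] (no gaps).
    """
    counts = [Counter() for _ in reference]
    for start, fragment in aligned_fragments:
        for offset, residue in enumerate(fragment):
            counts[start + offset][residue] += 1
    return counts


def conservation(reference, counts):
    """Fraction of aligned residues identical to the reference residue.

    Positions with no aligned fragment are reported as None.
    """
    result = []
    for ref_residue, column in zip(reference, counts):
        total = sum(column.values())
        result.append(column[ref_residue] / total if total else None)
    return result


# Toy input (hypothetical data, not taken from the paper).
reference = "GDD"
fragments = [(0, "GDD"), (0, "GDN"), (1, "DD")]
counts = residue_frequencies(reference, fragments)
scores = conservation(reference, counts)
```

Positions with a score near 1 would be candidates for conserved residues, and low-frequency residues in a column would be the "unusual or unexpected" ones the abstract mentions.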
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zemla, A; Lang, D; Kostova, T
2010-11-29
Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory; still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitate the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that share structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein.
StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.
Simulating protein folding initiation sites using an alpha-carbon-only knowledge-based force field
Buck, Patrick M.; Bystroff, Christopher
2015-01-01
Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three-dimensional context in proteins. We have constructed a knowledge-based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). Carbon-alpha force field (CALF) builds sequence specific statistical potentials based on database frequencies for α-carbon virtual bond opening and dihedral angles, pairwise contacts and hydrogen bond donor-acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as α-carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full-length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways. PMID:19137613
Hills, Ronald D.; Kathuria, Sagar V.; Wallace, Louise A.; Day, Iain J.; Brooks, Charles L.; Matthews, C. Robert
2010-01-01
The thermodynamic hypothesis of Anfinsen postulates that structures and stabilities of globular proteins are determined by their amino acid sequences. Chain topology, however, is known to influence the folding reaction, in that motifs with a preponderance of local interactions typically fold more rapidly than those with a larger fraction of non-local interactions. Together, the topology and sequence can modulate the energy landscape and influence the rate at which the protein folds to the native conformation. To explore the relationship of sequence and topology in the folding of βα–repeat proteins, which are dominated by local interactions, a combined experimental and simulation analysis was performed on two members of the flavodoxin-like, α/β/α sandwich fold. Spo0F and the N-terminal receiver domain of NtrC (NT-NtrC) have similar topologies but low sequence identity, enabling a test of the effects of sequence on folding. Experimental results demonstrated that both response-regulator proteins fold via parallel channels through highly structured sub-millisecond intermediates before accessing their cis prolyl peptide bond-containing native conformations. Global analysis of the experimental results preferentially places these intermediates off the productive folding pathway. Sequence-sensitive Gō-model simulations conclude that frustration in the folding in Spo0F, corresponding to the appearance of the off-pathway intermediate, reflects competition for intra-subdomain van der Waals contacts between its N- and C-terminal subdomains. The extent of transient, premature structure appears to correlate with the number of isoleucine, leucine and valine (ILV) side-chains that form a large sequence-local cluster involving the central β-sheet and helices α2, α3 and α4. The failure to detect the off-pathway species in the simulations of NT-NtrC may reflect the reduced number of ILV side-chains in its corresponding hydrophobic cluster. 
The location of the hydrophobic clusters in the structure may also be related to the differing functional properties of these response regulators. Comparison with the results of previous experimental and simulation analyses on the homologous CheY argues that prematurely-folded unproductive intermediates are a common property of the βα-repeat motif. PMID:20226790
How the Sequence of a Gene Specifies Structural Symmetry in Proteins
Shen, Xiaojuan; Huang, Tongcheng; Wang, Guanyu; Li, Guanglin
2015-01-01
Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains of proteins have the potential to start the co-translational folding process and this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease of local codon usage bias in the middle of the codon sequence can be identified as a common feature; and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher order of secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into our understanding of the mechanisms of translation and its evolution, as well as the design of proteins via symmetric modules. PMID:26641668
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ecale Zhou, C L; Zemla, A T; Roe, D
2005-01-29
Specific and sensitive ligand-based protein detection assays that employ antibodies or small molecules such as peptides or aptamers require that the corresponding surface region of the protein be accessible and that there be minimal cross-reactivity with non-target proteins. To reduce the time and cost of laboratory screening efforts for diagnostic reagents, we developed new methods for evaluating and selecting protein surface regions for ligand targeting. We devised combined structure- and sequence-based methods for identifying 3D epitopes and binding pockets on the surface of the A chain of ricin that are conserved with respect to a set of ricin A chains and unique with respect to other proteins. We (1) used structure alignment software to detect structural deviations and extracted from this analysis the residue-residue correspondence, (2) devised a method to compare corresponding residues across sets of ricin structures and structures of closely related proteins, (3) devised a sequence-based approach to determine residue infrequency in local sequence context, and (4) modified a pocket-finding algorithm to identify surface crevices in close proximity to residues determined to be conserved/unique based on our structure- and sequence-based methods. In applying this combined informatics approach to ricin A we identified a conserved/unique pocket in close proximity to (but not overlapping) the active site that is suitable for bi-dentate ligand development. These methods are generally applicable to identification of surface epitopes and binding pockets for development of diagnostic reagents, therapeutics, and vaccines.
Genome-wide characterization of centromeric satellites from multiple mammalian genomes.
Alkan, Can; Cardone, Maria Francesca; Catacchio, Claudia Rita; Antonacci, Francesca; O'Brien, Stephen J; Ryder, Oliver A; Purgato, Stefania; Zoli, Monica; Della Valle, Giuliano; Eichler, Evan E; Ventura, Mario
2011-01-01
Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog, in contrast to elephant and armadillo, which showed high centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.
Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.
Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai
2015-12-01
The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
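To make the transform in the abstract above concrete, here is a small sketch of Ramanujan sums and a finite-length projection of a numeric signal onto them. The abstract does not give the formula; the normalisation used here, x_q = (1 / (φ(q) N)) Σ x(n) c_q(n) for n = 1..N, is one commonly quoted finite form and is an assumption, as are the function names. This is the direct O(N·q) evaluation, not the fast algorithm of the paper, and the step that maps amino acids to numbers (for example via the Resonant Recognition Model) is left out.

```python
from math import cos, gcd, pi


def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1."""
    total = sum(cos(2 * pi * k * n / q)
                for k in range(1, q + 1) if gcd(k, q) == 1)
    return round(total)  # Ramanujan sums are always integers


def totient(q):
    """Euler's phi function, counted directly (fine for small q)."""
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)


def rft(signal, max_q):
    """Project a numeric signal onto c_1 .. c_max_q (assumed normalisation)."""
    n_samples = len(signal)
    coefficients = []
    for q in range(1, max_q + 1):
        # Samples are indexed from n = 1, matching the usual definition.
        acc = sum(x * ramanujan_sum(q, n)
                  for n, x in enumerate(signal, start=1))
        coefficients.append(acc / (totient(q) * n_samples))
    return coefficients


# A signal that is itself the period-3 Ramanujan sum projects only onto q = 3.
signal = [ramanujan_sum(3, n) for n in range(1, 7)]
spectrum = rft(signal, 3)
```

In a protein comparison setting, `signal` would be replaced by a numeric encoding of the amino acid sequence, and spectra of two proteins would then be compared.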
Development of a Novel Technology for Label Free DNA Sequencing
2012-05-21
of the C-H bond stretch vibrations in the planes of the corresponding DNA bases, and in the higher-frequency side, sequence-identifier region is...composed of the N-H bond stretch vibrations in the planes of the corresponding DNA bases. In addition, the sequence-identifier dividing region almost...regions are localized at the corresponding DNA bases and exhibit a definable dependence on the sequence form of the codons under study. Final
Zhou, Carol L Ecale
2015-01-01
In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
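The one-to-many merging described in the abstract above can be sketched as follows. This is not the CombAlign source code; it is a minimal illustration that assumes every pairwise alignment is given as two equal-length gapped strings (reference first), that all alignments share the same ungapped reference, and that `-` is the gap character. The function name `combine_pairwise` is hypothetical.

```python
def combine_pairwise(pairs):
    """Merge pairwise alignments against one reference into a gapped MSA.

    pairs: list of (ref_aligned, other_aligned) strings.
    Returns a list of rows; row 0 is the gapped reference.
    """
    reference = pairs[0][0].replace("-", "")
    n = len(reference)
    parsed = []
    for ref_aln, other_aln in pairs:
        if len(ref_aln) != len(other_aln):
            raise ValueError("aligned strings must have equal length")
        if ref_aln.replace("-", "") != reference:
            raise ValueError("all alignments must share one reference")
        inserts = [""] * (n + 1)  # residues inserted before ref position i
        matched = []              # character aligned to each ref position
        i = 0
        for r, o in zip(ref_aln, other_aln):
            if r == "-":
                inserts[i] += o
            else:
                matched.append(o)
                i += 1
        parsed.append((inserts, matched))

    # Each insertion slot must be wide enough for the longest insertion.
    width = [max(len(ins[i]) for ins, _ in parsed) for i in range(n + 1)]

    rows = ["".join("-" * width[i] + reference[i] for i in range(n))
            + "-" * width[n]]
    for inserts, matched in parsed:
        body = "".join(inserts[i].ljust(width[i], "-") + matched[i]
                       for i in range(n))
        rows.append(body + inserts[n].ljust(width[n], "-"))
    return rows


# Toy example: the first alignment inserts a residue relative to the reference.
msa = combine_pairwise([("AC-GT", "ACTGT"), ("ACGT", "A-GT")])
```

Opening a gap column in the reference whenever any aligned sequence has an insertion is what allows the reference itself to be gapped, which is the feature the abstract highlights.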
Iwamoto, Susumu; Tokumasu, Seiji; Suyama, Yoshihisa; Kakishima, Makoto
2005-01-01
We investigated intraspecific diversity and genetic structures of a saprotrophic fungus--Thysanophora penicillioides--based on sequences of nuclear ribosomal internal transcribed spacer (ITS) in 15 discontinuous Abies mariesii forests of Japan. In such a well-defined morphological species, numerous unexpected ITS variations were revealed: 12 ITS sequence types detected in 254 isolates collected from 15 local populations were classified into five ITS sequence groups. Maximally, four ITS groups consisted of seven ITS types coexisting in one population. However, group 1 was dominant with approximately 65%; in particular, one haplotype, 1a, was most dominant with approximately 60% in respective populations. Therefore, few differences were recognized in genetic structure among local populations, implying that the gene flow of each lineage of the fungus occurs among local populations without geographic limitations. However, minor haplotypes in some ITS groups were found only in restricted areas, suggesting that they might expand steadily from their places of origin to neighboring A. mariesii forests. Aggregating sequence data of seven European strains and four North American strains from various substrates to those of Japanese strains, 18 ITS sequence types and 28 variable sites were recognized. They were clustered into nine lineages by phylogenetic analyses of the beta-tubulin and combined ITS and beta-tubulin datasets. According to phylogenetic species recognition by the concordance of genealogies, respective lineages correspond to phylogenetic species. Plural phylogenetic species coexist in a local population in an A. mariesii forest in Japan.
R-chie: a web server and R package for visualizing RNA secondary structures
Lai, Daniel; Proctor, Jeff R.; Zhu, Jing Yun A.; Meyer, Irmtraud M.
2012-01-01
Visually examining RNA structures can greatly aid in understanding their potential functional roles and in evaluating the performance of structure prediction algorithms. As many functional roles of RNA structures can already be studied given the secondary structure of the RNA, various methods have been devised for visualizing RNA secondary structures. Most of these methods depict a given RNA secondary structure as a planar graph consisting of base-paired stems interconnected by roundish loops. In this article, we present an alternative method of depicting RNA secondary structure as arc diagrams. This is well suited for structures that are difficult or impossible to represent as planar stem-loop diagrams. Arc diagrams can intuitively display pseudo-knotted structures, as well as transient and alternative structural features. In addition, they facilitate the comparison of known and predicted RNA secondary structures. An added benefit is that structure information can be displayed in conjunction with a corresponding multiple sequence alignment, thereby highlighting structure and primary sequence conservation and variation. We have implemented the visualization algorithm as a web server, R-chie, as well as a corresponding R package called R4RNA, which allows users to run the software locally and across a range of common operating systems. PMID:22434875
Protein structure prediction with local adjust tabu search algorithm
2014-01-01
Background: Protein folding structure prediction is one of the most challenging problems in the bioinformatics domain. Because of the complexity of realistic protein structures, a simplified structure model and a computational method must be adopted. The AB off-lattice model is one such simplified model; it considers only two classes of amino acids, hydrophobic (A) residues and hydrophilic (B) residues. Results: The main work of this paper is to discuss how to find the lowest-energy configurations in the 2D and 3D off-lattice models using Fibonacci sequences and real protein sequences. To avoid becoming trapped in local minima and to converge faster to the global minimum, we introduce a novel method (SATS) that combines the simulated annealing algorithm and the tabu search algorithm. Various strategies, such as a new encoding strategy, an adaptive neighborhood generation strategy and a local adjustment strategy, are adopted for high-speed searching of the optimal conformation, which corresponds to the lowest energy of the protein sequence. Experimental results show that some of the results obtained by the improved SATS are better than those reported in the previous literature, and we can be sure that the lowest-energy folding states for short Fibonacci sequences have been found. Conclusions: Although the off-lattice models are not very realistic, they can reflect some important characteristics of real proteins. The 3D off-lattice model is found to be closer to the native folding structure of real proteins than the 2D off-lattice model. In addition, compared with some previous studies, the proposed hybrid algorithm can search the spatial folding structure of a protein chain more effectively and more quickly. PMID:25474708
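For readers unfamiliar with the model named in the abstract above, the sketch below builds a Fibonacci test sequence and evaluates a 2D AB off-lattice energy. The abstract does not state the energy function; the form used here (a bending term of (1 - cos θ)/4 per interior residue plus a Lennard-Jones-like term with coefficient 1 for AA, 1/2 for BB and -1/2 for AB pairs) is the widely used Stillinger-type definition and is an assumption. The search itself (simulated annealing plus tabu search) is not shown, and the function names are hypothetical.

```python
from math import hypot


def fibonacci_sequence(k):
    """S_0 = 'A', S_1 = 'B', S_{i+1} = S_{i-1} + S_i (string concatenation)."""
    a, b = "A", "B"
    for _ in range(k):
        a, b = b, a + b
    return a


def ab_energy_2d(sequence, coords):
    """Energy of a 2D chain; coords is a list of (x, y), one per residue."""
    n = len(sequence)
    energy = 0.0
    # Bending term over consecutive bond vectors (zero for a straight chain).
    for i in range(1, n - 1):
        ux, uy = coords[i][0] - coords[i - 1][0], coords[i][1] - coords[i - 1][1]
        vx, vy = coords[i + 1][0] - coords[i][0], coords[i + 1][1] - coords[i][1]
        cos_theta = (ux * vx + uy * vy) / (hypot(ux, uy) * hypot(vx, vy))
        energy += 0.25 * (1.0 - cos_theta)
    # Species-dependent non-bonded term over residues at least two apart.
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = hypot(coords[j][0] - coords[i][0], coords[j][1] - coords[i][1])
            if sequence[i] == sequence[j]:
                c = 1.0 if sequence[i] == "A" else 0.5
            else:
                c = -0.5
            energy += 4.0 * (r ** -12 - c * r ** -6)
    return energy


seq13 = fibonacci_sequence(6)
straight = ab_energy_2d("AAA", [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])
```

An optimiser such as the hybrid described in the abstract would vary the bond angles (and hence `coords`) to minimise this energy.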
GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.
Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de
2006-03-31
Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.
You, Min Kyoung; Kim, Jin Hwa; Lee, Yeo Jin; Jeong, Ye Sol; Ha, Sun-Hwa
2016-12-22
Plastoglobules (PGs) are thylakoid membrane microdomains within plastids that are known as specialized locations of carotenogenesis. Three rice phytoene synthase proteins (OsPSYs) involved in carotenoid biosynthesis have been identified. Here, the N-terminal 80-amino-acid portion of OsPSY2 (PTp) was demonstrated to be a chloroplast-targeting peptide by displaying cytosolic localization of OsPSY2(ΔPTp):mCherry in rice protoplast, in contrast to chloroplast localization of OsPSY2:mCherry in a punctate pattern. The peptide sequence of a PTp was predicted to harbor two transmembrane domains eligible for a putative PG-targeting signal. To assess and enhance the PG-targeting ability of PTp, the original PTp DNA sequence (PTp) was modified to a synthetic DNA sequence (stPTp), which had 84.4% similarity to the original sequence. The motivation of this modification was to reduce the GC ratio from 75% to 65% and to disentangle the hairpin loop structures of PTp. These two DNA sequences were fused to the sequence of the synthetic green fluorescent protein (sGFP) and drove GFP expression with different efficiencies. In particular, the RNA and protein levels of stPTp-sGFP were slightly improved to 1.4-fold and 1.3-fold more than those of sGFP, respectively. The green fluorescent signals of their mature proteins were all observed as speckle-like patterns with slightly blurred stromal signals in chloroplasts. These discrete green speckles of PTp-sGFP and stPTp-sGFP corresponded exactly to the red fluorescent signal displayed by OsPSY2:mCherry in both etiolated and greening protoplasts and it is presumed to correspond to distinct PGs. In conclusion, we identified PTp as a transit peptide sequence facilitating preferential translocation of foreign proteins to PGs, and developed an improved PTp sequence, stPTp, which is expected to be very useful for applications in plant biotechnologies requiring precise micro-compartmental localization in plastids.
Rotondi, Kenneth S; Gierasch, Lila M
2003-07-08
The experiments described here explore the role of local sequence in the folding of cellular retinoic acid binding protein I (CRABP I). This is a 136-residue, 10-stranded, antiparallel beta-barrel protein with seven beta-hairpins and is a member of the intracellular lipid binding protein (iLBP) family. The relative roles of local and global sequence information in governing the folding of this class of proteins are not well-understood. In question is whether the beta-turns are locally defined by short-range interactions within their sequences, and are thus able to play an active role in reducing the conformational space available to the folding chain, or whether the turns are passive, relying upon global forces to form. Short (six- and seven-residue) peptides corresponding to the seven CRABP I turns were analyzed by circular dichroism and NMR for their tendencies to take up the conformations they adopt in the context of the native protein. The results indicate that two of the peptides, encompassing turns III and IV in CRABP I, have a strong intrinsic bias to form native turns. Intriguingly, these turns are on linked hairpins in CRABP I and represent the best-conserved turns in the iLBP family. These results suggest that local sequence may play an important role in narrowing the conformational ensemble of CRABP I during folding.
NMR studies on the structure and dynamics of lac operator DNA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, S.C.
Nuclear Magnetic Resonance spectroscopy was used to elucidate the relationships between structure, dynamics and function of the gene regulatory sequence corresponding to the lactose operon operator of Escherichia coli. The length of the DNA fragments examined varied from 13 to 36 base pairs, containing all or part of the operator sequence. These DNA fragments were either derived genetically or synthesized chemically. Resonances of the imino protons were assigned by one-dimensional inter-base pair nuclear Overhauser enhancement (NOE) measurements. Imino proton exchange rates were measured by saturation recovery methods. Results from the kinetic measurements show an interesting dynamic heterogeneity with a maximum opening rate centered about a GTG/CAC sequence which correlates with the biological function of the operator DNA. This particular three base pair sequence occurs frequently and often symmetrically in prokaryotic and eukaryotic DNA sites where one anticipates specific protein interaction for gene regulation. The observed sequence dependent imino proton exchange rate may be a reflection of variation of the local structure of regulatory DNA. The results also indicate that the observed imino proton exchange rates are length dependent.
A computational proposal for designing structured RNA pools for in vitro selection of RNAs.
Kim, Namhee; Gan, Hin Hark; Schlick, Tamar
2007-04-01
Although in vitro selection technology is a versatile experimental tool for discovering novel synthetic RNA molecules, finding complex RNA molecules is difficult because most RNAs identified from random sequence pools are simple motifs, consistent with recent computational analysis of such sequence pools. Thus, enriching in vitro selection pools with complex structures could increase the probability of discovering novel RNAs. Here we develop an approach for engineering sequence pools that links RNA sequence space regions with corresponding structural distributions via a "mixing matrix" approach combined with a graph theory analysis. We define five classes of mixing matrices motivated by covariance mutations in RNA; these constructs define nucleotide transition rates and are applied to chosen starting sequences to yield specific nonrandom pools. We examine the coverage of sequence space as a function of the mixing matrix and starting sequence via clustering analysis. We show that, in contrast to random sequences, which are associated only with a local region of sequence space, our designed pools, including a structured pool for GTP aptamers, can target specific motifs. It follows that experimental synthesis of designed pools can benefit from using optimized starting sequences, mixing matrices, and pool fractions associated with each of our constructed pools as a guide. Automation of our approach could provide practical tools for pool design applications for in vitro selection of RNAs and related problems.
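The pool-generation step described in the abstract above (a mixing matrix of nucleotide transition rates applied to a starting sequence) can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the matrix is represented as a dictionary mapping each starting base to four weights over A, C, G, U, each position is mutated independently, and the names `generate_pool`, `IDENTITY` and `UNIFORM` are hypothetical. The graph-theory analysis and the five matrix classes from the paper are not reproduced.

```python
import random

BASES = "ACGU"


def generate_pool(start, mixing_matrix, pool_size, seed=0):
    """Sample pool_size sequences by mutating each base of `start`.

    mixing_matrix[b] holds four non-negative weights giving the chance
    that base b is synthesised as A, C, G or U, in that order.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    pool = []
    for _ in range(pool_size):
        sequence = "".join(
            rng.choices(BASES, weights=mixing_matrix[base])[0]
            for base in start
        )
        pool.append(sequence)
    return pool


# Two extreme matrices: no mutation at all, and a fully random pool.
IDENTITY = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
            "G": [0, 0, 1, 0], "U": [0, 0, 0, 1]}
UNIFORM = {base: [1, 1, 1, 1] for base in BASES}

unchanged = generate_pool("GGACU", IDENTITY, pool_size=3)
randomised = generate_pool("GGACU", UNIFORM, pool_size=5)
```

Matrices between these two extremes keep the pool concentrated around the starting sequence while still exploring nearby sequence space, which is the design lever the abstract describes.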
Michael, A.J.
1988-01-01
A three-dimensional velocity model for the area surrounding the 24 April 1984 Morgan Hill earthquake has been developed by simultaneously inverting local earthquake and refraction arrival-time data. This velocity model corresponds well to the surface geology of the region, predominantly showing a low-velocity region associated with the sedimentary sequence to the south-west of the Madrone Springs fault. The focal mechanisms were also determined for 946 earthquakes using both the one-dimensional and three-dimensional earth models. Both earth models yield similar focal mechanisms for these earthquakes.
NASA Astrophysics Data System (ADS)
Forte, Paulo M. F.; Felgueiras, P. E. R.; Ferreira, Flávio P.; Sousa, M. A.; Nunes-Pereira, Eduardo J.; Bret, Boris P. J.; Belsley, Michael S.
2017-01-01
An automatic optical inspection system for detecting local defects on specular surfaces is presented. The system uses an image display to produce a sequence of structured diffuse illumination patterns and a digital camera to acquire the corresponding sequence of images. An image enhancement algorithm, which measures the local intensity variations between bright- and dark-field illumination conditions, yields a final image in which the defects are revealed with high contrast. Subsequently, an image segmentation algorithm, which statistically compares the enhanced image of the inspected surface with the corresponding image for a defect-free template, separates defects from non-defects using an adjustable decision threshold. The method can be applied to shiny surfaces of any material including metal, plastic and glass. The described method was tested on the plastic surface of a car dashboard system. We were able to detect not only scratches but also dust and fingerprints. In our experiment we observed a detection contrast increase from about 40%, when using an extended light source, to more than 90% when using a structured light source. The presented method is simple, robust and can be carried out with short cycle times, making it appropriate for applications in industrial environments.
Cianciulli, Antonia; Calvello, Rosa; Panaro, Maria A
2015-04-01
In the homologous genes studied, the exons and introns alternated in the same order in mouse and human. We studied, in both species: corresponding short segments of introns, whole corresponding introns and complete homologous genes. We considered the total number of nucleotides and the number and orientation of the SINE inserts. Comparisons of mouse and human data series showed that at the level of individual relatively short segments of intronic sequences the stochastic variability prevails in the local structuring, but at higher levels of organization a deterministic component emerges, conserved in mouse and human during the divergent evolution, despite the ample re-editing of the intronic sequences and the fact that processes such as SINE spread had taken place in an independent way in the two species. Intron conservation is negatively correlated with the SINE occupancy, suggesting that virus inserts interfere with the conservation of the sequences inherited from the common ancestor. Copyright © 2015 Elsevier Ltd. All rights reserved.
Rigoutsos, Isidore; Riek, Peter; Graham, Robert M; Novotny, Jiri
2003-08-01
One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular alpha-helical character (i.e. pi-helices, 3(10)-helices and kinks). A 'search engine' derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above 'non-canonical' helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from alpha-helicity are encoded locally in sequence patterns only about 7-9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure-function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html.
Poltev, Valeri; Anisimov, Victor M; Danilov, Victor I; Garcia, Dolores; Sanchez, Carolina; Deriabina, Alexandra; Gonzalez, Eduardo; Rivas, Francisco; Polteva, Nina
2014-06-01
Our previous DFT computations of deoxydinucleoside monophosphate complexes with Na(+)-ions (dDMPs) have demonstrated that the main characteristics of Watson-Crick (WC) right-handed duplex families are predefined in the local energy minima of dDMPs. In this work, we study the mechanisms by which the chemically monotonous sugar-phosphate backbone and the bases contribute to the double helix irregularity. Geometry optimization of the sugar-phosphate backbone produces energy minima matching the WC DNA conformations. Studying the conformational variability of dDMPs in response to sequence permutation, we found that simple replacement of bases in the previously fully optimized dDMPs, e.g. by constructing Pyr-Pur from Pur-Pyr, and Pur-Pyr from Pyr-Pur sequences, while retaining the backbone geometry, automatically produces the mutual base position characteristic of the target sequence. Based on that, we infer that the directionality and the preferable regions of the sugar-phosphate torsions, combined with the difference of purines from pyrimidines in ring shape, determine the sequence dependence of the structure of WC DNA. No such sequence dependence exists in dDMPs corresponding to other DNA conformations (e.g., Z-family and Hoogsteen duplexes). Unlike other duplexes, the WC helix is unique in its ability to match the local energy minima of the free single strand to the preferable conformations of the duplex. Copyright © 2013 Wiley Periodicals, Inc.
Alternative polyadenylation of the gene transcripts encoding a rat DNA polymerase beta.
Konopiński, R; Nowak, R; Siedlecki, J A
1996-10-17
Rat cells produce two different transcripts of DNA polymerase beta (beta-Pol). The low-molecular-weight transcript (1.4 kb) was already sequenced. We report here the cloning and sequencing of the full-length cDNA, corresponding to the high-molecular-weight (HMW) transcript (4.0 kb) of beta-Pol. Sequence data strongly suggest that both transcripts are produced from a single gene by alternative polyadenylation. The HMW transcript contains the entire 1.4 kb transcript sequence and additional 2.2 kb on the 3' end. The 3' UTR of the HMW transcript contains some regulatory sequences which are not present in the 1.4-kb transcript. The A + U-rich fragment and (GU)21 sequence are believed to influence the stability of the mRNA. The functional significance of the A-rich region locally destabilizing double-stranded secondary structure remains unknown.
Protein functional features are reflected in the patterns of mRNA translation speed.
López, Daniel; Pazos, Florencio
2015-07-09
The degeneracy of the genetic code makes it possible for the same amino acid string to be coded by different messenger RNA (mRNA) sequences. These "synonymous mRNAs" may differ largely in a number of aspects related to their overall translational efficiency, such as secondary structure content and availability of the encoded transfer RNAs (tRNAs). Consequently, they may render different yields of the translated polypeptides. These mRNA features related to translation efficiency also play a role locally, resulting in a non-uniform translation speed along the mRNA, which has been previously related to some protein structural features and also used to explain some dramatic effects of "silent" single-nucleotide-polymorphisms (SNPs). In this work we perform the first large scale analysis of the relationship between three experimental proxies of mRNA local translation efficiency and the local features of the corresponding encoded proteins. We found that a number of protein functional and structural features are reflected in the patterns of ribosome occupancy, secondary structure and tRNA availability along the mRNA. One or more of these proxies of translation speed have distinctive patterns around the mRNA regions coding for certain protein local features. In some cases the three patterns follow a similar trend. We also show specific examples where these patterns of translation speed point to the protein's important structural and functional features. This supports the idea that the genome not only codes the protein functional features as sequences of amino acids, but also as subtle patterns of mRNA properties which, probably through local effects on the translation speed, have some consequence on the final polypeptide. These results open the possibility of predicting a protein's functional regions based on a single genomic sequence, and have implications for heterologous protein expression and fine-tuning protein function.
The identification and functional annotation of RNA structures conserved in vertebrates
Seemann, Stefan E.; Mirza, Aashiq H.; Hansen, Claus; Bang-Berthelsen, Claus H.; Garde, Christian; Christensen-Dalsgaard, Mikkel; Torarinsson, Elfar; Yao, Zizhen; Workman, Christopher T.; Pociot, Flemming; Nielsen, Henrik; Tommerup, Niels; Ruzzo, Walter L.; Gorodkin, Jan
2017-01-01
Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human–mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3′ ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality. PMID:28487280
Structural mechanics of DNA wrapping in the nucleosome.
Battistini, Federica; Hunter, Christopher A; Gardiner, Eleanor J; Packer, Martin J
2010-02-19
Experimental X-ray crystal structures and a database of calculated structural parameters of DNA octamers were used in combination to analyse the mechanics of DNA bending in the nucleosome core complex. The 1kx5 X-ray crystal structure of the nucleosome core complex was used to determine the relationship between local structure at the base-step level and the global superhelical conformation observed for nucleosome-bound DNA. The superhelix is characterised by a large curvature (597 degrees) in one plane and very little curvature (10 degrees) in the orthogonal plane. Analysis of the curvature at the level of 10-step segments shows that there is a uniform curvature of 30 degrees per helical turn throughout most of the structure but that there are two sharper kinks of 50 degrees at +/-2 helical turns from the central dyad base pair. The curvature is due almost entirely to the base-step parameter roll. There are large periodic variations in roll, which are in phase with the helical twist and account for 500 degrees of the total curvature. Although variations in the other base-step parameters perturb the local path of the DNA, they make minimal contributions to the total curvature. This implies that DNA bending in the nucleosome is achieved using the roll-slide-twist degree of freedom previously identified as the major degree of freedom in naked DNA oligomers. The energetics of bending into a nucleosome-bound conformation were therefore analysed using a database of structural parameters that we have previously developed for naked DNA oligomers. The minimum energy roll, the roll flexibility force constant and the maximum and minimum accessible roll values were obtained for each base step in the relevant octanucleotide context to account for the effects of conformational coupling that vary with sequence context. 
The distribution of base-step roll values and corresponding strain energy required to bend DNA into the nucleosome-bound conformation defined by the 1kx5 structure were obtained by applying a constant bending moment. When a single bending moment was applied to the entire sequence, the local details of the calculated structure did not match the experiment. However, when local 10-step bending moments were applied separately, the calculated structure showed excellent agreement with experiment. This implies that the protein applies variable bending forces along the DNA to maintain the superhelical path required for nucleosome wrapping. In particular, the 50 degrees kinks are constraints imposed by the protein rather than a feature of the 1kx5 DNA sequence. The kinks coincide with a relatively flexible region of the sequence, and this is probably a prerequisite for high-affinity nucleosome binding, but the bending strain energy is significantly higher at these points than for the rest of the sequence. In the most rigid regions of the sequence, a higher strain energy is also required to achieve the standard 30 degrees curvature per helical turn. We conclude that matching of the DNA sequence to the local roll periodicity required to achieve bending, together with the increased flexibility required at the kinks, determines the sequence selectivity of DNA wrapping in the nucleosome. 2009 Elsevier Ltd. All rights reserved.
Rigoutsos, Isidore; Riek, Peter; Graham, Robert M.; Novotny, Jiri
2003-01-01
One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular α-helical character (i.e. π-helices, 3(10)-helices and kinks). A ‘search engine’ derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above ‘non-canonical’ helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from α-helicity are encoded locally in sequence patterns only about 7–9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure–function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html. PMID:12888523
Predicting residue-wise contact orders in proteins by support vector regression.
Song, Jiangning; Burrage, Kevin
2006-10-03
The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. 
Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
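To make the RWCO quantity concrete, the sketch below computes it from residue coordinates under one commonly used formulation: for each residue, sum the sequence separations to all residues in spatial contact and divide by the chain length. The 8 Å cutoff, the minimum separation of 3, the normalisation, and the function name are assumptions for illustration and may differ from the definition used in the study.

```python
import math

def residue_wise_contact_order(coords, cutoff=8.0, min_separation=3):
    """Return one RWCO value per residue.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    cutoff: contact distance in angstroms (assumed value).
    min_separation: ignore pairs closer than this along the sequence
                    (assumed value).
    """
    n = len(coords)
    rwco = []
    for i in range(n):
        total = 0
        for j in range(n):
            separation = abs(i - j)
            if separation < min_separation:
                continue
            if math.dist(coords[i], coords[j]) <= cutoff:
                total += separation  # long-range contacts weigh more
        rwco.append(total / n)
    return rwco

if __name__ == "__main__":
    # Four residues on the corners of a square: the two chain ends touch.
    square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
    print(residue_wise_contact_order(square))
```

In this toy chain only the first and last residues are in contact across a separation of 3, so they receive a nonzero value while the two middle residues score zero.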
Widespread signatures of local mRNA folding structure selection in four Dengue virus serotypes
2015-01-01
Background: It is known that mRNA folding can affect and regulate various gene expression steps both in living organisms and in viruses. Previous studies have recognized functional RNA structures in the genome of the Dengue virus. However, these studies usually focused either on the viral untranslated regions or on very specific and limited regions at the beginning of the coding sequences, in a limited number of strains, and without considering evolutionary selection. Results: Here we performed the first large scale comprehensive genomics analysis of selection for local mRNA folding strength in the Dengue virus coding sequences, based on a total of 1,670 genomes and 4 serotypes. Our analysis identified clusters of positions along the coding regions that may undergo a conserved evolutionary selection for strong or weak local folding maintained across different viral variants. Specifically, 53-66 clusters for strong folding and 49-73 clusters for weak folding (depending on serotype), composed of positions with a significant conservation of folding energy signals (related to partially overlapping local genomic regions), were recognized. In addition, up to 7% of these positions were found to be conserved in more than 90% of the viral genomes. Although some of the identified positions undergo frequent synonymous / non-synonymous substitutions, the selection for folding strength therein is preserved, and thus cannot be trivially explained based on sequence conservation alone. Conclusions: The fact that many of the positions with significant folding related signals are conserved among different Dengue variants suggests that a better understanding of the mRNA structures in the corresponding regions may promote the development of prospective anti-Dengue vaccination strategies. The comparative genomics approach described here can be employed in the future for detecting functional regions in other pathogens with very high mutation rates. PMID:26449467
Danielsson, Frida; Wiking, Mikaela; Mahdessian, Diana; Skogs, Marie; Ait Blal, Hammou; Hjelmare, Martin; Stadler, Charlotte; Uhlén, Mathias; Lundberg, Emma
2013-01-04
One of the major challenges of a chromosome-centric proteome project is to explore in a systematic manner the potential proteins identified from the chromosomal genome sequence, but not yet characterized on a protein level. Here, we describe the use of RNA deep sequencing to screen human cell lines for RNA profiles and to use this information to select cell lines suitable for characterization of the corresponding gene product. In this manner, the subcellular localization of proteins can be analyzed systematically using antibody-based confocal microscopy. We demonstrate the usefulness of selecting cell lines with high expression levels of RNA transcripts to increase the likelihood of high quality immunofluorescence staining and subsequent successful subcellular localization of the corresponding protein. The results show a path to combine transcriptomics with affinity proteomics to characterize the proteins in a gene- or chromosome-centric manner.
The identification and functional annotation of RNA structures conserved in vertebrates.
Seemann, Stefan E; Mirza, Aashiq H; Hansen, Claus; Bang-Berthelsen, Claus H; Garde, Christian; Christensen-Dalsgaard, Mikkel; Torarinsson, Elfar; Yao, Zizhen; Workman, Christopher T; Pociot, Flemming; Nielsen, Henrik; Tommerup, Niels; Ruzzo, Walter L; Gorodkin, Jan
2017-08-01
Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human-mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3' ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality. © 2017 Seemann et al.; Published by Cold Spring Harbor Laboratory Press.
Turner, Trudy R.; Coetzer, Willem G.; Schmitt, Christopher A.; Lorenz, Joseph G.; Freimer, Nelson B.; Grobler, J. Paul
2015-01-01
Objectives: Vervet monkeys are common in most tree-rich areas of South Africa, but their absence from grassland and semi-desert areas of the country suggests potentially restricted and mosaic local population patterns that may have relevance to local phenotype patterns and selection. A portion of the mtDNA control region was sequenced to study patterns of genetic differentiation. Materials and Methods: DNA was extracted and mtDNA sequences were obtained from 101 vervet monkeys at 15 localities which represent both an extensive (widely across the distribution range) and intensive (more than one troop at most of the localities) sampling strategy. Analyses utilized Arlequin 3.1, MEGA 6, BEAST v1.5.2 and Network V3.6.1. Results: The dataset contained 26 distinct haplotypes, with six populations fixed for single haplotypes. Pairwise P-distance among population pairs showed significant differentiation among most population pairs, but with non-significant differences among populations within some regions. Populations were grouped into three broad clusters in a maximum likelihood phylogenetic tree and a haplotype network. These clusters correspond to (i) north-western, northern and north-eastern parts of the distribution range as well as the northern coastal belt; (ii) central areas of the country; and (iii) southern part of the Indian Ocean coastal belt, and adjacent inland areas. Discussion: Apparent patterns of genetic structure correspond to current and past distribution of suitable habitat, geographic barriers to gene flow, geographic distance and female philopatry. However, further work on nuclear markers and other genomic data is necessary to confirm these results. PMID:26265297
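For readers unfamiliar with the p-distance statistic mentioned in the results, a minimal sketch follows: it is the proportion of compared sites at which two aligned sequences differ. The function name and the choice to skip gapped sites are assumptions for illustration; the packages cited in the abstract offer several gap-handling options.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (an assumed
    pairwise-deletion convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared

if __name__ == "__main__":
    print(p_distance("ACGTACGT", "ACGTACGA"))  # one difference in eight sites
```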
Xu, Tingting; Zhou, Cong-Zhao; Xiao, Jianxi; Liu, Jinsong
2018-02-20
Naturally occurring interruptions in nonfibrillar collagen play key roles in molecular flexibility, collagen degradation, and ligand binding. The structural feature of the interruption sequences and the molecular basis for their functions have not been well studied. Here, we focused on a G5G type natural interruption sequence G-POALO-G from human type XIX collagen, a homotrimer collagen, as this sequence possesses distinct properties compared with those of a pathological similar Gly mutation sequence in collagen mimic peptides. We determined the crystal structures of the host-guest peptide (GPO)3-GPOALO-(GPO)4 to 1.03 Å resolution in two crystal forms. In these structures, the interruption zone brings localized disruptions to the triple helix and introduces a light 6-8° bend with the same directional preference to the whole molecule, which may correspond structurally to the first physiological kink site in type XIX collagen. Furthermore, at the G5G interruption site, the presence of Ala and Leu residues, both with free N-H groups, allows the formation of more direct and water-mediated interchain hydrogen bonds than in the related Gly → Ala structure. These could partly explain the difference in thermal stability between the different interruptions. In addition, our structures provide a detailed view of the dynamic property of such an interrupted zone with respect to hydrogen bonding topology, torsion angles, and helical parameters. Our results, for the first time, also identified the binding of zinc to the end of the triple helix. These findings will shed light on how the interruption sequence influences the conformation of the collagen molecule and provide a structural basis for further functional studies.
A space-efficient algorithm for local similarities.
Huang, X Q; Hardison, R C; Miller, W
1990-10-01
Existing dynamic-programming algorithms for identifying similar regions of two sequences require time and space proportional to the product of the sequence lengths. Often this space requirement is more limiting than the time requirement. We describe a dynamic-programming local-similarity algorithm that needs only space proportional to the sum of the sequence lengths. The method can also find repeats within a single long sequence. To illustrate the algorithm's potential, we discuss comparison of a 73,360 nucleotide sequence containing the human beta-like globin gene cluster and a corresponding 44,594 nucleotide sequence for rabbit, a problem well beyond the capabilities of other dynamic-programming software.
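The space saving described above can be illustrated, in a much simplified form, by a local-similarity score computation that keeps only two rows of the dynamic-programming table. This is a standard Smith-Waterman-style recurrence with a linear gap penalty, not the authors' algorithm: it returns only the best score, and recovering the aligned regions in linear space needs an additional divide-and-conquer step that is not shown. The scoring values and the function name are assumptions.

```python
def local_similarity_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score of a and b using O(len(b)) memory.

    Only the previous and current rows of the table are stored, so the
    memory use no longer grows with the product of the two lengths.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best = curr[j]
        prev = curr
    return best

if __name__ == "__main__":
    print(local_similarity_score("TTACGTTT", "GGACGGG"))
```

Running time is still proportional to the product of the sequence lengths; only the memory footprint changes.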
MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.
Fang, Chao; Shang, Yi; Xu, Dong
2018-05-01
Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acid, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate prediction. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFold-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.
Kaushik, Mahima; Kukreti, Shrikant
2015-01-01
Our previous work on structural polymorphism shown at a single nucleotide polymorphism (SNP) (A → G) site located on HS4 region of locus control region (LCR) of β-globin gene has established a hairpin → duplex equilibrium corresponding to an A → B like DNA transition (Kaushik M, Kukreti, R., Grover, D., Brahmachari, S.K. and Kukreti S. Nucleic Acids Res. 2003; Kaushik M, Kukreti S. Nucleic Acids Res. 2006). The G-allele of A → G SNP has been shown to be significantly associated with the occurrence of β-thalassemia. Considering the significance of this 11-nt long quasi-palindromic sequence [5'-TGGGG(G/A)CCCCA; HP(G/A)11] of β-globin gene LCR, we further explored the differential behavior of the same DNA sequence with its RNA counterpart, using various biophysical and biochemical techniques. In contrast to its DNA counterpart exhibiting an A → B structural transition and an equilibrium between duplex and hairpin forms, the studied RNA oligonucleotide sequence [5'-UGGGG(G/A)CCCCA; RHP(G/A)11] existed only in duplex form (A-conformation) and did not form a hairpin. The single residue difference from A to G led to the unusual thermal stability of the RNA structure formed by the studied sequence. Since naturally occurring mutations and various SNP sites may stabilize or destabilize the local DNA/RNA secondary structures, these structural transitions may affect gene expression by a change in the protein-DNA recognition patterns.
Nodal domains of a non-separable problem—the right-angled isosceles triangle
NASA Astrophysics Data System (ADS)
Aronovitch, Amit; Band, Ram; Fajman, David; Gnutzmann, Sven
2012-03-01
We study the nodal set of eigenfunctions of the Laplace operator on the right-angled isosceles triangle. A local analysis of the nodal pattern provides an algorithm for computing the number ν_n of nodal domains for any eigenfunction. In addition, an exact recursive formula for the number of nodal domains is found to reproduce all existing data. Eventually, we use the recursion formula to analyse a large sequence of nodal counts statistically. Our analysis shows that the distribution of nodal counts for this triangular shape has a much richer structure than the known cases of regular separable shapes or completely irregular shapes. Furthermore, we demonstrate that the nodal count sequence contains information about the periodic orbits of the corresponding classical ray dynamics.
NASA Astrophysics Data System (ADS)
Houdayer, Cyril; Isono, Yusuke
2016-12-01
We investigate the asymptotic structure of (possibly type III) crossed product von Neumann algebras $M = B \rtimes \Gamma$ arising from arbitrary actions $\Gamma \curvearrowright B$ of bi-exact discrete groups (e.g. free groups) on amenable von Neumann algebras. We prove a spectral gap rigidity result for the central sequence algebra $N' \cap M^\omega$ of any nonamenable von Neumann subalgebra with normal expectation $N \subset M$. We use this result to show that for any strongly ergodic essentially free nonsingular action $\Gamma \curvearrowright (X, \mu)$ of any bi-exact countable discrete group on a standard probability space, the corresponding group measure space factor $L^\infty(X) \rtimes \Gamma$ has no nontrivial central sequence. Using recent results of Boutonnet et al. (Local spectral gap in simple Lie groups and applications, 2015), we construct, for every $0 < \lambda \le 1$, a type $\mathrm{III}_\lambda$ strongly ergodic essentially free nonsingular action $F_\infty \curvearrowright (X_\lambda, \mu_\lambda)$ of the free group $F_\infty$ on a standard probability space so that the corresponding group measure space type $\mathrm{III}_\lambda$ factor $L^\infty(X_\lambda, \mu_\lambda) \rtimes F_\infty$ has no nontrivial central sequence by our main result. In particular, we obtain the first examples of group measure space type $\mathrm{III}$ factors with no nontrivial central sequence.
Cirulli, Elizabeth T; Noor, Mohamed A F
2007-01-01
Ectopic exchange between transposable elements or other repetitive sequences along a chromosome can produce chromosomal inversions. As a result, genome sequence studies typically find sequence similarity between corresponding inversion breakpoint regions. Here, we identify and investigate the breakpoint regions of the X chromosome inversion distinguishing Drosophila mojavensis and Drosophila arizonae. We localize one inversion breakpoint to 13.7 kb and localize the other to a 1-Mb interval. Using this localization and assuming microsynteny between Drosophila melanogaster and D. arizonae, we pinpoint likely positions of the inversion breakpoints to windows of less than 3000 bp. These breakpoints define the size of the inversion to approximately 11 Mb. However, in contrast to many other studies, we fail to find significant sequence similarity between the 2 breakpoint regions. The localization of these inversion breakpoints will facilitate future genetic and molecular evolutionary studies in this species group, an emerging model system for ecological genetics.
Gibbs motif sampling: detection of bacterial outer membrane protein repeats.
Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.
1995-01-01
The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
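As a rough illustration of the Gibbs sampling idea mentioned in this abstract, the sketch below implements only a basic single-motif site sampler, not the partitioning algorithm the authors describe. The function name `gibbs_site_sampler`, the DNA alphabet, the pseudocount of 1, and the uniform background are all assumptions made for the example.

```python
import random

def gibbs_site_sampler(seqs, width, iters=200, seed=0, alphabet="ACGT"):
    """Return one sampled motif start position per sequence (hypothetical helper)."""
    rng = random.Random(seed)
    # Start from random motif positions in every sequence.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for h, held in enumerate(seqs):
            # Column-wise residue counts from all other sequences, pseudocount 1.
            counts = [{c: 1.0 for c in alphabet} for _ in range(width)]
            others = zip(seqs[:h] + seqs[h + 1:], pos[:h] + pos[h + 1:])
            for s, p in others:
                for col, ch in enumerate(s[p:p + width]):
                    counts[col][ch] += 1
            total = (len(seqs) - 1) + len(alphabet)
            # Weight each candidate start in the held-out sequence by its
            # probability under the current motif model (uniform background
            # assumed, so the background term is a constant and is dropped).
            weights = []
            for start in range(len(held) - width + 1):
                w = 1.0
                for col, ch in enumerate(held[start:start + width]):
                    w *= counts[col][ch] / total
                weights.append(w)
            # Sample a new position in proportion to the weights.
            pos[h] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos

example = ["ACGTACGTTTGACA", "GGTTGACAACGTAC", "TTGACAGGGGCCCC"]
print(gibbs_site_sampler(example, width=6, iters=50, seed=1))
```

Sampling the new position rather than taking the best-scoring one is what lets the procedure escape poor initial alignments; a fixed seed is used here only to make runs repeatable.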
Structural alphabets derived from attractors in conformational space
2010-01-01
Background The hierarchical and partially redundant nature of protein structures justifies the definition of frequently occurring conformations of short fragments as 'states'. Collections of selected representatives for these states define Structural Alphabets, describing the most typical local conformations within protein structures. These alphabets form a bridge between the string-oriented methods of sequence analysis and the coordinate-oriented methods of protein structure analysis. Results A Structural Alphabet has been derived by clustering all four-residue fragments of a high-resolution subset of the protein data bank and extracting the high-density states as representative conformational states. Each fragment is uniquely defined by a set of three independent angles corresponding to its degrees of freedom, capturing in simple and intuitive terms the properties of the conformational space. The fragments of the Structural Alphabet are equivalent to the conformational attractors and therefore yield a most informative encoding of proteins. Proteins can be reconstructed within the experimental uncertainty in structure determination and ensembles of structures can be encoded with accuracy and robustness. Conclusions The density-based Structural Alphabet provides a novel tool to describe local conformations and it is specifically suitable for application in studies of protein dynamics. PMID:20170534
Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme
2013-07-01
The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and, more importantly, does not allow the user to control explicitly the nucleotide distribution, such as the GC-content, in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable to or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC-content and target structure. IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. Supplementary data are available at Bioinformatics online.
Random digital encryption secure communication system
NASA Technical Reports Server (NTRS)
Doland, G. D. (Inventor)
1982-01-01
The design of a secure communication system is described. A product code, formed from two pseudorandom sequences of digital bits, is used to encipher or scramble data prior to transmission. The two pseudorandom sequences are periodically changed at intervals before they have had time to repeat. One of the two sequences is transmitted continuously with the scrambled data for synchronization. In the receiver portion of the system, the incoming signal is compared with one of two locally generated pseudorandom sequences until correspondence between the sequences is obtained. At this time, the two locally generated sequences are formed into a product code which deciphers the data from the incoming signal. Provision is made to ensure synchronization of the transmitting and receiving portions of the system.
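The scrambling step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the two pseudorandom sequences are produced here by small linear-feedback shift registers, the "product" is taken to be bitwise XOR (the product of two ±1-valued sequences written in 0/1 form), and the register sizes, taps, and seeds are arbitrary choices, none of which come from the source.

```python
from itertools import islice

def lfsr(seed, taps, nbits):
    """Yield bits from a simple shift register (hypothetical stand-in generator)."""
    state = seed
    while True:
        yield state & 1
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (nbits - 1))

def product_code(seed_a, seed_b, length):
    """Combine two pseudorandom bit sequences by XOR (assumed form of the product)."""
    a = lfsr(seed_a, taps=(0, 1), nbits=4)
    b = lfsr(seed_b, taps=(0, 2), nbits=5)
    return [x ^ y for x, y in islice(zip(a, b), length)]

def scramble(bits, key):
    """XOR data with the key; applying it twice restores the data."""
    return [d ^ k for d, k in zip(bits, key)]

data = [1, 0, 1, 1, 0, 0, 1, 0]
key = product_code(0b1001, 0b10101, len(data))
cipher = scramble(data, key)
recovered = scramble(cipher, key)
print(cipher, recovered == data)
```

Because XOR is its own inverse, the receiver only needs to regenerate the same two sequences in step with the transmitter, which is why the abstract stresses synchronization.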
Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji
2006-02-28
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of "chimera proteins." In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape.
Shaping up the protein folding funnel by local interaction: Lesson from a structure prediction study
Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji
2006-01-01
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of “chimera proteins.” In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape. PMID:16488978
Dubiel, Russell F.
1983-01-01
Closely spaced measured stratigraphic sections of the lower part of the Late Triassic Chinle Formation in the White Canyon area of southeastern Utah depict a fluvial-deltaic-lacustrine depositional sequence that hosts uranium deposits in basal fluvial sandstones. The basal Shinarump Member consists of predominantly trough-crossbedded, coarse-grained sandstone and minor gray, carbonaceous mudstone and is interpreted as a valley-fill sequence overlain by deposits of a braided stream system. The overlying Monitor Butte Member is composed of cyclic- and foreset-bedded siltstone, sandstone, and mudstone and is interpreted as a succession of low-energy fluvial, deltaic and organic-rich, lacustrine-marsh sediments. The overlying Moss Back Member is composed of a laterally extensive, coarse- to medium-grained, conglomeratic sandstone and is interpreted as a braided-stream system that flowed north to northwest. The entire sequence was deposited in response to changes in local base level associated with a large lake that lay to the west. Isopachs of lithofacies indicate distinct lacustrine basins and a correspondence between these facies and modern structural synclines. Facies changes and coincidence of isopach thicks suggest that structural synclines were active in the Late Triassic and influenced the pattern of sediment distribution within the basins. Uranium mineralization appears to be related to certain low-energy depositional environments in that uranium is localized in fluvial sandstones that lie beneath organic-rich lacustrine-marsh mudstones and carbonaceous delta-front sediments. The reducing environment preserved in these facies may have played an important role in the localization of uranium.
Papanikolopoulou, Katerina; Schoehn, Guy; Forge, Vincent; Forsyth, V Trevor; Riekel, Christian; Hernandez, Jean-François; Ruigrok, Rob W H; Mitraki, Anna
2005-01-28
Amyloid fibrils are fibrous beta-structures that derive from abnormal folding and assembly of peptides and proteins. Despite a wealth of structural studies on amyloids, the nature of the amyloid structure remains elusive; possible connections to natural, beta-structured fibrous motifs have been suggested. In this work we focus on understanding amyloid structure and formation from sequences of a natural, beta-structured fibrous protein. We show that short peptides (25 to 6 amino acids) corresponding to repetitive sequences from the adenovirus fiber shaft have an intrinsic capacity to form amyloid fibrils as judged by electron microscopy, Congo Red binding, infrared spectroscopy, and x-ray fiber diffraction. In the presence of the globular C-terminal domain of the protein that acts as a trimerization motif, the shaft sequences adopt a triple-stranded, beta-fibrous motif. We discuss the possible structure and arrangement of these sequences within the amyloid fibril, as compared with the one adopted within the native structure. A 6-amino acid peptide, corresponding to the last beta-strand of the shaft, was found to be sufficient to form amyloid fibrils. Structural analysis of these amyloid fibrils suggests that perpendicular stacking of beta-strand repeat units is an underlying common feature of amyloid formation.
Breaking the acoustic diffraction barrier with localization optoacoustic tomography
NASA Astrophysics Data System (ADS)
Deán-Ben, X. Luís.; Razansky, Daniel
2018-02-01
Diffraction causes blurring of high-resolution features in images and has been traditionally associated with the resolution limit in light microscopy and other imaging modalities. The resolution of an imaging system can be generally assessed via its point spread function, corresponding to the image acquired from a point source. However, the precision in determining the position of an isolated source can greatly exceed the diffraction limit. By combining the estimated positions of multiple sources, localization-based imaging has resulted in groundbreaking methods such as super-resolution fluorescence optical microscopy and has also enabled ultrasound imaging of microvascular structures with unprecedented spatial resolution in deep tissues. Herein, we introduce localization optoacoustic tomography (LOT) and discuss the prospects of using localization imaging principles in optoacoustic imaging. LOT was experimentally implemented by real-time imaging of flowing particles in 3D with a recently developed volumetric optoacoustic tomography system. Provided the particles were separated by a distance larger than the diffraction-limited resolution, their individual locations could be accurately determined in each frame of the acquired image sequence and the localization image was formed by superimposing a set of points corresponding to the localized positions of the absorbers. The presented results demonstrate that LOT can significantly enhance the well-established advantages of optoacoustic imaging by breaking the acoustic diffraction barrier in deep tissues and mitigating artifacts due to limited-view tomographic acquisitions.
Local Renyi entropic profiles of DNA sequences.
Vinga, Susana; Almeida, Jonas S
2007-10-16
In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.
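To make the ingredients named in this abstract more concrete, the sketch below maps a DNA sequence to Chaos Game Representation points and computes a quadratic Rényi entropy from a Gaussian Parzen window estimate. It is a simplified illustration of the underlying continuous entropy, not the entropic profiles or the fractal kernel the authors develop; the corner assignment, the kernel width `sigma`, and the function names are assumptions.

```python
import math

# Assumed corner assignment of the unit square for the four bases.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Iterate the chaos game: move halfway toward the corner of each base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

def renyi_quadratic_entropy(points, sigma=0.1):
    """Quadratic Renyi entropy of a 2D Gaussian Parzen density estimate.

    Uses the closed form -ln((1/N^2) * sum_ij G(x_i - x_j; 2*sigma^2)),
    which follows from integrating the squared kernel density estimate.
    """
    n = len(points)
    var = 2 * sigma ** 2
    norm = 1 / (2 * math.pi * var)
    total = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += norm * math.exp(-d2 / (2 * var))
    return -math.log(total / n ** 2)

repetitive = "A" * 40
mixed = "ACGTTGCAGGATCCATAGCGTACGATCGTAGCTAGCTAGG"
print(renyi_quadratic_entropy(cgr_points(repetitive)))
print(renyi_quadratic_entropy(cgr_points(mixed)))
```

A repetitive sequence collapses toward one corner of the map and so yields a lower entropy than a mixed sequence whose points spread over the square; varying `sigma` changes the scale at which such clustering is detected.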
Local Renyi entropic profiles of DNA sequences
Vinga, Susana; Almeida, Jonas S
2007-01-01
Background In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at . Conclusion The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures. PMID:17939871
Yin, Yan-hui; Li, Bi-chun; Wei, Guang-hui; Zhu, Cai-ye; Li, Wei; Zhang, Ya-ni; Du, Li-xin; Cao, Wen-guang
2012-05-01
The aim of this study was to clone the heart-type fatty acid binding protein (H-FABP) gene of Xuhuai goat, to explore it bioinformatically, and to analyze its subcellular localization using enhanced green fluorescent protein (EGFP). The results showed that the coding sequence (CDS) length of Xuhuai goat H-FABP gene was 402 bp, encoding 133 amino acids (GenBank accession number AY466498.1). The H-FABP cDNA coding sequence was compared with the corresponding region of human, chicken, brown rat, cow, wild boar, donkey, and zebrafish. The similarities were 89%, 76%, 85%, 84%, 93%, 91%, 70%, respectively. For the corresponding amino acid sequences, the similarities were 90%, 79%, 88%, 97%, 95%, 94%, 72%, respectively. This study did not find a signal peptide region in the H-FABP protein, which suggests that H-FABP might be a nonsecreted protein. H-FABP expression was detected in vitro by reverse transcription-polymerase chain reaction (RT-PCR), and the EGFP-H-FABP fusion protein was localized to the cytoplasm. The gene could also be transiently and permanently expressed in mice.
Morphometry Based on Effective and Accurate Correspondences of Localized Patterns (MEACOLP)
Wang, Hu; Ren, Yanshuang; Bai, Lijun; Zhang, Wensheng; Tian, Jie
2012-01-01
Local features in volumetric images have been used to identify correspondences of localized anatomical structures for brain morphometry. However, the correspondences are often sparse thus ineffective in reflecting the underlying structures, making it unreliable to evaluate specific morphological differences. This paper presents a morphometry method (MEACOLP) based on correspondences with improved effectiveness and accuracy. A novel two-level scale-invariant feature transform is used to enhance the detection repeatability of local features and to recall the correspondences that might be missed in previous studies. Template patterns whose correspondences could be commonly identified in each group are constructed to serve as the basis for morphometric analysis. A matching algorithm is developed to reduce the identification errors by comparing neighboring local features and rejecting unreliable matches. The two-sample t-test is finally adopted to analyze specific properties of the template patterns. Experiments are performed on the public OASIS database to clinically analyze brain images of Alzheimer's disease (AD) and normal controls (NC). MEACOLP automatically identifies known morphological differences between AD and NC brains, and characterizes the differences well as the scaling and translation of underlying structures. Most of the significant differences are identified in only a single hemisphere, indicating that AD-related structures are characterized by strong anatomical asymmetry. In addition, classification trials to differentiate AD subjects from NC confirm that the morphological differences are reliably related to the groups of interest. PMID:22540000
Song, Jiangning; Wang, Minglei; Burrage, Kevin
2006-07-21
High-quality data about protein structures and their gene sequences are essential to the understanding of the relationship between protein folding and protein coding sequences. Firstly we constructed the EcoPDB database, which is a high-quality database of Escherichia coli genes and their corresponding PDB structures. Based on EcoPDB, we presented a novel approach based on information theory to investigate the correlation between cysteine synonymous codon usages and local amino acids flanking cysteines, the correlation between cysteine synonymous codon usages and synonymous codon usages of local amino acids flanking cysteines, as well as the correlation between cysteine synonymous codon usages and the disulfide bonding states of cysteines in the E. coli genome. The results indicate that the nearest neighboring residues and their synonymous codons of the C-terminus have the greatest influence on the usages of the synonymous codons of cysteines and the usage of the synonymous codons has a specific correlation with the disulfide bond formation of cysteines in proteins. The correlations may result from the regulation mechanism of protein structures at gene sequence level and reflect the biological function restriction that cysteines pair to form disulfide bonds. The results may also be helpful in identifying residues that are important for synonymous codon selection of cysteines to introduce disulfide bridges in protein engineering and molecular biology. The approach presented in this paper can also be utilized as a complementary computational method and be applicable to analyse the synonymous codon usages in other model organisms.
The effects of DNA supercoiling on G-quadruplex formation.
Sekibo, Doreen A T; Fox, Keith R
2017-12-01
Guanine-rich DNAs can fold into four-stranded structures that contain stacks of G-quartets. Bioinformatics studies have revealed that G-rich sequences with the potential to adopt these structures are unevenly distributed throughout genomes, and are especially found in gene promoter regions. With the exception of the single-stranded telomeric DNA, all genomic G-rich sequences will always be present along with their C-rich complements, and quadruplex formation will be in competition with the corresponding Watson-Crick duplex. Quadruplex formation must therefore first require local dissociation (melting) of the duplex strands. Since negative supercoiling is known to facilitate the formation of alternative DNA structures, we have investigated G-quadruplex formation within negatively supercoiled DNA plasmids. Plasmids containing multiple copies of (G3T)n and (G3T4)n repeats, were probed with dimethylsulphate, potassium permanganate and S1 nuclease. While dimethylsulphate footprinting revealed some evidence for G-quadruplex formation in (G3T)n sequences, this was not affected by supercoiling, and permanganate failed to detect exposed thymines in the loop regions. (G3T4)n sequences were not protected from DMS and showed no reaction with permanganate. Similarly, both S1 nuclease and 2D gel electrophoresis of DNA topoisomers did not detect any supercoil-dependent structural transitions. These results suggest that negative supercoiling alone is not sufficient to drive G-quadruplex formation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.
Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
A structurally driven analysis of thiol reactivity in mammalian albumins.
Spiga, Ottavia; Summa, Domenico; Cirri, Simone; Bernini, Andrea; Venditti, Vincenzo; De Chiara, Matteo; Priora, Raffaella; Frosali, Simona; Margaritis, Antonios; Di Giuseppe, Danila; Di Simplicio, Paolo; Niccolai, Neri
2011-04-01
Understanding the structural basis of protein redox activity is still an open question. Hence, by using a structural genomics approach, different albumins have been chosen to correlate protein structural features with the corresponding reaction rates of thiol exchange between albumin and disulfide DTNB. Predicted structures of rat, porcine, and bovine albumins have been compared with the experimentally derived human albumin. High structural similarity among these four albumins can be observed, in spite of their markedly different reactivity with DTNB. Sequence alignments offered preliminary hints on the contributions of sequence-specific local environments modulating albumin reactivity. Molecular dynamics simulations performed on experimental and predicted albumin structures reveal that thiolation rates are influenced by hydrogen bonding pattern and stability of the acceptor C34 sulphur atom with donor groups of nearby residues. Atom depth evolution of albumin C34 thiol groups has been monitored along the molecular dynamics trajectories. The most reactive albumins also appeared to be the ones presenting the C34 sulphur atom on the protein surface with the highest accessibility. High C34 sulphur atom reactivity in rat and porcine albumins seems to be determined by the presence of additional positively charged amino acid residues favoring both the C34 S⁻ form and the approach of DTNB. Copyright © 2011 Wiley Periodicals, Inc.
Robust feature tracking for endoscopic pose estimation and structure recovery
NASA Astrophysics Data System (ADS)
Speidel, S.; Krappe, S.; Röhl, S.; Bodenstedt, S.; Müller-Stich, B.; Dillmann, R.
2013-03-01
Minimally invasive surgery is a highly complex medical discipline with several difficulties for the surgeon. To alleviate these difficulties, augmented reality can be used for intraoperative assistance. For visualization, the endoscope pose must be known which can be acquired with a SLAM (Simultaneous Localization and Mapping) approach using the endoscopic images. In this paper we focus on feature tracking for SLAM in minimally invasive surgery. Robust feature tracking and minimization of false correspondences is crucial for localizing the endoscope. As sensory input we use a stereo endoscope and evaluate different feature types in a developed SLAM framework. The accuracy of the endoscope pose estimation is validated with synthetic and ex vivo data. Furthermore we test the approach with in vivo image sequences from da Vinci interventions.
Neumann, Sindy; Hartmann, Holger; Martin-Galiano, Antonio J; Fuchs, Angelika; Frishman, Dmitrij
2012-03-01
Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers both sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ∼1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/. Copyright © 2011 Wiley Periodicals, Inc.
Genomic mid-range inhomogeneity correlates with an abundance of RNA secondary structures
Bechtel, Jason M; Wittenschlaeger, Thomas; Dwyer, Trisha; Song, Jun; Arunachalam, Sasi; Ramakrishnan, Sadeesh K; Shepard, Samuel; Fedorov, Alexei
2008-01-01
Background Genomes possess different levels of non-randomness, in particular, an inhomogeneity in their nucleotide composition. Inhomogeneity is manifest from the short-range where neighboring nucleotides influence the choice of base at a site, to the long-range, commonly known as isochores, where a particular base composition can span millions of nucleotides. A separate genomic issue that has yet to be thoroughly elucidated is the role that RNA secondary structure (SS) plays in gene expression. Results We present novel data and approaches that show that a mid-range inhomogeneity (~30 to 1000 nt) not only exists in mammalian genomes but is also significantly associated with strong RNA SS. A whole-genome bioinformatics investigation of local SS in a set of 11,315 non-redundant human pre-mRNA sequences has been carried out. Four distinct components of these molecules (5'-UTRs, exons, introns and 3'-UTRs) were considered separately, since they differ in overall nucleotide composition, sequence motifs and periodicities. For each pre-mRNA component, the abundance of strong local SS (< -25 kcal/mol) was a factor of two to ten greater than a random expectation model. The randomization process preserves the short-range inhomogeneity of the corresponding natural sequences, thus, eliminating short-range signals as possible contributors to any observed phenomena. Conclusion We demonstrate that the excess of strong local SS in pre-mRNAs is linked to the little explored phenomenon of genomic mid-range inhomogeneity (MRI). MRI is an interdependence between nucleotide choice and base composition over a distance of 20–1000 nt. Additionally, we have created a public computational resource to support further study of genomic MRI. PMID:18549495
New powerful statistics for alignment-free sequence comparison under a pattern transfer model.
Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S; Sun, Fengzhu
2011-09-07
Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D*2 and D(s)2 showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D*2 and D(s)2 by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. Copyright © 2011 Elsevier Ltd. All rights reserved.
New Powerful Statistics for Alignment-free Sequence Comparison Under a Pattern Transfer Model
Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S.; Sun, Fengzhu
2011-01-01
Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D2∗ and D2s showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D2∗ and D2s by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. PMID:21723298
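The D2 statistic that this abstract builds on is commonly defined as the inner product of the k-word count vectors of two sequences. The following is a minimal sketch of that plain D2 only, assuming overlapping words and exact matching; the function names `kmer_counts` and `d2` are illustrative, and the variants D2∗ and D2s, as well as the new local-pair statistics proposed by the authors, are not implemented here.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Plain D2: sum over all k-words w of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for words that never occur in seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

Because no alignment is computed, the cost is linear in the two sequence lengths for a fixed k, which is what makes this family of statistics attractive for long regulatory regions.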
[A turning point in the knowledge of the structure-function-activity relations of elastin].
Alix, A J
2001-01-01
This review presents the latest results from our research group on the molecular structures (at the atomic level) of tropoelastin, elastin and elastin-derived peptides, studied mainly with bioinformatics methods (theoretical predictions and molecular modelling) linked to experimental circular dichroism spectroscopy. We had already characterized both the local secondary structure and some parts of the tertiary structure of the tropoelastin and elastin molecules (human, bovine...), using theoretical predictions (local secondary structure, linear epitopes...) and/or experimental data (optical spectroscopic methods: Raman scattering, infrared absorption, circular dichroism). Except for the cross-linking regions, which are in helical conformations, the whole tropoelastin structure displays many beta-reverse turns, which usually belong to irregular structures in proteins. These turns play a key role in orienting the regular structures (alpha-helix, beta-strand) and are therefore very important in the native protein 3D architecture. This is particularly true for human tropoelastin, because its sequence is rich in glycines and prolines, residues that are frequently found in beta-turns (a beta-turn is made of four consecutive residues stabilized by a hydrogen bond). Several types of beta-turns can be defined by the values of the dihedral angles phi and psi of the two central residues. Using a recently updated set of propensities for amino acid residues to belong to given types of reverse beta-turns (extracted from a reference set of known 3D structures of globular proteins), we determined, with our in-house software COUDES, the distribution and characterization of the possible turn types for all possible tetrapeptides of the human tropoelastin sequence. The locations and/or types of these reverse beta-turns show a regularity and are not entirely random. 
This confirms our hypothesis that the intramolecular elasticity of tropoelastin could be explained by possible transitions between conformations involving short beta-strands and beta-turns. This result is of great interest for the construction (using molecular biology) of elastic biomaterials derived from the elastin sequence (particularly, the elastin-derived peptides corresponding to the sequence exon 21--(exon 24--exon 24...). Our study also makes it possible to predict the conformations of specific elastin-derived peptides that could have interesting biological activity. Peptides resulting from the degradation of elastin, the insoluble polymer of tropoelastin responsible for the elasticity of vertebrate tissues, can induce biological effects, notably the regulation of matrix metalloproteinase (MMP) activity. Recently, it was proposed that some elastin-derived hexapeptides resulting from circular permutations of VGVAPG (a sequence repeated three times in exon 24 of human tropoelastin) regulate MMP-1 production and activation. This effect depends on the presence of the tropoelastin-specific membrane receptor, the 67 kDa EBP (Elastin Binding Protein). Our results, obtained using both circular dichroism spectroscopy and linear predictions, confirmed the hypothesis of a structure-dependent mechanism, with a type VIII beta-turn possibly occurring on the first four residues of the GXXPG consensus sequence, which is present only in the active peptides. We therefore performed extensive molecular dynamics studies, in both implicit and explicit solvent, on these active and inactive elastin-derived hexapeptides. 
Using our own pattern-recognition method for analysing the types of beta-reverse-turns adopted along the molecular dynamics trajectory, we found that active and inactive peptides form two clearly distinct conformational groups, in which active peptides preferentially adopt a conformation close to a type VIII GXXP beta-reverse-turn. The structural role of the C-terminal G residue could also be explained. Additional molecular simulations on (VGVAPG)2 and (VGVAPG)3 show the formation of two or three GXXP tetrapeptides adopting a structure close to a type VIII beta-reverse-turn, suggesting a local conformational preference for this motif. This observation of a specific single and/or repeated structural motif agrees with the circular dichroism spectra of the (VGVAPG)1, (VGVAPG)2 and (VGVAPG)3 peptides, and it can therefore be proposed that their biological activities have to be linear. The final aim of this type of work is to better understand the sequence/structure/function/activity relationships of these structured peptides, in order to propose specific sequences (corresponding to specific structures) for the best biological activity.
A Bayesian Framework for Human Body Pose Tracking from Depth Image Sequences
Zhu, Youding; Fujimura, Kikuo
2010-01-01
This paper addresses the problem of accurate and robust tracking of 3D human body pose from depth image sequences. Recovering the large number of degrees of freedom in human body movements from a depth image sequence is challenging because of the need to resolve the depth ambiguity caused by self-occlusions and the difficulty of recovering from tracking failure. Human body poses can be estimated through model fitting using dense correspondences between depth data and an articulated human model (local optimization method). Although this usually achieves high accuracy thanks to the dense correspondences, it may fail to recover from tracking failure. Alternatively, human pose may be reconstructed by detecting and tracking human body anatomical landmarks (key-points) based on low-level depth image analysis. While this key-point based method is robust and recovers from tracking failure, its pose estimation accuracy depends solely on the image-based localization accuracy of the key-points. To address these limitations, we present a flexible Bayesian framework for integrating pose estimation results obtained by methods based on key-points and local optimization. Experimental results and a performance comparison are presented to demonstrate the effectiveness of the proposed approach. PMID:22399933
Physical Model of the Genotype-to-Phenotype Map of Proteins
NASA Astrophysics Data System (ADS)
Tlusty, Tsvi; Libchaber, Albert; Eckmann, Jean-Pierre
2017-04-01
How DNA is mapped to functional proteins is a basic question of living matter. We introduce and study a physical model of protein evolution which suggests a mechanical basis for this map. Many proteins rely on large-scale motion to function. We therefore treat protein as learning amorphous matter that evolves towards such a mechanical function: Genes are binary sequences that encode the connectivity of the amino acid network that makes a protein. The gene is evolved until the network forms a shear band across the protein, which allows for long-range, soft modes required for protein function. The evolution reduces the high-dimensional sequence space to a low-dimensional space of mechanical modes, in accord with the observed dimensional reduction between genotype and phenotype of proteins. Spectral analysis of the space of 10^6 solutions shows a strong correspondence between localization around the shear band of both mechanical modes and the sequence structure. Specifically, our model shows how mutations are correlated among amino acids whose interactions determine the functional mode.
Chang, Jian-Cheng; Ponnath, Daniel W; Ramasamy, Srinivasan
2016-01-01
Leucinodes orbonalis is the most detrimental South and Southeast Asian insect pest of eggplant. To help reduce the impact of this pest, population genetic diversity and structure of L. orbonalis were examined in eight populations from six countries using mitochondrial cytochrome c oxidase subunit I DNA sequences. No correlation between genetic diversity and geographic distance was detected among populations. Low levels of haplotype and nucleotide diversities were observed in the Philippines population, suggesting recent colonization. No significant gene flow was found among local populations in different countries. The Vietnam population is highly differentiated, indicated by significant pairwise FST values, and may be ascribed to a new subspecies or race. India was confirmed to be the source of genetic variation in L. orbonalis populations. Our study showed that L. orbonalis formed subpopulations for each local region, and the corresponding pest management technology should be developed at the country scale.
Spin-reorientation transitions in the Cairo pentagonal magnet Bi4Fe5O13F
Tsirlin, Alexander A.; Rousochatzakis, Ioannis; Filimonov, Dmitry; ...
2017-09-19
Here, we show that interlayer spins play a dual role in the Cairo pentagonal magnet Bi4Fe5O13F, on one hand mediating the three-dimensional magnetic order, and on the other driving spin-reorientation transitions both within and between the planes. The corresponding sequence of magnetic orders unraveled by neutron diffraction and Mössbauer spectroscopy features two orthogonal magnetic structures described by opposite local vector chiralities, and an intermediate, partly disordered phase with nearly collinear spins. A similar collinear phase has been predicted theoretically to be stabilized by quantum fluctuations, but Bi4Fe5O13F is very far from the relevant parameter regime. While the observed in-plane reorientation cannot be explained by any standard frustration mechanism, our ab initio band-structure calculations reveal strong single-ion anisotropy of the interlayer Fe3+ spins that turns out to be instrumental in controlling the local vector chirality and the associated interlayer order.
Spin-reorientation transitions in the Cairo pentagonal magnet Bi4Fe5O13F
NASA Astrophysics Data System (ADS)
Tsirlin, Alexander A.; Rousochatzakis, Ioannis; Filimonov, Dmitry; Batuk, Dmitry; Frontzek, Matthias; Abakumov, Artem M.
2017-09-01
We show that interlayer spins play a dual role in the Cairo pentagonal magnet Bi4Fe5O13F, on one hand mediating the three-dimensional magnetic order, and on the other driving spin-reorientation transitions both within and between the planes. The corresponding sequence of magnetic orders unraveled by neutron diffraction and Mössbauer spectroscopy features two orthogonal magnetic structures described by opposite local vector chiralities, and an intermediate, partly disordered phase with nearly collinear spins. A similar collinear phase has been predicted theoretically to be stabilized by quantum fluctuations, but Bi4Fe5O13F is very far from the relevant parameter regime. While the observed in-plane reorientation cannot be explained by any standard frustration mechanism, our ab initio band-structure calculations reveal strong single-ion anisotropy of the interlayer Fe3+ spins that turns out to be instrumental in controlling the local vector chirality and the associated interlayer order.
Local backbone structure prediction of proteins
De Brevern, Alexandre G.; Benros, Cristina; Gautier, Romain; Valadié, Hélène; Hazout, Serge; Etchebest, Catherine
2004-01-01
Summary: A statistical analysis of the PDB structures has led us to define a new set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each defined by the (φ, ψ) dihedral angles of 5 consecutive residues. The amino acid distributions observed in sequence windows encompassing these PBs are used to predict, by a Bayesian approach, the local 3D structure of proteins from their sequences alone. LocPred is software that lets users submit a protein sequence and performs a prediction in terms of PBs. The prediction results are given both textually and graphically. PMID:15724288
Fang, Jing; Nevin, Philip; Kairys, Visvaldas; Venclovas, Česlovas; Engen, John R; Beuning, Penny J
2014-04-08
The relationship between protein sequence, structure, and dynamics has been elusive. Here, we report a comprehensive analysis using an in-solution experimental approach to study how the conservation of tertiary structure correlates with protein dynamics. Hydrogen exchange measurements of eight processivity clamp proteins from different species revealed that, despite highly similar three-dimensional structures, clamp proteins display a wide range of dynamic behavior. Differences were apparent both for structurally similar domains within proteins and for corresponding domains of different proteins. Several of the clamps contained regions that underwent local unfolding with different half-lives. We also observed a conserved pattern of alternating dynamics of the α helices lining the inner pore of the clamps as well as a correlation between dynamics and the number of salt bridges in these α helices. Our observations reveal that tertiary structure and dynamics are not directly correlated and that primary structure plays an important role in dynamics. Copyright © 2014 Elsevier Ltd. All rights reserved.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.; ...
2017-07-18
This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Richard A.; Brown, Steven D.
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences. PMID:28769883
Buried and accessible surface area control intrinsic protein flexibility.
Marsh, Joseph A
2013-09-09
Proteins experience a wide variety of conformational dynamics that can be crucial for facilitating their diverse functions. How is the intrinsic flexibility required for these motions encoded in their three-dimensional structures? Here, the overall flexibility of a protein is demonstrated to be tightly coupled to the total amount of surface area buried within its fold. A simple proxy for this, the relative solvent-accessible surface area (Arel), therefore shows excellent agreement with independent measures of global protein flexibility derived from various experimental and computational methods. Application of Arel on a large scale demonstrates its utility by revealing unique sequence and structural properties associated with intrinsic flexibility. In particular, flexibility as measured by Arel shows little correspondence with intrinsic disorder, but instead tends to be associated with multiple domains and increased α-helical structure. Furthermore, the apparent flexibility of monomeric proteins is found to be useful for identifying quaternary-structure errors in published crystal structures. There is also a strong tendency for the crystal structures of more flexible proteins to be solved to lower resolutions. Finally, local solvent accessibility is shown to be a primary determinant of local residue flexibility. Overall, this work provides both fundamental mechanistic insight into the origin of protein flexibility and a simple, practical method for predicting flexibility from protein structures. © 2013 Elsevier Ltd. All rights reserved.
Structures of the transmembrane helices of the G-protein coupled receptor, rhodopsin.
Katragadda, M; Chopra, A; Bennett, M; Alderfer, J L; Yeagle, P L; Albert, A D
2001-07-01
We test the hypothesis that individual peptides corresponding to the transmembrane helices of the membrane protein rhodopsin form helices in solution similar to those in the native protein. Peptides containing the sequences of helices 1, 4 and 5 of rhodopsin were synthesized. Two peptides, with overlapping sequences at their termini, were synthesized to cover each of the helices. The peptides from helix 1 and helix 4 were helical throughout most of their length. The N- and C-termini of all the peptides were disordered, and proline caused opening of the helical structure in both helix 1 and helix 4. The peptides from helix 5 were helical in the middle segment of each peptide, with larger disordered regions in the N- and C-termini than for helices 1 and 4. These observations show that there is a strong helical propensity in the amino acid sequences corresponding to the transmembrane domain of this G-protein coupled receptor. In the case of the peptides from helix 4, it was possible to superimpose the structures of the overlapping sequences to produce a construct covering the whole of the sequence of helix 4 of rhodopsin. A similar superposition for the peptides from helix 1 also produced a construct, but somewhat less successfully because of the disordering in the region of sequence overlap. This latter problem was more severe for helix 5, and therefore a single peptide was synthesized for the entire sequence of this helix and its structure determined. It proved to be helical throughout. Comparison of all these structures with the recent crystal structure of rhodopsin revealed that the peptide structures mimicked the structures seen in the whole protein. Thus, similar studies of peptides may provide useful information on the secondary structure of other transmembrane proteins built around helical bundles.
Kann, Maricel G.; Sheetlin, Sergey L.; Park, Yonil; Bryant, Stephen H.; Spouge, John L.
2007-01-01
The sequencing of complete genomes has created a pressing need for automated annotation of gene function. Because domains are the basic units of protein function and evolution, a gene can be annotated from a domain database by aligning domains to the corresponding protein sequence. Ideally, complete domains are aligned to protein subsequences, in a ‘semi-global alignment’. Local alignment, which aligns pieces of domains to subsequences, is common in high-throughput annotation applications, however. It is a mature technique, with the heuristics and accurate E-values required for screening large databases and evaluating the screening results. Hidden Markov models (HMMs) provide an alternative theoretical framework for semi-global alignment, but their use is limited because they lack heuristic acceleration and accurate E-values. Our new tool, GLOBAL, overcomes some limitations of previous semi-global HMMs: it has accurate E-values and the possibility of the heuristic acceleration required for high-throughput applications. Moreover, according to a standard of truth based on protein structure, two semi-global HMM alignment tools (GLOBAL and HMMer) had comparable performance in identifying complete domains, but distinctly outperformed two tools based on local alignment. When searching for complete protein domains, therefore, GLOBAL avoids disadvantages commonly associated with HMMs, yet maintains their superior retrieval performance. PMID:17596268
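The distinction drawn above between semi-global and local alignment can be made concrete with a small dynamic program. The sketch below is not the GLOBAL tool and uses no HMM; it is a plain scoring recurrence under assumed, arbitrary parameters (match +1, mismatch -1, linear gap -1), and the function name `semiglobal_score` is hypothetical. The complete domain must be aligned, while unaligned protein residues before and after the match cost nothing.

```python
def semiglobal_score(domain, protein, match=1, mismatch=-1, gap=-1):
    """Best score for aligning the whole domain to some substring of protein."""
    n = len(protein)
    # Row 0 is all zeros: skipping any prefix of the protein is free.
    prev = [0] * (n + 1)
    for i in range(1, len(domain) + 1):
        # Column 0: domain residues with no protein partner are gapped.
        curr = [i * gap] + [0] * n
        for j in range(1, n + 1):
            sub = match if domain[i - 1] == protein[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,   # align the two residues
                          prev[j] + gap,       # gap in the protein
                          curr[j - 1] + gap)   # gap in the domain
        prev = curr
    # Taking the maximum over the last row makes the protein suffix free.
    return max(prev)
```

A local alignment would additionally clamp every cell at zero, which is what allows it to report only a piece of the domain; leaving that clamp out is the design choice that forces full domain coverage here.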
Common Amino Acid Subsequences in a Universal Proteome—Relevance for Food Science
Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Sokołowska, Jolanta; Starowicz, Piotr; Bucholska, Justyna; Hrynkiewicz, Monika
2015-01-01
A common subsequence is a fragment of the amino acid chain that occurs in more than one protein. Common subsequences may be an object of interest for food scientists as biologically active peptides, epitopes, and/or protein markers that are used in comparative proteomics. An individual bioactive fragment, in particular the shortest fragment containing two or three amino acid residues, may occur in many protein sequences. An individual linear epitope may also be present in multiple sequences of precursor proteins. Although recent recommendations for prediction of allergenicity and cross-reactivity include not only sequence identity, but also similarities in secondary and tertiary structures surrounding the common fragment, local sequence identity may be used to screen protein sequence databases for potential allergens in silico. The main weakness of the screening process is that it overlooks allergens and cross-reactivity cases without identical fragments corresponding to linear epitopes. A single peptide may also serve as a marker of a group of allergens that belong to the same family and, possibly, reveal cross-reactivity. This review article discusses the benefits for food scientists that follow from the common subsequences concept. PMID:26340620
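The screening idea described above, finding fragments that occur in more than one protein, can be sketched with a fragment index. This is a minimal illustration assuming fixed-length contiguous fragments and exact identity; the function names, the `min_proteins` parameter and the toy sequences in the usage note are hypothetical and are not taken from the review.

```python
def fragments(seq, k):
    """Return the set of all contiguous fragments of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def common_fragments(proteins, k, min_proteins=2):
    """Map each length-k fragment to the proteins that contain it.

    proteins is a dict of name -> amino acid sequence. Only fragments
    found in at least min_proteins different proteins are returned.
    """
    owners = {}
    for name, seq in proteins.items():
        for frag in fragments(seq, k):
            owners.setdefault(frag, set()).add(name)
    return {frag: names for frag, names in owners.items()
            if len(names) >= min_proteins}
```

For example, `common_fragments({"p1": "MKVLA", "p2": "AKVLG"}, 3)` reports the tripeptide `KVL` as shared by both entries. As the abstract notes, exact-identity screening of this kind misses cross-reactive cases that have no identical fragment.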
Mapping genes to human chromosome 19
DOE Office of Scientific and Technical Information (OSTI.GOV)
Connolly, Sarah
1996-05-01
For this project, 22 Expressed Sequence Tags (ESTs) were fine mapped to regions of human chromosome 19. An EST is a short DNA sequence that occurs once in the genome and corresponds to a single expressed gene. 32P-radiolabeled probes were made by polymerase chain reaction for each EST and hybridized to filters containing a chromosome 19-specific cosmid library. The location of the ESTs on the chromosome was determined by the location of the ordered cosmid to which the EST hybridized. Of the 22 ESTs that were sublocalized, 6 correspond to known genes, and 16 correspond to anonymous genes. These localized ESTs may serve as potential candidates for disease genes, as well as markers for future physical mapping.
Cytochrome oxidase subunit II gene in mitochondria of Oenothera has no intron
Hiesel, Rudolf; Brennicke, Axel
1983-01-01
The cytochrome oxidase subunit II gene has been localized in the mitochondrial genome of Oenothera berteriana and the nucleotide sequence has been determined. The coding sequence contains 777 bp and, unlike the corresponding gene in Zea mays, is not interrupted by an intron. No TGA codon is found within the open reading frame. The codon CGG, as in the maize gene, is used in place of tryptophan codons of corresponding genes in other organisms. At position 742 in the Oenothera sequence the TGG of maize is changed into a CGG codon, where Trp is conserved as the amino acid in other organisms. Homologous sequences occur more than once in the mitochondrial genome as several mitochondrial DNA species hybridize with DNA probes of the cytochrome oxidase subunit II gene. PMID:16453484
Specific material recognition by small peptides mediated by the interfacial solvent structure.
Schneider, Julian; Ciacchi, Lucio Colombi
2012-02-01
We present evidence that specific material recognition by small peptides is governed by local solvent density variations at solid/liquid interfaces, sensed by the side-chain residues with atomic-scale precision. In particular, we unveil the origin of the selectivity of the binding motif RKLPDA for Ti over Si using a combination of metadynamics and steered molecular dynamics simulations, obtaining adsorption free energies and adhesion forces in quantitative agreement with corresponding experiments. For an accurate description, we employ realistic models of the natively oxidized surfaces which go beyond the commonly used perfect crystal surfaces. These results have profound implications for nanotechnology and materials science applications, offering a previously missing structure-function relationship for the rational design of materials-selective peptide sequences. © 2011 American Chemical Society
Kwasigroch, Jean Marc; Rooman, Marianne
2006-07-15
Prelude&Fugue are bioinformatics tools that aim to predict the local 3D structure of a protein from its amino acid sequence in terms of seven backbone torsion angle domains, using database-derived potentials. Prelude(&Fugue) computes all lowest free energy conformations of a protein or protein region, ranked by increasing energy, and possibly satisfying some interresidue distance constraints specified by the user. (Prelude&)Fugue detects sequence regions whose predicted structure is significantly preferred relative to other conformations in the absence of tertiary interactions. These programs can be used for predicting secondary structure, the tertiary structure of short peptides, flickering early folding sequences and peptides that adopt a preferred conformation in solution. They can also be used for detecting structural weaknesses, i.e. sequence regions that are not optimal with respect to the tertiary fold. http://babylone.ulb.ac.be/Prelude_and_Fugue.
(Pea)nuts and bolts of visual narrative: Structure and meaning in sequential image comprehension
Cohn, Neil; Paczynski, Martin; Jackendoff, Ray; Holcomb, Phillip J.; Kuperberg, Gina R.
2012-01-01
Just as syntax differentiates coherent sentences from scrambled word strings, the comprehension of sequential images must also use a cognitive system to distinguish coherent narrative sequences from random strings of images. We conducted experiments analogous to two classic studies of language processing to examine the contributions of narrative structure and semantic relatedness to processing sequential images. We compared four types of comic strips: 1) Normal sequences with both structure and meaning, 2) Semantic Only sequences (in which the panels were related to a common semantic theme, but had no narrative structure), 3) Structural Only sequences (narrative structure but no semantic relatedness), and 4) Scrambled sequences of randomly-ordered panels. In Experiment 1, participants monitored for target panels in sequences presented panel-by-panel. Reaction times were slowest to panels in Scrambled sequences, intermediate in both Structural Only and Semantic Only sequences, and fastest in Normal sequences. This suggests that both semantic relatedness and narrative structure offer advantages to processing. Experiment 2 measured ERPs to all panels across the whole sequence. The N300/N400 was largest to panels in both the Scrambled and Structural Only sequences, intermediate in Semantic Only sequences and smallest in the Normal sequences. This implies that a combination of narrative structure and semantic relatedness can facilitate semantic processing of upcoming panels (as reflected by the N300/N400). Also, panels in the Scrambled sequences evoked a larger left-lateralized anterior negativity than panels in the Structural Only sequences. This localized effect was distinct from the N300/N400, and appeared despite the fact that these two sequence types were matched on local semantic relatedness between individual panels. These findings suggest that sequential image comprehension uses a narrative structure that may be independent of semantic relatedness. 
Altogether, we argue that the comprehension of visual narrative is guided by an interaction between structure and meaning. PMID:22387723
Miller, Mark P.; Haig, Susan M.; Wagner, R.S.
2006-01-01
The Southern torrent salamander (Rhyacotriton variegatus) was recently found not warranted for listing under the US Endangered Species Act due to lack of information regarding population fragmentation and gene flow. Found in small-order streams associated with late-successional coniferous forests of the US Pacific Northwest, threats to their persistence include disturbance related to timber harvest activities. We conducted a study of genetic diversity throughout this species' range to 1) identify major phylogenetic lineages and phylogeographic barriers and 2) elucidate regional patterns of population genetic and spatial phylogeographic structure. Cytochrome b sequence variation was examined for 189 individuals from 72 localities. We identified 3 major lineages corresponding to nonoverlapping geographic regions: a northern California clade, a central Oregon clade, and a northern Oregon clade. The Yaquina River may be a phylogeographic barrier between the northern Oregon and central Oregon clades, whereas the Smith River in northern California appears to correspond to the discontinuity between the central Oregon and northern California clades. Spatial analyses of genetic variation within regions encompassing major clades indicated that the extent of genetic structure is comparable among regions. We discuss our results in the context of conservation efforts for Southern torrent salamanders.
Kemppainen, Petri; Knight, Christopher G; Sarma, Devojit K; Hlaing, Thaung; Prakash, Anil; Maung Maung, Yan Naung; Somboon, Pradya; Mahanta, Jagadish; Walton, Catherine
2015-09-01
Recent advances in sequencing allow population-genomic data to be generated for virtually any species. However, approaches to analyse such data lag behind the ability to generate it, particularly in nonmodel species. Linkage disequilibrium (LD, the nonrandom association of alleles from different loci) is a highly sensitive indicator of many evolutionary phenomena including chromosomal inversions, local adaptation and geographical structure. Here, we present linkage disequilibrium network analysis (LDna), which accesses information on LD shared between multiple loci genomewide. In LD networks, vertices represent loci, and connections between vertices represent the LD between them. We analysed such networks in two test cases: a new restriction-site-associated DNA sequence (RAD-seq) data set for Anopheles baimaii, a Southeast Asian malaria vector; and a well-characterized single nucleotide polymorphism (SNP) data set from 21 three-spined stickleback individuals. In each case, we readily identified five distinct LD network clusters (single-outlier clusters, SOCs), each comprising many loci connected by high LD. In A. baimaii, further population-genetic analyses supported the inference that each SOC corresponds to a large inversion, consistent with previous cytological studies. For sticklebacks, we inferred that each SOC was associated with a distinct evolutionary phenomenon: two chromosomal inversions, local adaptation, population-demographic history and geographic structure. LDna is thus a useful exploratory tool, able to give a global overview of LD associated with diverse evolutionary phenomena and identify loci potentially involved. LDna does not require a linkage map or reference genome, so it is applicable to any population-genomic data set, making it especially valuable for nonmodel species. © 2015 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
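The network idea in this abstract (loci as vertices, LD as connections) can be illustrated with a small sketch. It is not the LDna implementation: it assumes phased biallelic haplotypes coded 0/1, uses r² as the LD measure, and simply groups loci joined by edges above a fixed threshold. The function names and the threshold are hypothetical.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared allelic correlation (r^2) between two 0/1 haplotype vectors."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:  # a monomorphic locus carries no LD information
        return 0.0
    d = pab - pa * pb
    return d * d / denom


def ld_clusters(loci, threshold):
    """Group loci into connected components of the graph whose edges
    join every pair of loci with r^2 >= threshold (assumed cutoff)."""
    names = list(loci)
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(names, 2):
        if r_squared(loci[u], loci[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

With toy haplotypes such as `{"A": [0, 0, 1, 1], "B": [0, 0, 1, 1], "C": [0, 1, 0, 1]}` and a threshold of 0.8, loci A and B fall in one cluster and C stands alone.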
A Generative Angular Model of Protein Structure Evolution
Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun
2017-01-01
Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724
Human Autoantibodies Reveal Titin as a Chromosomal Protein
Machado, Cristina; Sunkel, Claudio E.; Andrew, Deborah J.
1998-01-01
Assembly of the higher-order structure of mitotic chromosomes is a prerequisite for proper chromosome condensation, segregation and integrity. Understanding the details of this process has been limited because very few proteins involved in the assembly of chromosome structure have been discovered. Using a human autoimmune scleroderma serum that identifies a chromosomal protein in human cells and Drosophila embryos, we cloned the corresponding Drosophila gene that encodes the homologue of vertebrate titin based on protein size, sequence similarity, developmental expression and subcellular localization. Titin is a giant sarcomeric protein responsible for the elasticity of striated muscle that may also function as a molecular scaffold for myofibrillar assembly. Molecular analysis and immunostaining with antibodies to multiple titin epitopes indicates that the chromosomal and muscle forms of titin may vary in their NH2 termini. The identification of titin as a chromosomal component provides a molecular basis for chromosome structure and elasticity. PMID:9548712
Predictive Multiple Model Switching Control with the Self-Organizing Map
NASA Technical Reports Server (NTRS)
Motter, Mark A.
2000-01-01
A predictive, multiple model control strategy is developed by extension of self-organizing map (SOM) local dynamic modeling of nonlinear autonomous systems to a control framework. Multiple SOMs collectively model the global response of a nonautonomous system to a finite set of representative prototype controls. Each SOM provides a codebook representation of the dynamics corresponding to a prototype control. Different dynamic regimes are organized into topological neighborhoods where the adjacent entries in the codebook represent the global minimization of a similarity metric. The SOM is additionally employed to identify the local dynamical regime, and consequently implements a switching scheme that selects the best available model for the applied control. SOM based linear models are used to predict the response to a larger family of control sequences which are clustered on the representative prototypes. The control sequence which corresponds to the prediction that best satisfies the requirements on the system output is applied as the external driving signal.
DNA Barcode Sequence Identification Incorporating Taxonomic Hierarchy and within Taxon Variability
Little, Damon P.
2011-01-01
For DNA barcoding to succeed as a scientific endeavor, an accurate and expeditious query sequence identification method is needed. Although a global multiple-sequence alignment can be generated for some barcoding markers (e.g. COI, rbcL), not all barcoding markers are as structurally conserved (e.g. matK). Thus, algorithms that depend on global multiple-sequence alignments are not universally applicable. Some sequence identification methods that use local pairwise alignments (e.g. BLAST) are unable to accurately differentiate between highly similar sequences and are not designed to cope with hierarchic phylogenetic relationships or within-taxon variability. Here, I present a novel alignment-free sequence identification algorithm, BRONX, that accounts for observed within-taxon variability and hierarchic relationships among taxa. BRONX identifies short variable segments and corresponding invariant flanking regions in reference sequences. These flanking regions are used to score variable regions in the query sequence without the production of a global multiple-sequence alignment. By incorporating observed within-taxon variability into the scoring procedure, misidentifications arising from shared alleles/haplotypes are minimized. An explicit treatment of more inclusive terminals allows for separate identifications to be made for each taxonomic level and/or for user-defined terminals. BRONX performs better than all other methods when there is imperfect overlap between query and reference sequences (e.g. mini-barcode queries against a full-length barcode database). BRONX consistently produced better identifications at the genus level for all query types. PMID:21857897
JNSViewer—A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures
Dong, Min; Graham, Mitchell; Yadav, Nehul
2017-01-01
Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html. PMID:28582416
Dense depth maps from correspondences derived from perceived motion
NASA Astrophysics Data System (ADS)
Kirby, Richard; Whitaker, Ross
2017-01-01
Many computer vision applications require finding corresponding points between images and using the corresponding points to estimate disparity. Today's correspondence finding algorithms primarily use image features or pixel intensities common between image pairs. Some 3-D computer vision applications, however, do not produce the desired results using correspondences derived from image features or pixel intensities. Two examples are the multimodal camera rig and the center region of a coaxial camera rig. We present an image correspondence finding technique that aligns pairs of image sequences using optical flow fields. The optical flow fields provide information about the structure and motion of the scene, which are not available in still images but can be used in image alignment. We apply the technique to a dual focal length stereo camera rig consisting of a visible light-infrared camera pair and to a coaxial camera rig. We test our method on real image sequences and compare our results with the state-of-the-art multimodal and structure from motion (SfM) algorithms. Our method produces more accurate depth and scene velocity reconstruction estimates than the state-of-the-art multimodal and SfM algorithms.
Pieper-Fürst, U.; Madkour, M. H.; Mayer, F.; Steinbüchel, A.
1994-01-01
The N-terminal amino acid sequence of the polyhydroxyalkanoic acid (PHA) granule-associated M(r)-15,500 protein of Rhodococcus ruber (the GA14 protein) was analyzed. The sequence revealed that the corresponding structural gene is represented by open reading frame 3, encoding a protein with a calculated M(r) of 14,175 which was recently localized downstream of the PHA synthase gene (U. Pieper and A. Steinbüchel, FEMS Microbiol. Lett. 96:73-80, 1992). A recombinant strain of Escherichia coli XL1-Blue carrying the hybrid plasmid (pSKXA10*) with open reading frame 3 overexpressed the GA14 protein. The GA14 protein was subsequently purified in a three-step procedure including chromatography on DEAE-Sephacel, phenyl-Sepharose CL-4B, and Superose 12. Determination of the molecular weight by gel filtration as well as electron microscopic studies indicates that a tetrameric structure of the recombinant, native GA14 protein is most likely. Immunoelectron microscopy demonstrated a localization of the GA14 protein at the periphery of PHA granules as well as close to the cell membrane in R. ruber. Investigations of PHA-leaky and PHA-negative mutants of R. ruber indicated that expression of the GA14 protein depended strongly on PHA synthesis. PMID:8021220
Sequence diagrams and the presentation of structural and evolutionary relationships among proteins.
Thomas, B R
1975-01-01
Protein sequences mapped on two-dimensional diagrams show characteristic patterns that should be of value in visualising sequence information and in distinguishing simpler structures. A convenient map form for comparative purposes is the alpha-helix diagram with aminoacid distribution analogous to the surface of an alpha-helix oriented so that an alpha-helix structure corresponds on the diagram to a vertical band 3.6 residues wide. The sequence diagram for an alpha-keratin, high-sulphur protein suggests a new form of polypeptide helix based on a repeating unit of five which may be an important component of alpha-keratin fibres.
2014-01-01
Background Plasmodium vivax is a protozoan parasite with an extensive worldwide distribution, being highly prevalent in Asia as well as in Mesoamerica and South America. In southern Mexico, P. vivax transmission has been endemic and recent studies suggest that these parasites have unique biological and genetic features. The msp1 gene has shown a high rate of nucleotide substitutions, deletions and insertions, and its mosaic structure reveals frequent recombination events, possibly between highly divergent parasite isolates. Methods The nucleotide sequence variation in the polymorphic icb5-6 fragment of the msp1 gene of Mexican and worldwide isolates was analysed. To understand how genotype diversity arises, disperses and persists in Mexico, the genetic structure and genealogical relationships of local isolates were examined. To identify new sequence hybrids and their evolutionary relationships with other P. vivax isolates circulating worldwide, two haplotype networks were constructed to examine whether two portions of icb5-6 have different evolutionary histories. Results Twelve new msp1 icb5-6 haplotypes of P. vivax from Mexico were identified. These nucleotide sequences show a mosaic structure comprising three partially conserved and two variable subfragments and resulted in five different sequence types. The variable subfragment sV1 has undergone recombination events that resulted in hybrid sequences, and the haplotype network allocated the Mexican haplotypes to three lineages, corresponding to the Sal I and Belem types and another, more divergent group. In contrast, the network built from the icb5-6 fragment without sV1 revealed that the Mexican haplotypes belong to two separate lineages, none of which are closely related to Sal I or Belem sequences. Conclusions These results suggest that the new hybrid haplotypes from southern Mexico were the result of at least three different recombination events. 
These rearrangements likely resulted from the recombination between haplotypes of highly divergent lineages that are frequently distributed in South America and Asia and diversified rapidly. PMID:24472213
NASA Astrophysics Data System (ADS)
Ding, Jun
Metallic glasses (MGs), discovered five decades ago as a newcomer in the family of glasses, are of current interest because of their unique structures and properties. Many fundamental materials science issues remain unresolved for metallic glasses, as well as for their predecessors above the glass transition temperature, the supercooled liquids. In particular, it is a major challenge to characterize the local structure and unveil the structure-property relationship for these amorphous materials. This thesis presents a systematic study of the local structure of metallic glasses as well as supercooled liquids via classical and ab initio molecular dynamics simulations. Three typical MG models are chosen as representative candidates: the Cu64Zr36, Pd82Si18 and Mg65Cu25Y10 systems. The first is dominated by full icosahedral short-range order, whereas prism-type short-range order dominates in the latter two. Furthermore, we move to unravel the underlying structural signature among several properties in metallic glasses. Firstly, the temperature dependence of specific heat and liquid fragility in Cu-Zr and in Mg-Cu-Y (also Pd-Si) supercooled liquids are quite distinct: gradual versus fast evolution of specific heat and viscosity/relaxation time with undercooling. Their local structural ordering is found to relate to the temperature dependence of specific heat and relaxation time. Then elastic heterogeneity has been studied to correlate with local structure in Cu-Zr MGs. Specifically, this part covers how the degree of elastic deformation correlates with the internal structure at the atomic level, how to quantitatively evaluate the local solidity/liquidity in MGs and how the network of interpenetrating icosahedra determines the corresponding shear modulus. Finally, we have illustrated the structural signature of quasi-localized low-frequency vibrational normal modes, which underlie the intriguing vibrational properties of MGs. 
Specifically, the local atomic packing structure in a model MG correlates strongly with the corresponding participation fraction in quasi-localized soft modes, with the highest and lowest participation corresponding to geometrically unfavored motifs and icosahedral short-range order (ISRO), respectively. In addition, we clearly demonstrate that quasi-localized low-frequency vibrational modes correlate strongly with fertile sites for shear transformations in an MG.
Bouchard, Kristofer E.; Ganguli, Surya; Brainard, Michael S.
2015-01-01
The majority of distinct sensory and motor events occur as temporally ordered sequences with rich probabilistic structure. Sequences can be characterized by the probability of transitioning from the current state to upcoming states (forward probability), as well as the probability of having transitioned to the current state from previous states (backward probability). Despite the prevalence of probabilistic sequencing of both sensory and motor events, the Hebbian mechanisms that mold synapses to reflect the statistics of experienced probabilistic sequences are not well understood. Here, we show through analytic calculations and numerical simulations that Hebbian plasticity (correlation, covariance, and STDP) with pre-synaptic competition can develop synaptic weights equal to the conditional forward transition probabilities present in the input sequence. In contrast, post-synaptic competition can develop synaptic weights proportional to the conditional backward probabilities of the same input sequence. We demonstrate that to stably reflect the conditional probability of a neuron's inputs and outputs, local Hebbian plasticity requires balance between competitive learning forces that promote synaptic differentiation and homogenizing learning forces that promote synaptic stabilization. The balance between these forces dictates a prior over the distribution of learned synaptic weights, strongly influencing both the rate at which structure emerges and the entropy of the final distribution of synaptic weights. Together, these results demonstrate a simple correspondence between the biophysical organization of neurons, the site of synaptic competition, and the temporal flow of information encoded in synaptic weights by Hebbian plasticity while highlighting the utility of balancing learning forces to accurately encode probability distributions, and prior expectations over such probability distributions. PMID:26257637
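The forward and backward probabilities defined in this abstract can be estimated directly from an observed sequence by counting transitions. The sketch below does only that counting and does not model Hebbian plasticity; the function name is hypothetical.

```python
from collections import Counter


def transition_probabilities(seq):
    """Estimate forward P(next = b | current = a) and backward
    P(previous = a | current = b) from one observed sequence."""
    pairs = Counter(zip(seq, seq[1:]))   # counts of each a -> b transition
    from_counts = Counter(seq[:-1])      # how often each state is left
    to_counts = Counter(seq[1:])         # how often each state is entered
    forward = {(a, b): c / from_counts[a] for (a, b), c in pairs.items()}
    backward = {(a, b): c / to_counts[b] for (a, b), c in pairs.items()}
    return forward, backward
```

For the toy sequence `"ABAC"`, the forward probability of A→B is 0.5 because A is followed by B once and by C once, while the backward probability of A→B is 1.0 because B is always preceded by A. The two quantities normalise the same counts over different margins, which is the distinction the abstract maps onto pre-synaptic versus post-synaptic competition.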
Using structure to explore the sequence alignment space of remote homologs.
Kuziemko, Andrew; Honig, Barry; Petrey, Donald
2011-10-01
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
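For readers unfamiliar with the DP score discussed here, the following is a minimal sketch of global alignment scoring in the Needleman-Wunsch style. It is not the method of this paper; the match, mismatch and linear gap values are illustrative assumptions, and real tools use substitution matrices and affine gap penalties.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b under a
    simple scoring scheme (assumed values), using two DP rows."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

The value returned is the "optimal" score in the sense used by the abstract: the best score under the chosen scheme, which need not correspond to the structurally correct alignment.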
Lyons, James; Dehzangi, Abdollah; Heffernan, Rhys; Sharma, Alok; Paliwal, Kuldip; Sattar, Abdul; Zhou, Yaoqi; Yang, Yuedong
2014-10-30
Because of the nearly constant distance between two neighbouring Cα atoms, the local backbone structure of proteins can be represented accurately by the angle between C(αi-1)-C(αi)-C(αi+1) (θ) and a dihedral angle rotated about the C(αi)-C(αi+1) bond (τ). θ and τ angles, as the representative of structural properties of three to four amino-acid residues, offer a description of backbone conformations that is complementary to φ and ψ angles (single residue) and secondary structures (>3 residues). Here, we report the first machine-learning technique for sequence-based prediction of θ and τ angles. Predicted angles based on an independent test have a mean absolute error of 9° for θ and 34° for τ with a distribution on the θ-τ plane close to that of native values. The average root-mean-square distance of 10-residue fragment structures constructed from predicted θ and τ angles is only 1.9Å from their corresponding native structures. Predicted θ and τ angles are expected to be complementary to predicted φ and ψ angles and secondary structures for use in model validation and in template-based as well as template-free structure prediction. The deep neural network learning technique is available as an on-line server called Structural Property prediction with Integrated DEep neuRal network (SPIDER) at http://sparks-lab.org. Copyright © 2014 Wiley Periodicals, Inc.
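The two angles defined in this abstract can be computed from Cα coordinates with elementary vector geometry. The sketch below is a geometric illustration only, not the SPIDER predictor; it assumes that τ is taken over four consecutive Cα atoms (i-1, i, i+1, i+2), and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def theta(p0, p1, p2):
    """Angle in degrees at p1 formed by three consecutive C-alpha atoms."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    cos_angle = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def tau(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Each point is an (x, y, z) tuple; with real data these would be the Cα coordinates read from a structure file.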
Nucleotide sequence of the gag gene and gag-pol junction of feline leukemia virus.
Laprevotte, I; Hampe, A; Sherr, C J; Galibert, F
1984-01-01
The nucleotide sequence of the gag gene of feline leukemia virus and its flanking sequences were determined and compared with the corresponding sequences of two strains of feline sarcoma virus and with that of the Moloney strain of murine leukemia virus. A high degree of nucleotide sequence homology between the feline leukemia virus and murine leukemia virus gag genes was observed, suggesting that retroviruses of domestic cats and laboratory mice have a common, proximal evolutionary progenitor. The predicted structure of the complete feline leukemia virus gag gene precursor suggests that the translation of nonglycosylated and glycosylated gag gene polypeptides is initiated at two different AUG codons. These initiator codons fall in the same reading frame and are separated by a 222-base-pair segment which encodes an amino terminal signal peptide. The nucleotide sequence predicts the order of amino acids in each of the individual gag-coded proteins (p15, p12, p30, p10), all of which derive from the gag gene precursor. Stable stem-and-loop secondary structures are proposed for two regions of viral RNA. The first falls within sequences at the 5' end of the viral genome, together with adjacent palindromic sequences which may play a role in dimer linkage of RNA subunits. The second includes coding sequences at the gag-pol junction and is proposed to be involved in translation of the pol gene product. Sequence analysis of the latter region shows that the gag and pol genes are translated in different reading frames. Classical consensus splice donor and acceptor sequences could not be localized to regions which would permit synthesis of the expected gag-pol precursor protein. Alternatively, we suggest that the pol gene product (RNA-dependent DNA polymerase) could be translated by a frameshift suppressing mechanism which could involve cleavage modification of stems and loops in a manner similar to that observed in tRNA processing. PMID:6328019
Protein structure recognition: From eigenvector analysis to structural threading method
NASA Astrophysics Data System (ADS)
Cao, Haibo
In this work, we try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. We found a strong correlation between the amino acid sequence and the corresponding native structure of the protein. Applications of this correlation discussed in this dissertation include domain partition and a new structural threading method, as well as the performance of this method in the CASP5 competition. In the first part, we give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups is discussed. This part includes discussions of interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, we try to establish the correlation between the amino acid sequence and the corresponding native structure of the protein. This correlation was observed in our eigenvector study of the protein contact matrix. We believe the correlation is universal, and thus it can be used in automatic partition of protein structures into folding domains. In the third part, we discuss a threading method based on the correlation between the amino acid sequence and the dominant eigenvector of the structure contact matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a case of blind test prediction. In the appendix, we list the overall performance of this threading method in the CASP5 blind test in comparison with other existing approaches.
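The dominant eigenvector of a contact matrix, which this abstract relates to the sequence, can be obtained with power iteration. The sketch below is a generic illustration rather than the dissertation's procedure; the toy matrix, the inclusion of self-contacts on the diagonal (which keeps the largest eigenvalue strictly dominant in magnitude) and the iteration count are assumptions.

```python
import math


def dominant_eigenvector(matrix, iterations=200):
    """Power iteration on a symmetric, non-negative square matrix.
    Returns (eigenvalue estimate, unit-length eigenvector estimate)."""
    n = len(matrix)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("matrix maps the iterate to the zero vector")
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n))
                     for i in range(n))
    return eigenvalue, v
```

For a three-residue toy chain with contacts `[[1, 1, 0], [1, 1, 1], [0, 1, 1]]`, the middle residue has the most contacts and receives the largest eigenvector component.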
Single-Molecule Denaturation Mapping of Genomic DNA in Nanofluidic Channels
NASA Astrophysics Data System (ADS)
Reisner, Walter; Larsen, Niels; Kristensen, Anders; Tegenfeldt, Jonas O.; Flyvbjerg, Henrik
2009-03-01
We have developed a new DNA barcoding technique based on the partial denaturation of extended fluorescently labeled DNA molecules. We partially melt DNA extended in nanofluidic channels via a combination of local heating and added chemical denaturants. The melted molecules, imaged via a standard fluorescence videomicroscopy setup, exhibit a nonuniform fluorescence profile corresponding to a series of local dips and peaks in the intensity trace along the stretched molecule. We show that this barcode is consistent with the presence of locally melted regions and can be explained by calculations of sequence-dependent melting probability. We believe this melting mapping technology is the first optically based single molecule technique sensitive to genome wide sequence variation that does not require an additional enzymatic labeling or restriction scheme.
Mathupala, S P; Lowe, S E; Podkovyrov, S M; Zeikus, J G
1993-08-05
The complete nucleotide sequence of the gene encoding the dual active amylopullulanase of Thermoanaerobacter ethanolicus 39E (formerly Clostridium thermohydrosulfuricum) was determined. The structural gene (apu) contained a single open reading frame 4443 base pairs in length, corresponding to 1481 amino acids, with an estimated molecular weight of 162,780. Analysis of the deduced sequence of apu with sequences of alpha-amylases and alpha-1,6 debranching enzymes enabled the identification of four conserved regions putatively involved in substrate binding and in catalysis. The conserved regions were localized within a 2.9-kilobase pair gene fragment, which encoded a M(r) 100,000 protein that maintained the dual activities and thermostability of the native enzyme. The catalytic residues of amylopullulanase were tentatively identified by using hydrophobic cluster analysis for comparison of amino acid sequences of amylopullulanase and other amylolytic enzymes. Asp597, Glu626, and Asp703 were individually modified to their respective amide form, or the alternate acid form, and in all cases both alpha-amylase and pullulanase activities were lost, suggesting the possible involvement of 3 residues in a catalytic triad, and the presence of a putative single catalytic site within the enzyme. These findings substantiate amylopullulanase as a new type of amylosaccharidase.
Song, Jiangning; Burrage, Kevin; Yuan, Zheng; Huber, Thomas
2006-03-09
The majority of peptide bonds in proteins are found to occur in the trans conformation. However, for proline residues, a considerable fraction of Prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function. In this paper, we propose a new approach to predict the proline cis/trans isomerization in proteins using support vector machine (SVM). The preliminary results indicated that using Radial Basis Function (RBF) kernels could lead to better prediction performance than that of polynomial and linear kernel functions. We used single sequence information of different local window sizes, amino acid compositions of different local sequences, multiple sequence alignment obtained from PSI-BLAST and the secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes in order to investigate their effects on the prediction performance. The training and testing of this approach was performed on a newly enlarged dataset of 2424 non-homologous proteins determined by X-Ray diffraction method using 5-fold cross-validation. Selecting the window size 11 provided the best performance for determining the proline cis/trans isomerization based on the single amino acid sequence. It was found that using multiple sequence alignments in the form of PSI-BLAST profiles could significantly improve the prediction performance, the prediction accuracy increased from 62.8% with single sequence to 69.8% and Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40. 
Furthermore, if coupled with the predicted secondary structure information by PSIPRED, our method yielded a prediction accuracy of 71.5% and MCC of 0.43, which are 9% and 0.17 higher, respectively, than the values achieved based on the single sequence information. A new method has been developed to predict the proline cis/trans isomerization in proteins based on a support vector machine, which used the single amino acid sequence with different local window sizes, the amino acid compositions of local sequence flanking centered proline residues, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of the SVM approach in this study reinforced that SVM is a powerful tool in predicting proline cis/trans isomerization in proteins and in biological sequence analysis.
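The accuracy and Matthews Correlation Coefficient quoted in the abstract above are both functions of a binary confusion matrix. The following is a minimal sketch of how the two measures are computed, assuming cis is treated as the positive class; the function names and the counts are hypothetical and are not taken from the study.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for illustration only (not data from the paper).
tp, tn, fp, fn = 70, 73, 27, 30
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC stays near zero for a predictor that always outputs the majority class, which is presumably why both measures are reported for a task where cis bonds are the minority.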
Bowen, D; Littlechild, J A; Fothergill, J E; Watson, H C; Hall, L
1988-01-01
Using oligonucleotide probes derived from amino acid sequencing information, the structural gene for phosphoglycerate kinase from the extreme thermophile, Thermus thermophilus, was cloned in Escherichia coli and its complete nucleotide sequence determined. The gene consists of an open reading frame corresponding to a protein of 390 amino acid residues (calculated Mr 41,791) with an extreme bias for G or C (93.1%) in the codon third base position. Comparison of the deduced amino acid sequence with that of the corresponding mesophilic yeast enzyme indicated a number of significant differences. These are discussed in terms of the unusual codon bias and their possible role in enhanced protein thermal stability. PMID:3052437
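The third-position G/C bias reported in the abstract above is a simple per-codon statistic. A minimal sketch of how such a figure could be computed from an in-frame coding sequence follows; the function name and the toy sequence are hypothetical, and the sequence is not the Thermus thermophilus gene.

```python
def gc3_fraction(cds):
    """Fraction of codons whose third base is G or C (in-frame DNA sequence)."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence must be non-empty and a multiple of 3 long")
    third_bases = cds[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in third_bases) / len(third_bases)

# Toy example (hypothetical): codons ATG GCC GAG CTG AAA TAA
toy_cds = "ATGGCCGAGCTGAAATAA"
print(f"GC3 = {gc3_fraction(toy_cds):.1%}")
```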
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gordon, R.D.; Fieles, W.E.; Schotland, D.L.
1987-01-01
A peptide corresponding to amino acid residues 1783-1794 near the C terminus of the eel (Electrophorus electricus) sodium channel primary sequence has been synthesized and used to raise an antiserum in rabbits. This antiserum specifically recognized the peptide in a solid-phase radioimmunoassay. Specificity of the antiserum for the native channel protein was shown by its specific binding to a 280-kDa protein in immunoblots of eel electroplax membrane proteins. The antiserum also specifically labeled the innervated membrane of the eel electroplax in immunofluorescent studies. The membrane topology of the peptide recognized by this antiserum was probed in binding studies using oriented electroplax membrane vesicles. These vesicles were 98% right-side-out as determined by [³H]saxitoxin binding. Binding of the antipeptide antiserum to this fraction was measured before and after permeabilization with 0.01% saponin. Specific binding to intact vesicles was low, but this binding increased 10-fold after permeabilization, implying a cytoplasmic orientation for the peptide. Confirmation for this orientation was then sought by localizing the antibody bound to intact electroplax cells with immunogold electron microscopy. The data imply that the region of the sodium channel primary sequence near the C terminus that is recognized by the antiserum is localized on the cytoplasmic side of the membrane; this localization provides some further constraints on models of sodium channel tertiary structure.
Local contrast-enhanced MR images via high dynamic range processing.
Chandra, Shekhar S; Engstrom, Craig; Fripp, Jurgen; Neubert, Ales; Jin, Jin; Walker, Duncan; Salvado, Olivier; Ho, Charles; Crozier, Stuart
2018-09-01
To develop a local contrast-enhancing and feature-preserving high dynamic range (HDR) image processing algorithm for multichannel and multisequence MR images of multiple body regions and tissues, and to evaluate its performance for structure visualization, bias field (correction) mitigation, and automated tissue segmentation. A multiscale-shape and detail-enhancement HDR-MRI algorithm is applied to data sets of multichannel and multisequence MR images of the brain, knee, breast, and hip. In multisequence 3T hip images, agreement between automatic cartilage segmentations and corresponding synthesized HDR-MRI series was computed for mean voxel overlap established from manual segmentations for a series of cases. Qualitative comparisons between the developed HDR-MRI and standard synthesis methods were performed on multichannel 7T brain and knee data, and multisequence 3T breast and knee data. The synthesized HDR-MRI series provided excellent enhancement of fine-scale structure from multiple scales and contrasts, while substantially reducing bias field effects in 7T brain gradient echo, T1 and T2 breast images and 7T knee multichannel images. Evaluation of the HDR-MRI approach on 3T hip multisequence images showed superior outcomes for automatic cartilage segmentations with respect to manual segmentation, particularly around regions with hyperintense synovial fluid, across a set of 3D sequences. The successful combination of multichannel/sequence MR images into a single-fused HDR-MR image format provided consolidated visualization of tissues within 1 omnibus image, enhanced definition of thin, complex anatomical structures in the presence of variable or hyperintense signals, and improved tissue (cartilage) segmentation outcomes. © 2018 International Society for Magnetic Resonance in Medicine.
NASA Astrophysics Data System (ADS)
Wertgeim, Igor I.
2018-02-01
We investigate stationary and non-stationary solutions of nonlinear equations of the long-wave approximation for the Marangoni convection caused by a localized source of heat or a surface active impurity (surfactant) in a thin horizontal layer of a viscous incompressible fluid with a free surface. The distribution of heat or concentration flux is determined by the uniform vertical gradient of temperature or impurity concentration, distorted by the imposition of a slightly inhomogeneous heating or of surfactant, localized in the horizontal plane. The lower boundary of the layer is considered thermally insulated or impermeable, whereas the upper boundary is free and deformable. The equations obtained in the long-wave approximation are formulated in terms of the amplitudes of the temperature distribution or impurity concentration, deformation of the surface, and vorticity. For a simplification of the problem, a sequence of nonlinear equations is obtained, which in the simplest form leads to a nonlinear Schrödinger equation with a localized potential. The basic state of the system, its dependence on the parameters and stability are investigated. For stationary solutions localized in the region of the surface tension inhomogeneity, domains of parameters corresponding to different spatial patterns are delineated.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Randall, Graham L.; Zechiedrich, E. L.; Pettitt, Bernard M.
2009-09-01
To understand how underwinding and overwinding the DNA helix affects its structure, we simulated 19 independent DNA systems with fixed degrees of twist using molecular dynamics in a system that does not allow writhe. Underwinding DNA induced spontaneous, sequence-dependent base flipping and local denaturation, while overwinding DNA induced the formation of Pauling-like DNA (P-DNA). The winding resulted in a bimodal state simultaneously including local structural failure and B-form DNA for both underwinding and extreme overwinding. Our simulations suggest that base flipping and local denaturation may provide a landscape influencing protein recognition of DNA sequence to affect, for example, replication, transcription and recombination. Additionally, our findings help explain results from single-molecule experiments and demonstrate that elastic rod models are strictly valid on average only for unstressed or overwound DNA up to P-DNA formation. Finally, our data support a model in which base flipping can result from torsional stress.
Instantaneous relationship between solar inertial and local vertical local horizontal attitudes
NASA Technical Reports Server (NTRS)
Vickery, S. A.
1977-01-01
The instantaneous relationship between the Solar Inertial (SI) and Local Vertical Local Horizontal (LVLH) coordinate systems is derived. A method is presented for computation of the LVLH to SI rotational transformation matrix as a function of an input LVLH attitude and the corresponding look angles to the sun. Logic is provided for conversion between LVLH and SI attitudes expressed in terms of a pitch, yaw, roll Euler sequence. Documentation is included for a program which implements the logic on the Hewlett-Packard 97 programmable calculator.
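The abstract above describes attitudes expressed as a pitch, yaw, roll Euler sequence and a rotational transformation between two frames. The following is only a minimal sketch of that kind of computation, not the report's own logic: it assumes pitch is about the Y axis, yaw about Z and roll about X, applied in that order, and that each elementary matrix is a frame (passive) transformation. All function names are hypothetical.

```python
import math

def rot_x(a):
    """Frame rotation about X by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]

def rot_y(a):
    """Frame rotation about Y by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]

def rot_z(a):
    """Frame rotation about Z by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def transform(m, v):
    """Express vector v (given in the old frame) in the new frame."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def euler_pyr_to_matrix(pitch, yaw, roll):
    """Assumed convention: pitch about Y, then yaw about Z, then roll about X."""
    return matmul(rot_x(roll), matmul(rot_z(yaw), rot_y(pitch)))

def relative_rotation(ref_to_a, ref_to_b):
    """Matrix taking frame A coordinates to frame B coordinates,
    given both attitudes relative to a common reference frame."""
    return matmul(ref_to_b, transpose(ref_to_a))

attitude = euler_pyr_to_matrix(math.radians(90.0), 0.0, 0.0)
print(transform(attitude, [1.0, 0.0, 0.0]))  # old X axis seen from the pitched frame
```

Because rotation matrices are orthogonal, the inverse transformation is just the transpose, which is what `relative_rotation` relies on when composing two attitudes that share a reference frame.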
Rotondi, Kenneth S; Gierasch, Lila M
2003-01-01
We have recently shown that two of the beta-turns (III and IV) in the ten-stranded, beta-clam protein, cellular retinoic acid-binding protein I (CRABP I), are favored in short peptide fragments, arguing that they are encoded by local interactions (K. S. Rotondi and L. M. Gierasch, Biochemistry, 2003, Vol. 42, pp. 7976-7985). In this paper we examine these turns in greater detail to dissect the specific local interactions responsible for their observed native conformational biases. Conformations of peptides corresponding to the turn III and IV fragments were examined under conditions designed to selectively disrupt stabilizing interactions, using pH variation, chaotrope addition, or mutagenesis to probe specific side-chain influences. We find that steric constraints imposed by excluded volume effects between near neighbor residues (i,i+2), favorable polar (i,i+2) interactions, and steric permissiveness of glycines are the principal factors accounting for the observed native bias in these turns. Longer-range stabilizing interactions across the beta-turns do not appear to play a significant role in turn stability in these short peptides, in contrast to their importance in hairpins. Additionally, our data add to a growing number of examples of the 3:5 type I turn with a beta-bulge as a class of turns with high propensity to form locally defined structure. Current work is directed at the interplay between the local sequence information in the turns and more long-range influences in the mechanism of folding of this predominantly beta-sheet protein. Copyright 2004 Wiley Periodicals, Inc.
Evolutionary optimization of biopolymers and sequence structure maps
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reidys, C.M.; Kopp, S.; Schuster, P.
1996-06-01
Searching for biopolymers having a predefined function is a core problem of biotechnology, biochemistry and pharmacy. On the level of RNA sequences and their corresponding secondary structures we show that this problem can be analyzed mathematically. The strategy will be to study the properties of the RNA sequence to secondary structure mapping that is essential for the understanding of the search process. We show that to each secondary structure s there exists a neutral network consisting of all sequences folding into s. This network can be modeled as a random graph and has the following generic properties: it is dense and has a giant component within the graph of compatible sequences. The neutral network percolates sequence space and any two neutral nets come close in terms of Hamming distance. We investigate the distribution of the orders of neutral nets and show that above a certain threshold the topology of neutral nets allows one to find practically all frequent secondary structures.
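Two notions in the abstract above, Hamming distance between sequences and compatibility of a sequence with a secondary structure, can be made concrete without a folding program. The sketch below assumes dot-bracket notation for the structure and allows Watson-Crick and G-U wobble pairs; these conventions and all names are assumptions for illustration. Compatibility is only a necessary condition: deciding whether a sequence actually folds into s requires a folding algorithm, which is not shown.

```python
# Base pairs assumed admissible: Watson-Crick plus G-U wobble.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def base_pairs(structure):
    """Index pairs (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs

def is_compatible(sequence, structure):
    """True if every paired position holds bases that are allowed to pair."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in base_pairs(structure))

hairpin = "(((...)))"
print(is_compatible("GGGAAACCC", hairpin))  # G-C pairs at all three stem positions
print(hamming("GGGAAACCC", "GGGAAACCU"))    # one point mutation apart
```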
Miller, M.P.; Haig, S.M.; Wagner, R.S.
2006-01-01
The Southern torrent salamander (Rhyacotriton variegatus) was recently found not warranted for listing under the US Endangered Species Act due to lack of information regarding population fragmentation and gene flow. Found in small-order streams associated with late-successional coniferous forests of the US Pacific Northwest, threats to their persistence include disturbance related to timber harvest activities. We conducted a study of genetic diversity throughout this species' range to 1) identify major phylogenetic lineages and phylogeographic barriers and 2) elucidate regional patterns of population genetic and spatial phylogeographic structure. Cytochrome b sequence variation was examined for 189 individuals from 72 localities. We identified 3 major lineages corresponding to nonoverlapping geographic regions: a northern California clade, a central Oregon clade, and a northern Oregon clade. The Yaquina River may be a phylogeographic barrier between the northern Oregon and central Oregon clades, whereas the Smith River in northern California appears to correspond to the discontinuity between the central Oregon and northern California clades. Spatial analyses of genetic variation within regions encompassing major clades indicated that the extent of genetic structure is comparable among regions. We discuss our results in the context of conservation efforts for Southern torrent salamanders. © The American Genetic Association. 2006. All rights reserved.
NASA Astrophysics Data System (ADS)
Meyer, Sam; Everaers, Ralf
2015-02-01
The histone-DNA interaction in the nucleosome is a fundamental mechanism of genomic compaction and regulation, which remains largely unknown despite increasing structural knowledge of the complex. In this paper, we propose a framework for the extraction of a nanoscale histone-DNA force-field from a collection of high-resolution structures, which may be adapted to a larger class of protein-DNA complexes. We applied the procedure to a large crystallographic database extended by snapshots from molecular dynamics simulations. The comparison of the structural models first shows that, at histone-DNA contact sites, the DNA base-pairs are shifted outwards locally, consistent with locally repulsive forces exerted by the histones. The second step shows that the various force profiles of the structures under analysis derive locally from a unique, sequence-independent, quadratic repulsive force-field, while the sequence preferences are entirely due to internal DNA mechanics. We have thus obtained the first knowledge-derived nanoscale interaction potential for histone-DNA in the nucleosome. The conformations obtained by relaxation of nucleosomal DNA with high-affinity sequences in this potential accurately reproduce the experimental values of binding preferences. Finally we address the more generic binding mechanisms relevant to the 80% of genomic sequences incorporated in nucleosomes, by computing the conformation of nucleosomal DNA with sequence-averaged properties. This conformation differs from those found in crystals, and the analysis suggests that repulsive histone forces are related to local stretch tension in nucleosomal DNA, mostly between adjacent contact points. This tension could play a role in the stability of the complex.
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops
Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude
2011-01-01
The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr. PMID:21665924
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.
Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude
2011-07-01
The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.
Wang, Pengfei; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Guo, Xiangjiao; Yang, Haiyan; Xi, Yuanlin
2015-04-01
This study aimed to explore the features of clustered regularly interspaced short palindromic repeats (CRISPR) structures in Shigella by using bioinformatics. We used bioinformatics methods, including BLAST, alignment and RNA structure prediction, to analyze the CRISPR structures of Shigella genomes. The results showed that the CRISPRs existed in the four groups of Shigella, and the flanking sequences of upstream CRISPRs could be classified into the same group as those of the downstream. We also found some relatively conserved palindromic motifs in the leader sequences. Repeat sequences fell into the same group as their corresponding flanking sequences, and could be classified into two different types by their RNA secondary structures, which contain a "stem" and a "ring". Some spacers were found to be homologous to partial sequences of plasmids or phages. The study indicated that there were correlations between repeat sequences and flanking sequences, and the repeats might act as a kind of recognition mechanism to mediate the interaction between foreign genetic elements and Cas proteins.
Ornelas, Juan Francisco; Gándara, Etelvina; Vásquez-Aguilar, Antonio Acini; Ramírez-Barahona, Santiago; Ortiz-Rodriguez, Andrés Ernesto; González, Clementina; Mejía Saules, María Teresa; Ruiz-Sanchez, Eduardo
2016-04-12
Ecological adaptation to host taxa is thought to result in mistletoe speciation via race formation. However, historical and ecological factors could also contribute to explain genetic structuring particularly when mistletoe host races are distributed allopatrically. Using sequence data from nuclear (ITS) and chloroplast (trnL-F) DNA, we investigate the genetic differentiation of 31 Psittacanthus schiedeanus (Loranthaceae) populations across the Mesoamerican species range. We conducted phylogenetic, population and spatial genetic analyses on 274 individuals of P. schiedeanus to gain insight into the evolutionary history of these populations. Species distribution modeling, isolation with migration and Bayesian inference methods were used to infer the evolutionary transition of mistletoe invasion, in which evolutionary scenarios were compared through posterior probabilities. Our analyses revealed shallow levels of population structure with three genetic groups present across the sample area. Nine haplotypes were identified after sequencing the trnL-F intergenic spacer. These haplotypes showed phylogeographic structure, with three groups with restricted gene flow corresponding to the distribution of individuals/populations separated by habitat (cloud forest localities from San Luis Potosí to northwestern Oaxaca and Chiapas, localities with xeric vegetation in central Oaxaca, and localities with tropical deciduous forests in Chiapas), with post-glacial population expansions and potentially corresponding to post-glacial invasion types. Similarly, 44 ITS ribotypes suggest phylogeographic structure, even though the most frequent ribotypes are widespread, indicating effective nuclear gene flow via pollen. Gene flow estimates, a significant genetic signal of demographic expansion, and range shifts under past climatic conditions predicted by species distribution modeling suggest post-glacial invasion of P. schiedeanus mistletoes to cloud forests.
However, Approximate Bayesian Computation (ABC) analyses strongly supported a scenario of simultaneous divergence among the three groups isolated recently. Our results provide support for the predominant role of isolation and environmental factors in driving genetic differentiation of Mesoamerican parrot-flower mistletoes. The ABC results are consistent with a scenario of post-glacial mistletoe invasion, independent of host identity, and that habitat types recently isolated P. schiedeanus populations, accumulating slight phenotypic differences among genetic groups due to recent migration across habitats. Under this scenario, climatic fluctuations throughout the Pleistocene would have altered the distribution of suitable habitat for mistletoes throughout Mesoamerica leading to variation in population continuity and isolation. Our findings add to an understanding of the role of recent isolation and colonization in shaping cloud forest communities in the region.
Bedbrook, Claire N; Yang, Kevin K; Rice, Austin J; Gradinaru, Viviana; Arnold, Frances H
2017-10-01
There is growing interest in studying and engineering integral membrane proteins (MPs) that play key roles in sensing and regulating cellular response to diverse external signals. A MP must be expressed, correctly inserted and folded in a lipid bilayer, and trafficked to the proper cellular location in order to function. The sequence and structural determinants of these processes are complex and highly constrained. Here we describe a predictive, machine-learning approach that captures this complexity to facilitate successful MP engineering and design. Machine learning on carefully-chosen training sequences made by structure-guided SCHEMA recombination has enabled us to accurately predict the rare sequences in a diverse library of channelrhodopsins (ChRs) that express and localize to the plasma membrane of mammalian cells. These light-gated channel proteins of microbial origin are of interest for neuroscience applications, where expression and localization to the plasma membrane is a prerequisite for function. We trained Gaussian process (GP) classification and regression models with expression and localization data from 218 ChR chimeras chosen from a 118,098-variant library designed by SCHEMA recombination of three parent ChRs. We use these GP models to identify ChRs that express and localize well and show that our models can elucidate sequence and structure elements important for these processes. We also used the predictive models to convert a naturally occurring ChR incapable of mammalian localization into one that localizes well.
Rice, Austin J.; Gradinaru, Viviana; Arnold, Frances H.
2017-01-01
There is growing interest in studying and engineering integral membrane proteins (MPs) that play key roles in sensing and regulating cellular response to diverse external signals. A MP must be expressed, correctly inserted and folded in a lipid bilayer, and trafficked to the proper cellular location in order to function. The sequence and structural determinants of these processes are complex and highly constrained. Here we describe a predictive, machine-learning approach that captures this complexity to facilitate successful MP engineering and design. Machine learning on carefully-chosen training sequences made by structure-guided SCHEMA recombination has enabled us to accurately predict the rare sequences in a diverse library of channelrhodopsins (ChRs) that express and localize to the plasma membrane of mammalian cells. These light-gated channel proteins of microbial origin are of interest for neuroscience applications, where expression and localization to the plasma membrane is a prerequisite for function. We trained Gaussian process (GP) classification and regression models with expression and localization data from 218 ChR chimeras chosen from a 118,098-variant library designed by SCHEMA recombination of three parent ChRs. We use these GP models to identify ChRs that express and localize well and show that our models can elucidate sequence and structure elements important for these processes. We also used the predictive models to convert a naturally occurring ChR incapable of mammalian localization into one that localizes well. PMID:29059183
Organizational heterogeneity of vertebrate genomes.
Frenkel, Svetlana; Kirzhner, Valery; Korol, Abraham
2012-01-01
Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
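The abstract above describes scoring occurrences of fixed-length oligonucleotides (k-mers) in a DNA sequence. The sketch below is a simplified illustration of that idea, assuming exact-match counting over a sliding window and normalization to frequencies; the published Compositional Spectra procedure may differ (for example in how the word set is chosen or whether mismatches are tolerated), and the L1 distance used to compare two spectra is a hypothetical choice made here.

```python
from collections import Counter

def kmer_spectrum(sequence, k):
    """Relative frequencies of all k-mers in a DNA sequence.

    Windows containing symbols other than A, C, G, T (e.g. N) are skipped.
    """
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        word = sequence[start:start + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}

def spectrum_distance(p, q):
    """L1 distance between two k-mer frequency spectra (hypothetical measure)."""
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in set(p) | set(q))

region_a = kmer_spectrum("ACGTACGTACGTTTGACA", 2)
region_b = kmer_spectrum("GGGCGCGCCGGGCGCATA", 2)
print(f"distance between regions: {spectrum_distance(region_a, region_b):.3f}")
```

Comparing spectra of different segments of the same chromosome in this way gives one crude handle on the regional heterogeneity the abstract discusses.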
Cocco, Simona; Monasson, Remi; Weigt, Martin
2013-01-01
Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant ‘patterns’ of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold. PMID:23990764
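The residue-residue correlation matrix mentioned in the abstract above is built from single-column and pairwise amino acid frequencies in the alignment. The sketch below computes the connected correlation C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b) for one pair of columns in a toy alignment. It assumes unweighted sequences and no pseudocounts, which practical covariation pipelines usually add; the names and the alignment are hypothetical, and the eigen-decomposition step of PCA or the Hopfield-Potts model is not shown.

```python
from collections import Counter

def connected_correlation(msa, i, j):
    """C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b) for residues seen in columns i and j."""
    n = len(msa)
    f_i = Counter(seq[i] for seq in msa)
    f_j = Counter(seq[j] for seq in msa)
    f_ij = Counter((seq[i], seq[j]) for seq in msa)
    return {(a, b): f_ij[(a, b)] / n - (f_i[a] / n) * (f_j[b] / n)
            for a in f_i for b in f_j}

# Toy alignment (hypothetical): columns 0 and 1 covary, column 2 is conserved.
toy_msa = ["AKL", "AKL", "GEL", "GEL"]
print(connected_correlation(toy_msa, 0, 1))  # non-zero entries: covariation
print(connected_correlation(toy_msa, 0, 2))  # zero entries: no covariation
```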
Serotype-specific differences in dengue virus non-structural protein 5 nuclear localization.
Hannemann, Holger; Sung, Po-Yu; Chiu, Han-Chen; Yousuf, Amjad; Bird, Jim; Lim, Siew Pheng; Davidson, Andrew D
2013-08-02
The four serotypes of dengue virus (DENV-1 to -4) cause the most important arthropod-borne viral disease of humans. DENV non-structural protein 5 (NS5) contains enzymatic activities required for capping and replication of the viral RNA genome that occurs in the host cytoplasm. However, previous studies have shown that DENV-2 NS5 accumulates in the nucleus during infection. In this study, we examined the nuclear localization of NS5 for all four DENV serotypes. We demonstrate for the first time that there are serotypic differences in NS5 nuclear localization. Whereas the DENV-2 and -3 proteins accumulate in the nucleus, DENV-1 and -4 NS5 are predominantly if not exclusively localized to the cytoplasm. Comparative studies on the DENV-2 and -4 NS5 proteins revealed that the difference in DENV-4 NS5 nuclear localization was not due to rapid nuclear export but rather the lack of a functional nuclear localization sequence. Interaction studies using DENV-2 and -4 NS5 and human importin-α isoforms failed to identify an interaction that supported the differential nuclear localization of NS5. siRNA knockdown of the human importin-α isoform KPNA2, corresponding to the murine importin-α isoform previously shown to bind to DENV-2 NS5, did not substantially affect DENV-2 NS5 nuclear localization, whereas knockdown of importin-β did. The serotypic differences in NS5 nuclear localization did not correlate with differences in IL-8 gene expression. The results show that NS5 nuclear localization is not strictly required for virus replication but is more likely to have an auxiliary function in the life cycle of specific DENV serotypes.
Serotype-specific Differences in Dengue Virus Non-structural Protein 5 Nuclear Localization
Hannemann, Holger; Sung, Po-Yu; Chiu, Han-Chen; Yousuf, Amjad; Bird, Jim; Lim, Siew Pheng; Davidson, Andrew D.
2013-01-01
The four serotypes of dengue virus (DENV-1 to -4) cause the most important arthropod-borne viral disease of humans. DENV non-structural protein 5 (NS5) contains enzymatic activities required for capping and replication of the viral RNA genome that occurs in the host cytoplasm. However, previous studies have shown that DENV-2 NS5 accumulates in the nucleus during infection. In this study, we examined the nuclear localization of NS5 for all four DENV serotypes. We demonstrate for the first time that there are serotypic differences in NS5 nuclear localization. Whereas the DENV-2 and -3 proteins accumulate in the nucleus, DENV-1 and -4 NS5 are predominantly if not exclusively localized to the cytoplasm. Comparative studies on the DENV-2 and -4 NS5 proteins revealed that the difference in DENV-4 NS5 nuclear localization was not due to rapid nuclear export but rather the lack of a functional nuclear localization sequence. Interaction studies using DENV-2 and -4 NS5 and human importin-α isoforms failed to identify an interaction that supported the differential nuclear localization of NS5. siRNA knockdown of the human importin-α isoform KPNA2, corresponding to the murine importin-α isoform previously shown to bind to DENV-2 NS5, did not substantially affect DENV-2 NS5 nuclear localization, whereas knockdown of importin-β did. The serotypic differences in NS5 nuclear localization did not correlate with differences in IL-8 gene expression. The results show that NS5 nuclear localization is not strictly required for virus replication but is more likely to have an auxiliary function in the life cycle of specific DENV serotypes. PMID:23770669
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe
2016-01-01
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation (with Protein Blocks), which give an accurate description of the local protein structure. ORION has recently been improved, raising the quality of its results by 5%. The ORION web server accepts a single protein sequence as input and searches for homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the ‘Hard’ category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe
2016-06-20
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation (with Protein Blocks), which give an accurate description of the local protein structure. ORION has recently been improved, raising the quality of its results by 5%. The ORION web server accepts a single protein sequence as input and searches for homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.
Roles of beta-turns in protein folding: from peptide models to protein engineering.
Marcelino, Anna Marie C; Gierasch, Lila M
2008-05-01
Reverse turns are a major class of protein secondary structure; they represent sites of chain reversal and thus sites where the globular character of a protein is created. It has been speculated for many years that turns may nucleate the formation of structure in protein folding, as their propensity to occur will favor the approximation of their flanking regions and their general tendency to be hydrophilic will favor their disposition at the solvent-accessible surface. Reverse turns are local features, and it is therefore not surprising that their structural properties have been extensively studied using peptide models. In this article, we review research on peptide models of turns to test the hypothesis that the propensities of turns to form in short peptides will relate to the roles of corresponding sequences in protein folding. Turns with significant stability as isolated entities should actively promote the folding of a protein, and by contrast, turn sequences that merely allow the chain to adopt conformations required for chain reversal are predicted to be passive in the folding mechanism. We discuss results of protein engineering studies of the roles of turn residues in folding mechanisms. Factors that correlate with the importance of turns in folding indeed include their intrinsic stability, as well as their topological context and their participation in hydrophobic networks within the protein's structure.
Roles of β-Turns in Protein Folding: From Peptide Models to Protein Engineering
Marcelino, Anna Marie C.; Gierasch, Lila M.
2010-01-01
Reverse turns are a major class of protein secondary structure; they represent sites of chain reversal and thus sites where the globular character of a protein is created. It has been speculated for many years that turns may nucleate the formation of structure in protein folding, as their propensity to occur will favor the approximation of their flanking regions and their general tendency to be hydrophilic will favor their disposition at the solvent-accessible surface. Reverse turns are local features, and it is therefore not surprising that their structural properties have been extensively studied using peptide models. In this article, we review research on peptide models of turns to test the hypothesis that the propensities of turns to form in short peptides will relate to the roles of corresponding sequences in protein folding. Turns with significant stability as isolated entities should actively promote the folding of a protein, and by contrast, turn sequences that merely allow the chain to adopt conformations required for chain reversal are predicted to be passive in the folding mechanism. We discuss results of protein engineering studies of the roles of turn residues in folding mechanisms. Factors that correlate with the importance of turns in folding indeed include their intrinsic stability, as well as their topological context and their participation in hydrophobic networks within the protein’s structure. PMID:18275088
Stegemann, Björn; Klebe, Gerhard
2012-02-01
Small molecules are recognized in protein-binding pockets through surface-exposed physicochemical properties. To optimize binding, they have to adopt a conformation corresponding to a local energy minimum within the formed protein-ligand complex. However, their conformational flexibility makes them competent to bind not only to homologous proteins of the same family but also to proteins of remote similarity with respect to the shape of the binding pockets and folding pattern. Considering drug action, such observations can give rise to unexpected and undesired cross reactivity. In this study, datasets of six different cofactors (ADP, ATP, NAD(P)(H), FAD, and acetyl CoA, sharing an adenosine diphosphate moiety as common substructure), observed in multiple crystal structures of protein-cofactor complexes exhibiting sequence identity below 25%, have been analyzed for the conformational properties of the bound ligands, the distribution of physicochemical properties in the accommodating protein-binding pockets, and the local folding patterns next to the cofactor-binding site. State-of-the-art clustering techniques have been applied to group the different protein-cofactor complexes in the different spaces. Interestingly, clustering in cavity (Cavbase) and fold space (DALI) reveals virtually the same data structuring. Remarkable relationships can be found among the different spaces. They provide information on how conformations are conserved across the host proteins and which distinct local cavity and fold motifs recognize the different portions of the cofactors. In those cases, where different cofactors are found to be accommodated in a similar fashion to the same fold motifs, only a commonly shared substructure of the cofactors is used for the recognition process. Copyright © 2011 Wiley Periodicals, Inc.
Regularization in Short-Term Memory for Serial Order
ERIC Educational Resources Information Center
Botvinick, Matthew; Bylsma, Lauren M.
2005-01-01
Previous research has shown that short-term memory for serial order can be influenced by background knowledge concerning regularities of sequential structure. Specifically, it has been shown that recall is superior for sequences that fit well with familiar sequencing constraints. The authors report a corresponding effect pertaining to serial…
Elman RNN based classification of proteins sequences on account of their mutual information.
Mishra, Pooja; Nath Pandey, Paras
2012-10-21
In the present work we estimate residue correlation within protein sequences using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. The long-range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for a single protein sequence, so that each protein sequence is associated with its corresponding MIVs. These MIVs are given to an Elman RNN to obtain the classification of protein sequences. The modeling power of MIV was shown to be significantly better, giving a new approach towards alignment-free classification of protein sequences. We also conclude that MIVs based on sequence structural and solvent accessibility properties are better predictors. Copyright © 2012 Elsevier Ltd. All rights reserved.
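As a rough illustration of the quantity this abstract relies on, the sketch below computes the mutual information between each residue and its immediate neighbour in a single sequence. The function name, the optional `classes` mapping (standing in for a structural or solvent-accessibility grouping) and the use of base-2 logarithms are assumptions made for the example; the abstract does not give the exact estimator or the construction of the MIV.

```python
import math
from collections import Counter


def adjacent_mutual_information(seq, classes=None):
    """Mutual information (bits) between position i and position i+1.

    `classes` is an optional, hypothetical mapping from residue letter to a
    property class; when omitted, the raw letters are used.
    """
    symbols = [classes[aa] for aa in seq] if classes else list(seq)
    pairs = list(zip(symbols, symbols[1:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        # p(a,b) / (p(a) * p(b)) simplifies to count * n / (left * right)
        ratio = count * n / (left_counts[a] * right_counts[b])
        mi += (count / n) * math.log2(ratio)
    return mi
```

A vector of such values, one per property grouping or per residue separation, would be one plausible way to form an MIV-like feature vector, but that construction is not specified here.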
Cooper, David N.; Bacolla, Albino; Férec, Claude; Vasquez, Karen M.; Kehrer-Sawatzki, Hildegard; Chen, Jian-Min
2011-01-01
Different types of human gene mutation may vary in size, from structural variants (SVs) to single base-pair substitutions, but what they all have in common is that their nature, size and location are often determined either by specific characteristics of the local DNA sequence environment or by higher-order features of the genomic architecture. The human genome is now recognized to contain ‘pervasive architectural flaws’ in that certain DNA sequences are inherently mutation-prone by virtue of their base composition, sequence repetitivity and/or epigenetic modification. Here we explore how the nature, location and frequency of different types of mutation causing inherited disease are shaped in large part, and often in remarkably predictable ways, by the local DNA sequence environment. The mutability of a given gene or genomic region may also be influenced indirectly by a variety of non-canonical (non-B) secondary structures whose formation is facilitated by the underlying DNA sequence. Since these non-B DNA structures can interfere with subsequent DNA replication and repair, and may serve to increase mutation frequencies in generalized fashion (i.e. both in the context of subtle mutations and SVs), they have the potential to serve as a unifying concept in studies of mutational mechanisms underlying human inherited disease. PMID:21853507
Spontaneous PT symmetry breaking in Dirac-Kronig-Penney crystals
NASA Astrophysics Data System (ADS)
Longhi, Stefano; Cannata, Francesco; Ventura, Alberto
2011-12-01
We introduce a non-Hermitian PT invariant extension of the Dirac-Kronig-Penney model, describing the motion of a Dirac quasiparticle in a locally periodic sequence of imaginary δ-Dirac barriers and wells, and propose its optical realization using superstructure fiber Bragg gratings with alternating regions of optical gain and absorption. For the infinite crystal, we determine the band structure and show that the PT phase is always broken. For a finite crystal, we derive analytical expressions for reflection and transmission probabilities, and show that the PT phase is unbroken below a finite threshold of the δ-barrier area. In the proposed optical realization, the onset of PT symmetry breaking in the finite crystal corresponds to the lasing condition for the grating superstructures.
WebLogo: A Sequence Logo Generator
Crooks, Gavin E.; Hon, Gary; Chandonia, John-Marc; Brenner, Steven E.
2004-01-01
WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization. PMID:15173120
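The relation between stack height and symbol height described above can be sketched for one alignment column as follows. This is a minimal illustration, assuming the standard information-content definition (log2 of the alphabet size minus the Shannon entropy) and ignoring any small-sample correction; `logo_column` is a hypothetical helper, not part of the WebLogo code base.

```python
import math
from collections import Counter


def logo_column(column, alphabet_size=4):
    """Return (stack_height_bits, {symbol: symbol_height_bits}) for one column.

    alphabet_size is 4 for nucleic acids and 20 for proteins.
    """
    counts = Counter(column)
    total = len(column)
    freqs = {sym: c / total for sym, c in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    stack_height = math.log2(alphabet_size) - entropy
    # Each symbol takes a share of the stack proportional to its frequency.
    symbol_heights = {sym: f * stack_height for sym, f in freqs.items()}
    return stack_height, symbol_heights
```

A fully conserved DNA column reaches the maximum of 2 bits, while a column with all four bases equally represented has zero height.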
Bhattacharyya, Dhananjay; Halder, Sukanya; Basu, Sankar; Mukherjee, Debasish; Kumar, Prasun; Bansal, Manju
2017-02-01
Comprehensive analyses of structural features of non-canonical base pairs within a nucleic acid double helix are limited by the availability of a small number of three dimensional structures. Therefore, a procedure for model building of double helices containing any given nucleotide sequence and base pairing information, either canonical or non-canonical, is seriously needed. Here we describe a program RNAHelix, which is an updated version of our widely used software, NUCGEN. The program can regenerate duplexes using the dinucleotide step and base pair orientation parameters for a given double helical DNA or RNA sequence with defined Watson-Crick or non-Watson-Crick base pairs. The original structure and the corresponding regenerated structure of double helices were found to be very close, as indicated by the small RMSD values between positions of the corresponding atoms. Structures of several usual and unusual double helices have been regenerated and compared with their original structures in terms of base pair RMSD, torsion angles and electrostatic potentials and very high agreements have been noted. RNAHelix can also be used to generate a structure with a sequence completely different from an experimentally determined one or to introduce single to multiple mutation, but with the same set of parameters and hence can also be an important tool in homology modeling and study of mutation induced structural changes.
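The closeness measure mentioned above, RMSD between positions of corresponding atoms, can be sketched as below. The example assumes the two coordinate sets are already in the same reference frame and listed in the same atom order; it performs no superposition and is not taken from RNAHelix or NUCGEN.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists.

    Each argument is a sequence of (x, y, z) tuples; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```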
Computing the Partition Function for Kinetically Trapped RNA Secondary Structures
Lorenz, William A.; Clote, Peter
2011-01-01
An RNA secondary structure is locally optimal if there is no lower energy structure that can be obtained by the addition or removal of a single base pair, where energy is defined according to the widely accepted Turner nearest neighbor model. Locally optimal structures form kinetic traps, since any evolution away from a locally optimal structure must involve energetically unfavorable folding steps. Here, we present a novel, efficient algorithm to compute the partition function over all locally optimal secondary structures of a given RNA sequence. Our software, RNAlocopt, runs in time and space. Additionally, RNAlocopt samples a user-specified number of structures from the Boltzmann subensemble of all locally optimal structures. We apply RNAlocopt to show that (1) the number of locally optimal structures is far fewer than the total number of structures – indeed, the number of locally optimal structures is approximately equal to the square root of the number of all structures, (2) the structural diversity of this subensemble may be either similar to or quite different from the structural diversity of the entire Boltzmann ensemble, a situation that depends on the type of input RNA, (3) the (modified) maximum expected accuracy structure, computed by taking into account base pairing frequencies of locally optimal structures, is a more accurate prediction of the native structure than other current thermodynamics-based methods. The software RNAlocopt constitutes a technical breakthrough in our study of the folding landscape for RNA secondary structures. For the first time, locally optimal structures (kinetic traps in the Turner energy model) can be rapidly generated for long RNA sequences, previously impossible with methods that involved exhaustive enumeration. Use of locally optimal structures leads to state-of-the-art secondary structure prediction, as benchmarked against methods involving the computation of minimum free energy and of maximum expected accuracy. 
Web server and source code available at http://bioinformatics.bc.edu/clotelab/RNAlocopt/. PMID:21297972
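The definition of local optimality given in this abstract can be illustrated with a deliberately simplified stand-in for the Turner model: every base pair contributes the same negative energy. Under that assumption, removing a pair never lowers the energy, so a structure is locally optimal exactly when no further admissible pair can be added. The sketch below checks this for a dot-bracket structure; the pairing rules, the minimum hairpin loop of three unpaired bases and all function names are assumptions of the example, and this is not the RNAlocopt algorithm.

```python
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(structure):
    """Turn a dot-bracket string into a list of (i, j) base pairs."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    return pairs


def is_locally_optimal(seq, structure, min_loop=3):
    """True if no single base pair can be added (base-pair-counting energy)."""
    pairs = parse_pairs(structure)
    paired = {pos for pair in pairs for pos in pair}
    n = len(seq)
    for i in range(n):
        if i in paired:
            continue
        for j in range(i + min_loop + 1, n):
            if j in paired or (seq[i], seq[j]) not in CAN_PAIR:
                continue
            # A new pair (i, j) must not cross any existing pair (k, l).
            crosses = any((i < k < j) != (i < l < j) for k, l in pairs)
            if not crosses:
                return False  # adding (i, j) would lower the energy
    return True
```

With a nearest-neighbor energy model the test would instead compare the free energies of the structure and of each single-pair neighbour, since adding a pair is then not always favorable.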
3'-terminal sequence of a small round structured virus (SRSV) in Japan.
Utagawa, E T; Takeda, N; Inouye, S; Kasuga, K; Yamazaki, S
1994-01-01
We determined the nucleotide sequence of about 1,000 bases from the 3'-terminus of a small round structured virus (SRSV), which caused a gastroenteritis outbreak in Chiba Prefecture, Japan, in 1987. The sequence was compared with the corresponding sequence region of Norwalk virus; it consisted of a part of the open reading frame 2 (ORF2), whole ORF3, and 3'-noncoding region (NCR). The 624-base-long ORF3 had sequence homology of 68% with the corresponding region of Norwalk virus. (The amino acid sequence homology was 74%.) The 94-base-long NCR had 65% homology with Norwalk virus. We then selected two consensus-sequence portions in the above sequence between Chiba and Norwalk viruses for primers in the reverse transcriptase-polymerase chain reaction (RT-PCR). Using this primer set, we detected 669-bp bands in agarose gel electrophoresis of RT-PCR products from feces containing Chiba or Norwalk viruses. Furthermore, in Southern hybridization with Chiba probes which were labeled with digoxigenin-dUTP in PCR, the bands of the two viruses were clearly stained under a low stringency condition. Since both Chiba and Norwalk viruses were detected by the above primer set although they are geographically and chronologically different viruses, our primer-pair may be useful for detection of a broad range of SRSVs which cause gastroenteritis in different areas.
Spatial localization in heterogeneous systems
NASA Astrophysics Data System (ADS)
Kao, Hsien-Ching; Beaume, Cédric; Knobloch, Edgar
2014-01-01
We study spatial localization in the generalized Swift-Hohenberg equation with either quadratic-cubic or cubic-quintic nonlinearity subject to spatially heterogeneous forcing. Different types of forcing (sinusoidal or Gaussian) with different spatial scales are considered and the corresponding localized snaking structures are computed. The results indicate that spatial heterogeneity exerts a significant influence on the location of spatially localized structures in both parameter space and physical space, and on their stability properties. The results are expected to assist in the interpretation of experiments on localized structures where departures from spatial homogeneity are generally unavoidable.
The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.
Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R
1982-01-01
The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. PMID:6296791
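The lengths quoted in this abstract can be checked with a few lines of arithmetic. The sketch assumes that the 1251 bp reading frame includes the stop codon, which is an inference from the stated 416 amino acids rather than something the abstract says explicitly; the variable names are illustrative.

```python
ORF_BP = 1251               # reading frame length from the abstract
UPSTREAM_NT = 36            # transcription start, upstream of the start codon
DOWNSTREAM_NT = (86, 93)    # transcription stop range, downstream of the stop codon

codons = ORF_BP // 3        # 417 codons
amino_acids = codons - 1    # assumption: the final codon is the stop codon

# Non-polyadenylated mRNA = 5' untranslated part + reading frame + 3' part.
mrna_range = tuple(UPSTREAM_NT + ORF_BP + d for d in DOWNSTREAM_NT)

print(amino_acids, mrna_range)
```

Both results match the abstract: 416 amino acids and an mRNA of 1373 to 1380 nucleotides before polyadenylation.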
A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3
Dietmann, Sabine; Park, Jong; Notredame, Cedric; Heger, Andreas; Lappe, Michael; Holm, Liisa
2001-01-01
The Dali Domain Dictionary (http://www.ebi.ac.uk/dali/domain) is a numerical taxonomy of all known structures in the Protein Data Bank (PDB). The taxonomy is derived fully automatically from measurements of structural, functional and sequence similarities. Here, we report the extension of the classification to match the traditional four hierarchical levels corresponding to: (i) supersecondary structural motifs (attractors in fold space), (ii) the topology of globular domains (fold types), (iii) remote homologues (functional families) and (iv) homologues with sequence identity above 25% (sequence families). The computational definitions of attractors and functional families are new. In September 2000, the Dali classification contained 10 531 PDB entries comprising 17 101 chains, which were partitioned into five attractor regions, 1375 fold types, 2582 functional families and 3724 domain sequence families. Sequence families were further associated with 99 582 unique homologous sequences in the HSSP database, which increases the number of effectively known structures several-fold. The resulting database contains the description of protein domain architecture, the definition of structural neighbours around each known structure, the definition of structurally conserved cores and a comprehensive library of explicit multiple alignments of distantly related protein families. PMID:11125048
Schnare, Murray N.; Collings, James C.; Spencer, David F.; Gray, Michael W.
2000-01-01
In Crithidia fasciculata, the ribosomal RNA (rRNA) gene repeats range in size from ∼11 to 12 kb. This length heterogeneity is localized to a region of the intergenic spacer (IGS) that contains tandemly repeated copies of a 19mer sequence. The IGS also contains four copies of an ∼55 nt repeat that has an internal inverted repeat and is also present in the IGS of Leishmania species. We have mapped the C. fasciculata transcription initiation site as well as two other reverse transcriptase stop sites that may be analogous to the A0 and A′ pre-rRNA processing sites within the 5′ external transcribed spacer (ETS) of other eukaryotes. Features that could influence processing at these sites include two stretches of conserved primary sequence and three secondary structure elements present in the 5′ ETS. We also characterized the C. fasciculata U3 snoRNA, which has the potential for base-pairing with pre-rRNA sequences. Finally, we demonstrate that biosynthesis of large subunit rRNA in both C. fasciculata and Trypanosoma brucei involves 3′-terminal addition of three A residues that are not present in the corresponding DNA sequences. PMID:10982863
2009-01-01
Background Polymerase chain reaction (PCR) is very useful in many areas of molecular biology research. It is commonly observed that PCR success is critically dependent on design of an effective primer pair. Current tools for primer design do not adequately address the problem of PCR failure due to mis-priming on target-related sequences and structural variations in the genome. Methods We have developed an integrated graphical web-based application for primer design, called RExPrimer, which was written in Python language. The software uses Primer3 as the primer designing core algorithm. Locally stored sequence information and genomic variant information were hosted on MySQLv5.0 and were incorporated into RExPrimer. Results RExPrimer provides many functionalities for improved PCR primer design. Several databases, namely annotated human SNP databases, insertion/deletion (indel) polymorphisms database, pseudogene database, and structural genomic variation databases were integrated into RExPrimer, enabling an effective without-leaving-the-website validation of the resulting primers. By incorporating these databases, the primers reported by RExPrimer avoid mis-priming to related sequences (e.g. pseudogene, segmental duplication) as well as possible PCR failure because of structural polymorphisms (SNP, indel, and copy number variation (CNV)). To prevent mismatching caused by unexpected SNPs in the designed primers, in particular the 3' end (SNP-in-Primer), several SNP databases covering the broad range of population-specific SNP information are utilized to report SNPs present in the primer sequences. Population-specific SNP information also helps customize primer design for a specific population. Furthermore, RExPrimer offers a graphical user-friendly interface through the use of scalable vector graphic image that intuitively presents resulting primers along with the corresponding gene structure. 
In this study, we demonstrated the program effectiveness in successfully generating primers for strong homologous sequences. Conclusion The improvements for primer design incorporated into RExPrimer were demonstrated to be effective in designing primers for challenging PCR experiments. Integration of SNP and structural variation databases allows for robust primer design for a variety of PCR applications, irrespective of the sequence complexity in the region of interest. This software is freely available at http://www4a.biotec.or.th/rexprimer. PMID:19958502
Yeast One-Hybrid Gγ Recruitment System for Identification of Protein Lipidation Motifs
Fukuda, Nobuo; Doi, Motomichi; Honda, Shinya
2013-01-01
Fatty acids and isoprenoids can be covalently attached to a variety of proteins. These lipid modifications regulate protein structure, localization and function. Here, we describe a yeast one-hybrid approach based on the Gγ recruitment system that is useful for identifying sequence motifs that influence lipid modification to recruit proteins to the plasma membrane. Our approach facilitates the isolation of yeast cells expressing lipid-modified proteins via a simple and easy growth selection assay utilizing G-protein signaling that induces diploid formation. In the current study, we selected the N-terminal sequence of Gα subunits as a model case to investigate dual lipid modification, i.e., myristoylation and palmitoylation, a modification that is widely conserved from yeast to higher eukaryotes. Our results suggest that both lipid modifications are required for restoration of G-protein signaling. Although we could not differentiate between myristoylation and palmitoylation, N-terminal positions 7 and 8 play some critical role. Moreover, we tested the preference for specific amino-acid residues at positions 7 and 8 using library-based screening. This new approach will be useful to explore protein-lipid associations and to determine the corresponding sequence motifs. PMID:23922919
Dancing sprites: Detailed analysis of two case studies
NASA Astrophysics Data System (ADS)
Soula, Serge; Mlynarczyk, Janusz; Füllekrug, Martin; Pineda, Nicolau; Georgis, Jean-François; van der Velde, Oscar; Montanyà, Joan; Fabró, Ferran
2017-03-01
On 29-30 October 2013, a low-light video camera installed at Pic du Midi (2877 m) recorded transient luminous events above a very active storm over the Mediterranean Sea. The minimum cloud top temperature reached -73°C, while its cloud-to-ground (CG) flash rate exceeded 30 fl min⁻¹. Some sprite events have a long duration and resemble dancing sprites. We analyze in detail the temporal evolution and estimated location of two series of sprite sequences, as well as the cloud structure, the lightning activity, the electric field radiated in a broad range of low frequencies, and the current moment waveform of the lightning strokes. (i) In each series, successive sprite sequences reflect time and location of corresponding positive lightning strokes across the stratiform region. (ii) The longer time-delayed (>20 ms) sprite elements correspond to the lower impulsive charge moment changes (iCMC) of the parent strokes (<200 C km), and they are shifted a few tens of kilometers from their SP + CG stroke. However, both short and long time-delayed sprite elements also occur after strokes that produce a large iCMC and that are followed by a continuing current. (iii) The long time-delayed sprite elements during the continuing current correspond to surges in the current moment waveform. They sometimes occur at an altitude apparently lower than the previous short time-delayed sprite elements, possibly because of changes in the local conductivity. (iv) The largest and brightest sprite elements produce significant current signatures, visible when their delay is not too short ( 3-5 ms).
A reduced amino acid alphabet for understanding and designing protein adaptation to mutation.
Etchebest, C; Benros, C; Bornot, A; Camproux, A-C; de Brevern, A G
2007-11-01
The protein sequence world is considerably larger than the structure world. In consequence, numerous non-related sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. By grouping together the 20 amino acids into a smaller number of representative residues with similar features, sequence world simplification may be achieved. This clustering hence defines a reduced amino acid alphabet (reduced AAA). Numerous works have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet composed of 16 representative structural motifs (five residues in length) called Protein Blocks (PBs). This alphabet makes it possible to translate the structure (3D) into a sequence of PBs (1D). Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs) (e.g., Cysteine, Histidine, Threonine and Serine for the alpha-helix Ncap). Some similar associations are found in other reduced AAAs, e.g., Ile with Val, or hydrophobic aromatic residues Trp with Phe and Tyr. We highlight interesting alternative associations. This underlines the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance adaptation to environment) while mainly preserving the 3D fold.
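To make the idea of a reduced amino acid alphabet concrete, the sketch below rewrites a sequence using one representative letter per group. The grouping in `EXAMPLE_GROUPS` is purely illustrative and hypothetical: it keeps Ile with Val and Trp with Phe and Tyr, as mentioned in the abstract, but it is not one of the reduced sets derived in the paper.

```python
# Hypothetical grouping: members -> representative letter (10 groups, 20 residues).
EXAMPLE_GROUPS = {
    "IVLM": "I", "FWY": "F", "ST": "S", "DE": "D", "KR": "K",
    "NQ": "N", "AG": "A", "C": "C", "H": "H", "P": "P",
}


def build_mapping(groups):
    """Expand {members: representative} into a per-residue lookup table."""
    return {aa: rep for members, rep in groups.items() for aa in members}


def reduce_sequence(seq, mapping):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(mapping[aa] for aa in seq)


mapping = build_mapping(EXAMPLE_GROUPS)
print(reduce_sequence("MKVLWY", mapping))
```

Two sequences that differ only by substitutions within a group become identical after reduction, which is the sense in which such a grouping acts like a coarse substitution matrix.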
Camproux, A C; Tufféry, P
2005-08-05
Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.
Hidden Markov model approach for identifying the modular framework of the protein backbone.
Camproux, A C; Tuffery, P; Chevrolat, J P; Boisvieux, J F; Hazout, S
1999-12-01
The hidden Markov model (HMM) was used to identify recurrent short 3D structural building blocks (SBBs) describing protein backbones, independently of any a priori knowledge. Polypeptide chains are decomposed into a series of short segments defined by their inter-alpha-carbon distances. Basically, the model takes into account the sequentiality of the observed segments and assumes that each one corresponds to one of several possible SBBs. Fitting the model to a database of non-redundant proteins allowed us to decode proteins in terms of 12 distinct SBBs with different roles in protein structure. Some SBBs correspond to classical regular secondary structures. Others correspond to a significant subdivision of their bounding regions previously considered to be a single pattern. The major contribution of the HMM is that this model implicitly takes into account the sequential connections between SBBs and thus describes the most probable pathways by which the blocks are connected to form the framework of the protein structures. Validation of the SBBs code was performed by extracting SBB series repeated in recoding proteins and examining their structural similarities. Preliminary results on the sequence specificity of SBBs suggest promising perspectives for the prediction of SBBs or series of SBBs from the protein sequences.
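The decoding step described above, assigning each segment to its most probable building block while accounting for transitions, can be sketched with the Viterbi algorithm. This is an illustrative toy, not the authors' model: the two states, the discretized "short"/"long" distance classes, and all probabilities are assumptions, and the original model works on continuous inter-alpha-carbon distances with 12 SBBs.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for a sequence of discrete observations."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t][s]: predecessor of s on that best path
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def logs(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(v) for j, v in row.items()} for k, row in table.items()}


# Hypothetical two-block model: "H" (compact) and "E" (extended).
STATES = ["H", "E"]
LOG_START = {"H": math.log(0.5), "E": math.log(0.5)}
LOG_TRANS = logs({"H": {"H": 0.9, "E": 0.1}, "E": {"H": 0.1, "E": 0.9}})
LOG_EMIT = logs({"H": {"short": 0.9, "long": 0.1}, "E": {"short": 0.1, "long": 0.9}})

if __name__ == "__main__":
    print(viterbi(["short", "short", "long", "long"], STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```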
Chen, Mingchen; Lin, Xingcheng; Zheng, Weihua; Onuchic, José N; Wolynes, Peter G
2016-08-25
The associative memory, water mediated, structure and energy model (AWSEM) is a coarse-grained force field with transferable tertiary interactions that incorporates local in sequence energetic biases using bioinformatically derived structural information about peptide fragments with locally similar sequences that we call memories. The memory information from the protein data bank (PDB) database guides proper protein folding. The structural information about available sequences in the database varies in quality and can sometimes lead to frustrated free energy landscapes locally. One way out of this difficulty is to construct the input fragment memory information from all-atom simulations of portions of the complete polypeptide chain. In this paper, we investigate this approach first put forward by Kwac and Wolynes in a more complete way by studying the structure prediction capabilities of this approach for six α-helical proteins. This scheme which we call the atomistic associative memory, water mediated, structure and energy model (AAWSEM) amounts to an ab initio protein structure prediction method that starts from the ground up without using bioinformatic input. The free energy profiles from AAWSEM show that atomistic fragment memories are sufficient to guide the correct folding when tertiary forces are included. AAWSEM combines the efficiency of coarse-grained simulations on the full protein level with the local structural accuracy achievable from all-atom simulations of only parts of a large protein. The results suggest that a hybrid use of atomistic fragment memory and database memory in structural predictions may well be optimal for many practical applications.
Approximate Bayesian Computation by Subset Simulation using hierarchical state-space models
NASA Astrophysics Data System (ADS)
Vakilzadeh, Majid K.; Huang, Yong; Beck, James L.; Abrahamsson, Thomas
2017-02-01
A new multi-level Markov Chain Monte Carlo algorithm for Approximate Bayesian Computation, ABC-SubSim, has recently appeared that exploits the Subset Simulation method for efficient rare-event simulation. ABC-SubSim adaptively creates a nested decreasing sequence of data-approximating regions in the output space that correspond to increasingly closer approximations of the observed output vector in this output space. At each level, multiple samples of the model parameter vector are generated by a component-wise Metropolis algorithm so that the predicted output corresponding to each parameter value falls in the current data-approximating region. Theoretically, if continued to the limit, the sequence of data-approximating regions would converge on to the observed output vector and the approximate posterior distributions, which are conditional on the data-approximation region, would become exact, but this is not practically feasible. In this paper we study the performance of the ABC-SubSim algorithm for Bayesian updating of the parameters of dynamical systems using a general hierarchical state-space model. We note that the ABC methodology gives an approximate posterior distribution that actually corresponds to an exact posterior where a uniformly distributed combined measurement and modeling error is added. We also note that ABC algorithms have a problem with learning the uncertain error variances in a stochastic state-space model and so we treat them as nuisance parameters and analytically integrate them out of the posterior distribution. In addition, the statistical efficiency of the original ABC-SubSim algorithm is improved by developing a novel strategy to regulate the proposal variance for the component-wise Metropolis algorithm at each level. 
We demonstrate that Self-regulated ABC-SubSim is well suited for Bayesian system identification by first applying it successfully to model updating of a two degree-of-freedom linear structure for three cases: globally, locally and un-identifiable model classes, and then to model updating of a two degree-of-freedom nonlinear structure with Duffing nonlinearities in its interstory force-deflection relationship.
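To illustrate the idea of nested data-approximating regions, here is a minimal sketch using plain rejection ABC with a decreasing sequence of tolerances. It is deliberately not Subset Simulation and has no component-wise Metropolis step; the scalar forward model, the uniform prior, and every name are assumptions made for illustration. Accepting a draw when the simulated output lies within `eps` of the observation is the uniform-error interpretation noted in the abstract.

```python
import random


def simulate(theta, rng):
    """Hypothetical forward model: the parameter observed with Gaussian noise."""
    return theta + rng.gauss(0.0, 0.1)


def abc_rejection(observed, eps, n_draws, seed=0):
    """Keep prior draws whose simulated output falls within eps of the observation."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 5.0)      # assumed uniform prior on [0, 5]
        y = simulate(theta, rng)
        if abs(y - observed) <= eps:       # data-approximating region of radius eps
            accepted.append((theta, y))
    return accepted


def nested_levels(observed, tolerances, n_draws=5000, seed=0):
    """Run the same draws against shrinking tolerances, giving nested accepted sets."""
    return {eps: abc_rejection(observed, eps, n_draws, seed) for eps in tolerances}


if __name__ == "__main__":
    for eps, pairs in nested_levels(2.0, [1.0, 0.5, 0.25]).items():
        mean = sum(t for t, _ in pairs) / len(pairs)
        print(f"eps={eps}: {len(pairs)} accepted, posterior mean ~ {mean:.3f}")
```

Plain rejection wastes most draws as the tolerance shrinks, which is the inefficiency that level-wise sampling schemes such as the one in the abstract aim to avoid.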
NASA Astrophysics Data System (ADS)
Shih, M. H.; Huang, B. S.
2016-12-01
On March 4, 2008, a moderate earthquake (ML 5.2) occurred in the Taoyuan district of Kaohsiung County in southern Taiwan. It was followed by numerous aftershocks in the following 48 hours, including three events with magnitude larger than 4. The Taoyuan earthquake sequence occurred during the TAIGER (Taiwan Integrated Geodynamic Research) project, which aims to image the lithospheric structure of the Taiwan orogeny. The high-resolution waveform data of this sequence were well recorded by a large number of recording stations belonging to several different permanent and TAIGER networks all around Taiwan. We collected the waveform data and archived them in a large database. We then identified 2,340 events from the database in the preliminary location process using a 1-D velocity model. In this study, we applied double-difference tomography to investigate not only the fault geometry of the main shock but also the detailed 3-D velocity structure in this area. A total of 3,034 events were selected from the preliminary location results and the CWBSN catalog in the vicinity. The resulting aftershock locations extend along the NE-SW direction and lie on a 45° SE-dipping plane, which agrees with one of the nodal planes of the Global CMT solution (strike = 45°, dip = 40° and rake = 119°). In the final 3D velocity model, we can identify a clear low-velocity area enclosed by events next to the main shock. We also recognized a 45°-dipping low-velocity zone that extends to the ground surface; meanwhile, velocity structure variations in the study area correspond with major geologic units in Taiwan.
Designing pH induced fold switch in proteins
NASA Astrophysics Data System (ADS)
Baruah, Anupaul; Biswas, Parbati
2015-05-01
This work investigates the computational design of a pH induced protein fold switch based on a self-consistent mean-field approach by identifying the ensemble averaged characteristics of sequences that encode a fold switch. The primary challenge to balance the alternative sets of interactions present in both target structures is overcome by simultaneously optimizing two foldability criteria corresponding to two target structures. The change in pH is modeled by altering the residual charge on the amino acids. The energy landscape of the fold switch protein is found to be double funneled. The fold switch sequences stabilize the interactions of the sites with similar relative surface accessibility in both target structures. Fold switch sequences have low sequence complexity and hence lower sequence entropy. The pH induced fold switch is mediated by attractive electrostatic interactions rather than hydrophobic-hydrophobic contacts. This study may provide valuable insights to the design of fold switch proteins.
Recognition of Local DNA Structures by p53 Protein
Brázda, Václav; Coufal, Jan
2017-01-01
p53 plays critical roles in regulating cell cycle, apoptosis, senescence and metabolism and is commonly mutated in human cancer. These roles are achieved by interaction with other proteins, but particularly by interaction with DNA. As a transcription factor, p53 is well known to bind consensus target sequences in linear B-DNA. Recent findings indicate that p53 binds with higher affinity to target sequences that form cruciform DNA structure. Moreover, p53 binds very tightly to non-B DNA structures and local DNA structures are increasingly recognized to influence the activity of wild-type and mutant p53. Apart from cruciform structures, p53 binds to quadruplex DNA, triplex DNA, DNA loops, bulged DNA and hemicatenane DNA. In this review, we describe local DNA structures and summarize information about interactions of p53 with these structural DNA motifs. These recent data provide important insights into the complexity of the p53 pathway and the functional consequences of wild-type and mutant p53 activation in normal and tumor cells. PMID:28208646
Observations of Displacement-driven Maturation along a Subduction-Transform Edge Propagator Fault
NASA Astrophysics Data System (ADS)
Neely, J. S.; Furlong, K. P.
2016-12-01
The Solomon Islands-Vanuatu composite subduction zone represents a tectonically complex region along the Pacific-Australia plate boundary in the southwest Pacific Ocean. Here the Australia plate subducts under the Pacific plate in two parts - the Solomon Trench and the Vanuatu Trench - with the two segments separated by a transform fault produced by a tear in the approaching Australia plate. As a result of the Australia plate tearing, the two subducting sections are offset by the 280 km long San Cristobal Trough (SCT) transform fault, which acts as a Subduction-Transform Edge Propagator (STEP) fault. The formation of this transform fault provides an opportunity to study the evolution of a newly created transform plate boundary. As distance from the tear increases, both the magnitude and frequency of earthquakes along the transform increase reflecting the coalescence of fault segments into a through-going structure. Over the past few decades, there have been several instances of larger magnitude earthquakes migrating westward along the STEP through a rapid succession of events. A recent May 2015 sequence of MW 6.8, MW 6.9, and MW 6.8 earthquakes followed this pattern, with an east to west migration over three days. However, neither this 2015 sequence, nor a previous 1993 progression, ruptured into or nucleated a large earthquake within the region near the tear. SCT sequence termination outside the region of the newly formed fault occurs even though Coulomb Failure Stress analyses reveal that the tear end of the SCT is positively loaded for failure by the earthquake sequence. Changing seismicity patterns along the SCT are also mapped by b-value variations that correspond to the rupture patterns of these propagating sequences. These seismicity pattern changes along the SCT reveal a fault maturation process with strain localization driven by cumulative slip corresponding to approximately 80-100 km of displacement.
NASA Astrophysics Data System (ADS)
Dimitrievski, Martin; Goossens, Bart; Veelaert, Peter; Philips, Wilfried
2017-09-01
Understanding the 3D structure of the environment is advantageous for many tasks in the field of robotics and autonomous vehicles. From the robot's point of view, 3D perception is often formulated as a depth image reconstruction problem. In the literature, dense depth images are often recovered deterministically from stereo image disparities. Other systems use an expensive LiDAR sensor to produce accurate, but semi-sparse depth images. With the advent of deep learning there have also been attempts to estimate depth by only using monocular images. In this paper we combine the best of the two worlds, focusing on a combination of monocular images and low cost LiDAR point clouds. We explore the idea that very sparse depth information accurately captures the global scene structure while variations in image patches can be used to reconstruct local depth to a high resolution. The main contribution of this paper is a supervised learning depth reconstruction system based on a deep convolutional neural network. The network is trained on RGB image patches reinforced with sparse depth information and the output is a depth estimate for each pixel. Using image and point cloud data from the KITTI vision dataset we are able to learn a correspondence between local RGB information and local depth, while at the same time preserving the global scene structure. Our results are evaluated on sequences from the KITTI dataset and our own recordings using a low cost camera and LiDAR setup.
van der Leij, F R; Visser, R G; Ponstein, A S; Jacobsen, E; Feenstra, W J
1991-08-01
The genomic sequence of the potato gene for starch granule-bound starch synthase (GBSS; "waxy protein") has been determined for the wild-type allele of a monoploid genotype from which an amylose-free (amf) mutant was derived, and for the mutant part of the amf allele. Comparison of the wild-type sequence with a cDNA sequence from the literature and a newly isolated cDNA revealed the presence of 13 introns, the first of which is located in the untranslated leader. The promoter contains a G-box-like sequence. The deduced amino acid sequence of the precursor of GBSS shows a high degree of identity with monocot waxy protein sequences in the region corresponding to the mature form of the enzyme. The transit peptide of 77 amino acids, required for routing of the precursor to the plastids, shows much less identity with the transit peptides of the other waxy preproteins, but resembles the hydropathic distributions of these peptides. Alignment of the amino acid sequences of the four mature starch synthases with the Escherichia coli glgA gene product revealed the presence of at least three conserved boxes; there is no homology with previously proposed starch-binding domains of other enzymes involved in starch metabolism. We report the use of chimeric constructs with wild-type and amf sequences to localize, via complementation experiments, the region of the amf allele in which the mutation resides. Direct sequencing of polymerase chain reaction products confirmed that the amf mutation is a deletion of a single AT basepair in the region coding for the transit peptide.(ABSTRACT TRUNCATED AT 250 WORDS)
Maleki, Ehsan; Babashah, Hossein; Koohi, Somayyeh; Kavehvash, Zahra
2017-07-01
This paper presents an optical processing approach for exploring a large number of genome sequences. Specifically, we propose an optical correlator for global alignment and an extended moiré matching technique for local analysis of spatially coded DNA, whose output is fed to a novel three-dimensional artificial neural network for local DNA alignment. All-optical implementation of the proposed 3D artificial neural network is developed and its accuracy is verified in Zemax. Thanks to its parallel processing capability, the proposed structure performs local alignment of 4 million sequences of 150 base pairs in a few seconds, which is much faster than its electrical counterparts, such as the basic local alignment search tool.
Dong, Zheng; Zhou, Hongyu; Tao, Peng
2018-02-01
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from RCSB Protein Data Bank of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The result showed that the proteins with same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant difference in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.
An improved stochastic fractal search algorithm for 3D protein structure prediction.
Zhou, Changjun; Sun, Chuan; Wang, Bin; Wang, Xiaojun
2018-05-03
Protein structure prediction (PSP) is a significant area for biological information research, disease treatment, drug development, and so on. In this paper, three-dimensional structures of proteins are predicted from known amino acid sequences, and the structure prediction problem is transformed into a typical NP problem by an AB off-lattice model. This work applies a novel improved Stochastic Fractal Search algorithm (ISFS) to solve the problem. The Stochastic Fractal Search algorithm (SFS) is an effective evolutionary algorithm that performs well in exploring the search space but sometimes falls into local minima. To overcome this weakness, Lévy flight and internal feedback information are introduced in ISFS. In the experiments, simulations are conducted with the ISFS algorithm on Fibonacci sequences and real peptide sequences. Experimental results show that ISFS is more efficient and robust in finding the global minimum and avoiding getting stuck in local minima.
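The Fibonacci test sequences mentioned above can be generated with a few lines of Python. This sketch assumes the recursive definition commonly used with the AB off-lattice model (S0 = "A", S1 = "B", and each later sequence is S(i-1) followed by S(i)); the abstract itself does not spell the definition out, and the function name is illustrative.

```python
def fibonacci_sequence(k):
    """Return the k-th Fibonacci AB sequence under the assumed recursion."""
    prev, cur = "A", "B"          # S0 and S1
    if k == 0:
        return prev
    for _ in range(k - 1):
        # S(i+1) is S(i-1) concatenated with S(i)
        prev, cur = cur, prev + cur
    return cur


if __name__ == "__main__":
    for k in range(2, 8):
        seq = fibonacci_sequence(k)
        print(k, len(seq), seq)
```

Sequence lengths follow the Fibonacci numbers, so benchmark sizes grow as 13, 21, 34, and so on.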
HARMONY: a server for the assessment of protein structures
Pugalenthi, G.; Shameer, K.; Srinivasan, N.; Sowdhamini, R.
2006-01-01
Protein structure validation is an important step in computational modeling and structure determination. Stereochemical assessment of protein structures examines internal parameters such as bond lengths and Ramachandran (φ,ψ) angles. Gross structure prediction methods, such as the inverse folding procedure, and structure determination, especially at low resolution, can sometimes give rise to models that are incorrect due to assignment of misfolds or mistracing of electron density maps. Such errors are not reflected as strain in internal parameters. HARMONY is a procedure that examines the compatibility between the sequence and the structure of a protein by assigning scores to individual residues and their amino acid exchange patterns after considering their local environments. Local environments are described by the backbone conformation, solvent accessibility and hydrogen bonding patterns. We are now providing HARMONY through a web server so that users can submit their protein structure files and, if required, the alignment of homologous sequences. Scores are mapped onto the structure for subsequent examination, which is also useful for recognizing regions of possible local errors in protein structures. HARMONY server is located at PMID:16844999
Precise auditory-vocal mirroring in neurons for learned vocal communication.
Prather, J F; Peters, S; Nowicki, S; Mooney, R
2008-01-17
Brain mechanisms for communication must establish a correspondence between sensory and motor codes used to represent the signal. One idea is that this correspondence is established at the level of single neurons that are active when the individual performs a particular gesture or observes a similar gesture performed by another individual. Although neurons that display a precise auditory-vocal correspondence could facilitate vocal communication, they have yet to be identified. Here we report that a certain class of neurons in the swamp sparrow forebrain displays a precise auditory-vocal correspondence. We show that these neurons respond in a temporally precise fashion to auditory presentation of certain note sequences in this songbird's repertoire and to similar note sequences in other birds' songs. These neurons display nearly identical patterns of activity when the bird sings the same sequence, and disrupting auditory feedback does not alter this singing-related activity, indicating it is motor in nature. Furthermore, these neurons innervate striatal structures important for song learning, raising the possibility that singing-related activity in these cells is compared to auditory feedback to guide vocal learning.
The complete nucleotide sequence of the glnALG operon of Escherichia coli K12.
Miranda-Ríos, J; Sánchez-Pescador, R; Urdea, M; Covarrubias, A A
1987-01-01
The nucleotide sequence of the E. coli glnALG operon has been determined. The glnL (ntrB) and glnG (ntrC) genes present a high homology, at the nucleotide and aminoacid levels, with the corresponding genes of Klebsiella pneumoniae. The predicted aminoacid sequence for glutamine synthetase allowed us to locate some of the enzyme domains. The structure of this operon is discussed. PMID:2882477
The nop gene from Phanerochaete chrysosporium encodes a peroxidase with novel structural features
Luis F. Larrondo; Angel Gonzalez; Tomas Perez-Acle; Dan Cullen; Rafael Vicuna
2005-01-01
Inspection of the genome of the ligninolytic basidiomycete Phanerochaete chrysosporium revealed an unusual peroxidase-like sequence. The corresponding full length cDNA was sequenced and an archetypal secretion signal predicted. The deduced mature protein (NoP, novel peroxidase) contains 295 aa residues and is therefore considerably shorter than other Class II (fungal)...
Chiusano, M L; D'Onofrio, G; Alvarez-Valin, F; Jabbari, K; Colonna, G; Bernardi, G
1999-09-30
We investigated the relationships between the nucleotide substitution rates and the predicted secondary structures in the three states representation (alpha-helix, beta-sheet, and coil). The analysis was carried out on 34 alignments, each of which comprised sequences belonging to at least four different mammalian orders. The rates of synonymous substitution were found to be significantly different in regions predicted to be alpha-helix, beta-sheet, or coil. Likewise, the nonsynonymous rates also differ, although expectedly at a lower extent, in the three types of secondary structure, suggesting that different selective constraints associated with the different structures are affecting in a similar way the synonymous and nonsynonymous rates. Moreover, the base composition of the third codon positions is different in coding sequence regions corresponding to different secondary structures of proteins.
THGS: a web-based database of Transmembrane Helices in Genome Sequences
Fernando, S. A.; Selvarani, P.; Das, Soma; Kumar, Ch. Kiran; Mondal, Sukanta; Ramakumar, S.; Sekar, K.
2004-01-01
Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database, developed to search for transmembrane helices in gene sequences of interest to the user that are available in the Genome Database (GDB). The proposed database has provision to search sequence motifs in transmembrane and globular proteins. In addition, the motif can be searched in other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, Protein Data Bank (PDB). Further, the 3D structure of the corresponding queried motif, if it is available in the solved protein structures deposited in the Protein Data Bank, can also be visualized using the widely used graphics package RASMOL. All the sequence databases used in the present work are updated frequently and hence the results produced are up to date. The database THGS is freely available via the world wide web and can be accessed at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. PMID:14681375
Mapping HLA-A2, -A3 and -B7 supertype-restricted T-cell epitopes in the ebolavirus proteome.
Lim, Wan Ching; Khan, Asif M
2018-01-19
Ebolavirus (EBOV) is responsible for one of the most fatal diseases encountered by mankind. Cellular T-cell responses have been implicated as important in providing protection against the virus. Antigenic variation can result in viral escape from immune recognition. Mapping targets of immune responses among the sequence of viral proteins is, thus, an important first step towards understanding the immune responses to viral variants and can aid in the identification of vaccine targets. Herein, we performed a large-scale, proteome-wide mapping and diversity analysis of putative HLA supertype-restricted T-cell epitopes of Zaire ebolavirus (ZEBOV), the most pathogenic species among the EBOV family. All publicly available ZEBOV sequences (14,098) for each of the nine viral proteins were retrieved, filtered to remove irrelevant and duplicate sequences, and aligned. The overall proteome diversity of the non-redundant sequences was studied by use of Shannon's entropy. The sequences were predicted, by use of the NetCTLpan server, for HLA-A2, -A3, and -B7 supertype-restricted epitopes, which are relevant to African and other ethnicities and provide for large (~86%) population coverage. The predicted epitopes were mapped to the alignment of each protein for analyses of antigenic sequence diversity and relevance to structure and function. The putative epitopes were validated by comparison with experimentally confirmed epitopes. The ZEBOV proteome was generally conserved, with an average entropy of 0.16. The 185 HLA supertype-restricted T-cell epitopes predicted (82 (A2), 37 (A3) and 66 (B7)) mapped to 125 alignment positions and covered ~24% of the proteome length. Many of the epitopes showed a propensity to co-localize at select positions of the alignment. Thirty (30) of the mapped positions were completely conserved and may be attractive for vaccine design. The remaining (95) positions had one or more epitopes, with or without non-epitope variants.
A significant number (24) of the putative epitopes matched reported experimentally validated HLA ligands/T-cell epitopes of A2, A3 and/or B7 supertype representative allele restrictions. The epitopes generally corresponded to functional motifs/domains and there was no correlation to localization on the protein 3D structure. These data and the epitope map provide important insights into the interaction between EBOV and the host immune system.
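The diversity measure named above, Shannon's entropy, can be sketched per alignment column as follows. This is a simplified single-residue version under stated assumptions: the study may have computed entropy over peptide windows rather than single positions, gaps are treated here as an ordinary symbol, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed at one alignment position."""
    total = len(column)
    counts = Counter(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropies(aligned_seqs):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*aligned_seqs)]


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "ACD", "ACE"]  # hypothetical data
    print(alignment_entropies(toy_alignment))
```

A fully conserved column scores 0 bits, and a column split evenly between two residues scores 1 bit, so low average entropy corresponds to a generally conserved proteome.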
Encoding the structure of many-body localization with matrix product operators
NASA Astrophysics Data System (ADS)
Pekker, David; Clark, Bryan K.
2015-03-01
Anderson insulators are non-interacting disordered systems which have localized single-particle eigenstates. The interacting analogues of Anderson insulators are the Many-Body Localized (MBL) phases. The natural language for representing the spectrum of the Anderson insulator is that of product states over the single-particle modes. We show that product states over Matrix Product Operators of small bond dimension are the corresponding natural language for describing the MBL phases. In this language all of the many-body eigenstates are encoded by Matrix Product States (i.e. DMRG wave functions) consisting of only two sets of low bond-dimension matrices per site: the G_i matrix corresponding to the local ground state on site i and the E_i matrix corresponding to the local excited state. All 2^n eigenstates can be generated from all possible combinations of these matrices.
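The counting argument in the last sentence can be made concrete with a short sketch: choosing either the ground-state or the excited-state matrix independently at each of n sites yields 2^n combinations. The code only enumerates the labels of those choices; it does not build or contract any matrices, and the function name is an illustrative assumption.

```python
from itertools import product


def eigenstate_labels(n_sites):
    """List every per-site choice of 'G' (ground) or 'E' (excited) matrix."""
    return ["".join(choice) for choice in product("GE", repeat=n_sites)]


if __name__ == "__main__":
    labels = eigenstate_labels(3)
    print(len(labels), labels)
```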
Automated antibody structure prediction using Accelrys tools: Results and best practices
Fasnacht, Marc; Butenhof, Ken; Goupil-Lamy, Anne; Hernandez-Guzman, Francisco; Huang, Hongwei; Yan, Lisa
2014-01-01
We describe the methodology and results from our participation in the second Antibody Modeling Assessment experiment. During the experiment we predicted the structure of eleven unpublished antibody Fv fragments. Our prediction methods centered on template-based modeling; potential templates were selected from an antibody database based on their sequence similarity to the target in the framework regions. Depending on the quality of the templates, we constructed models of the antibody framework regions using a single-template, chimeric, or multiple-template approach. The hypervariable loop regions in the initial models were rebuilt by grafting the corresponding regions from suitable templates onto the model. For the H3 loop region, we further refined models using ab initio methods. The final models were subjected to constrained energy minimization to resolve severe local structural problems. The analysis of the submitted models shows that Accelrys tools allow for the construction of quite accurate models for the framework and the canonical CDR regions, with RMSDs to the X-ray structure on average below 1 Å for most of these regions. The results show that accurate prediction of the H3 hypervariable loops remains a challenge. Furthermore, model quality assessment of the submitted models shows that the models are of quite high quality, with local geometry assessment scores similar to those of the target X-ray structures. Proteins 2014; 82:1583–1598. © 2014 The Authors. Proteins published by Wiley Periodicals, Inc. PMID:24833271
An approach to large scale identification of non-obvious structural similarities between proteins
Cherkasov, Artem; Jones, Steven JM
2004-01-01
Background A new sequence independent bioinformatics approach allowing genome-wide search for proteins with similar three dimensional structures has been developed. By utilizing the numerical output of the sequence threading it establishes putative non-obvious structural similarities between proteins. When applied to the testing set of proteins with known three dimensional structures the developed approach was able to recognize structurally similar proteins with high accuracy. Results The method has been developed to identify pathogenic proteins with low sequence identity and high structural similarity to host analogues. Such protein structure relationships would be hypothesized to arise through convergent evolution or through ancient horizontal gene transfer events, now undetectable using current sequence alignment techniques. The pathogen proteins, which could mimic or interfere with host activities, would represent candidate virulence factors. The developed approach utilizes the numerical outputs from the sequence-structure threading. It identifies the potential structural similarity between a pair of proteins by correlating the threading scores of the corresponding two primary sequences against the library of the standard folds. This approach allowed up to 64% sensitivity and 99.9% specificity in distinguishing protein pairs with high structural similarity. Conclusion Preliminary results obtained by comparison of the genomes of Homo sapiens and several strains of Chlamydia trachomatis have demonstrated the potential usefulness of the method in the identification of bacterial proteins with known or potential roles in virulence. PMID:15147578
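The core comparison described above, correlating the threading scores of two sequences against a common fold library, can be sketched with a Pearson correlation. The abstract does not state which correlation measure or cutoff was used, so both are assumptions here, and the score vectors are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def structurally_similar(scores_a, scores_b, threshold=0.8):
    """Flag a protein pair whose fold-library score profiles correlate strongly.

    The 0.8 threshold is a hypothetical value, not one taken from the study.
    """
    return pearson(scores_a, scores_b) >= threshold


if __name__ == "__main__":
    # Hypothetical threading scores of two proteins against five library folds.
    protein_a = [12.1, 3.4, 8.8, 1.0, 5.5]
    protein_b = [11.0, 2.9, 9.5, 1.8, 4.9]
    print(round(pearson(protein_a, protein_b), 3), structurally_similar(protein_a, protein_b))
```

Because only the score profiles are compared, two proteins can be flagged as similar without any detectable similarity between their sequences.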
Bastien, Olivier; Maréchal, Eric
2008-08-07
Confidence in pairwise alignments of biological sequences, obtained by various methods such as BLAST or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value^2) following the TULIP theorem. Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property had not been demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, and 2) whose components are the amino acids, which can independently be damaged by random DNA mutations.
From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution, whose parameters could find some theoretical rationale. In particular, one parameter corresponds to the information hazard rate. Extreme value distribution of alignment scores, assessed from high scoring segment pairs following the Karlin-Altschul model, can also be deduced from the Reliability Theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences, under functional conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.
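A rough sketch of the Z-value calculation mentioned in this abstract may help fix ideas. It scores a pair with a basic Smith-Waterman recursion, shuffles one sequence to build a random score distribution, and reports the upper bound 1/Z^2 on the P-value that the abstract quotes. The scoring parameters (`match`, `mismatch`, `gap`), the number of shuffles, the shuffling scheme and all function names are assumptions chosen for illustration, not the settings used by the authors.

```python
import random
import statistics

def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best

def z_value(a, b, n_shuffles=200, seed=0):
    """Z-value of the real score against scores of shuffled versions of b."""
    rng = random.Random(seed)
    real = local_score(a, b)
    letters = list(b)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null.append(local_score(a, "".join(letters)))
    sd = statistics.pstdev(null)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real - statistics.mean(null)) / sd

def p_value_upper_bound(z):
    """Upper bound 1/Z^2 quoted in the abstract; only informative for Z > 1."""
    return 1.0 if z <= 1 else 1.0 / (z * z)

# Hypothetical query aligned against itself, so the real score is maximal.
query = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
z = z_value(query, query)
bound = p_value_upper_bound(z)
```

The bound is distribution-free, which is why it is loose; fitting a Gumbel law to the shuffled scores, as the abstract discusses, would give a sharper estimate.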
Suplatov, Dmitry; Sharapova, Yana; Timonina, Daria; Kopylov, Kirill; Švedas, Vytas
2018-04-01
The visualCMAT web-server was designed to assist experimental research in the fields of protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive and easy-to-use interface to the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins is used to predict correlated substitutions by the mutual information-based CMAT approach; to classify them into spatially close co-evolving pairs, which either form a direct physical contact or interact with the same ligand (e.g. a substrate or a crystallographic water molecule), and long-range correlations; and to annotate and rank binding sites on the protein surface by the presence of statistically significant co-evolving positions. The results of visualCMAT are organized for convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMol session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or further studied online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5 and therefore neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server capable of constructing large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, to select hotspots and compensatory mutations for rational design and directed evolution experiments aimed at producing novel enzymes with improved properties, and to study the mechanism of selective ligand binding and allosteric communication between topologically independent sites in protein structures.
The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and there are no login requirements.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pichon, L.; Carn, G.; Bouric, P.
1996-03-01
Positional cloning strategies for the hemochromatosis gene have previously concentrated on a target area restricted to a maximum genomic expanse of 400 kb around the HLA-A and HLA-F loci. Recently, the candidate region has been extended to 2-3 Mb on the distal side of the MHC. In this study, 10 coding sequences [hemochromatosis candidate genes (HCG) I to X] were isolated by cDNA selection using YACs covering the HLA-A/HLA-F subregion. Two of these (HCG II and HCG IV) belong to multigene families, as well as other sequences already described in this region, i.e., P5, pMC 6.7, and HLA class I. Fingerprinting of the four YACs overlapping the region was performed and allowed partial localization of the different multigene family sequences on each YAC without defining their exact positions. Fingerprinting on cosmids isolated from the ICRF chromosome 6-specific cosmid library allowed more precise localization of the redundant sequences in all of the multigene families and revealed their apparent organization in clusters. Further examination of these intertwined sequences demonstrated that this structural organization resulted from a succession of complex phenomena, including duplications and contractions. This study presents a precise description of the structural organization of the HLA-A/HLA-F region and a determination of the sequences involved in the megabase size polymorphism observed among the A3, A24, and A31 haplotypes. 29 refs., 2 figs., 2 tabs.
SeqHound: biological sequence and structure database as a platform for bioinformatics research
2002-01-01
Background SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134
Sá-Carvalho, D; Traub-Cseko, Y M
1995-06-01
Naturally occurring sequences containing repetitive guanine motifs have the potential to form tetraplex DNA. Phytomonas serpens minicircle DNA shows some regions where one strand is composed mainly of G and T (GT regions). These regions contain several stretches of contiguous guanines. An oligonucleotide was constructed with the sequence corresponding to one of these regions (Phyto-GT). It was demonstrated by native gel electrophoresis and methylation protection that Phyto-GT forms tetramolecular (G4), bimolecular (G'2) and unimolecular (G4') structures stabilized through G-quartets. Tetraplex DNA formation by this sequence could have biological relevance as it can be formed in physiological conditions and GT regions comprise approximately one-third of P. serpens and Crithidia oncopelti minicircles.
Deciphering the shape and deformation of secondary structures through local conformation analysis
2011-01-01
Background Protein deformation has been extensively analysed through global methods based on RMSD, torsion angles and Principal Components Analysis calculations. Here we use a local approach, able to distinguish among the different backbone conformations within loops, α-helices and β-strands, to address the question of secondary structures' shape variation within proteins and deformation at interface upon complexation. Results Using a structural alphabet, we translated the 3 D structures of large sets of protein-protein complexes into sequences of structural letters. The shape of the secondary structures can be assessed by the structural letters that modeled them in the structural sequences. The distribution analysis of the structural letters in the three protein compartments (surface, core and interface) reveals that secondary structures tend to adopt preferential conformations that differ among the compartments. The local description of secondary structures highlights that curved conformations are preferred on the surface while straight ones are preferred in the core. Interfaces display a mixture of local conformations either preferred in core or surface. The analysis of the structural letters transition occurring between protein-bound and unbound conformations shows that the deformation of secondary structure is tightly linked to the compartment preference of the local conformations. Conclusion The conformation of secondary structures can be further analysed and detailed thanks to a structural alphabet which allows a better description of protein surface, core and interface in terms of secondary structures' shape and deformation. Induced-fit modification tendencies described here should be valuable information to identify and characterize regions under strong structural constraints for functional reasons. PMID:21284872
Deciphering the shape and deformation of secondary structures through local conformation analysis.
Baussand, Julie; Camproux, Anne-Claude
2011-02-01
Protein deformation has been extensively analysed through global methods based on RMSD, torsion angles and Principal Components Analysis calculations. Here we use a local approach, able to distinguish among the different backbone conformations within loops, α-helices and β-strands, to address the question of secondary structures' shape variation within proteins and deformation at interface upon complexation. Using a structural alphabet, we translated the 3 D structures of large sets of protein-protein complexes into sequences of structural letters. The shape of the secondary structures can be assessed by the structural letters that modeled them in the structural sequences. The distribution analysis of the structural letters in the three protein compartments (surface, core and interface) reveals that secondary structures tend to adopt preferential conformations that differ among the compartments. The local description of secondary structures highlights that curved conformations are preferred on the surface while straight ones are preferred in the core. Interfaces display a mixture of local conformations either preferred in core or surface. The analysis of the structural letters transition occurring between protein-bound and unbound conformations shows that the deformation of secondary structure is tightly linked to the compartment preference of the local conformations. The conformation of secondary structures can be further analysed and detailed thanks to a structural alphabet which allows a better description of protein surface, core and interface in terms of secondary structures' shape and deformation. Induced-fit modification tendencies described here should be valuable information to identify and characterize regions under strong structural constraints for functional reasons.
Hilbert space structure in quantum gravity: an algebraic perspective
Giddings, Steven B.
2015-12-16
If quantum gravity respects the principles of quantum mechanics, suitably generalized, it may be that a more viable approach to the theory is through identifying the relevant quantum structures rather than by quantizing classical spacetime. Here, this viewpoint is supported by difficulties of such quantization, and by the apparent lack of a fundamental role for locality. In finite or discrete quantum systems, important structure is provided by tensor factorizations of the Hilbert space. However, even in local quantum field theory properties of the generic type III von Neumann algebras and of long range gauge fields indicate that factorization of the Hilbert space is problematic. Instead it is better to focus on the structure of the algebra of observables, and in particular on its subalgebras corresponding to regions. This paper suggests that study of analogous algebraic structure in gravity gives an important perspective on the nature of the quantum theory. Significant departures from the subalgebra structure of local quantum field theory are found, working in the correspondence limit of long-distances/low-energies. Particularly, there are obstacles to identifying commuting algebras of localized operators. In addition to suggesting important properties of the algebraic structure, this and related observations pose challenges to proposals of a fundamental role for entanglement.
Heffernan, Rhys; Yang, Yuedong; Paliwal, Kuldip; Zhou, Yaoqi
2017-09-15
The accuracy of predicting protein local and global structural properties such as secondary structure and solvent accessible surface area has been stagnant for many years because of the challenge of accounting for non-local interactions between amino acid residues that are close in three-dimensional structural space but far from each other in their sequence positions. All existing machine-learning techniques relied on a sliding window of 10-20 amino acid residues to capture some 'short to intermediate' non-local interactions. Here, we employed Long Short-Term Memory (LSTM) Bidirectional Recurrent Neural Networks (BRNNs) which are capable of capturing long range interactions without using a window. We showed that the application of LSTM-BRNN to the prediction of protein structural properties makes the most significant improvement for residues with the most long-range contacts (|i-j| > 19) over a previous window-based, deep-learning method SPIDER2. Capturing long-range interactions allows the accuracy of three-state secondary structure prediction to reach 84% and the correlation coefficient between predicted and actual solvent accessible surface areas to reach 0.80, plus a reduction of 5%, 10%, 5% and 10% in the mean absolute error for backbone ϕ, ψ, θ and τ angles, respectively, from SPIDER2. More significantly, 27% of 182724 40-residue models directly constructed from predicted Cα atom-based θ and τ have similar structures to their corresponding native structures (6 Å RMSD or less), which is 3% better than models built by ϕ and ψ angles. We expect the method to be useful for assisting protein structure and function prediction. The method is available as a SPIDER3 server and standalone package at http://sparks-lab.org.
Ekanayake, Saliya; Ruan, Yang; Schütte, Ursel M. E.; Kaonongbua, Wittaya; Fox, Geoffrey; Ye, Yuzhen; Bever, James D.
2016-01-01
ABSTRACT Arbuscular mycorrhizal (AM) fungi form mutualisms with plant roots that increase plant growth and shape plant communities. Each AM fungal cell contains a large amount of genetic diversity, but it is unclear if this diversity varies across evolutionary lineages. We found that sequence variation in the nuclear large-subunit (LSU) rRNA gene from 29 isolates representing 21 AM fungal species generally assorted into genus- and species-level clades, with the exception of species of the genera Claroideoglomus and Entrophospora. However, there were significant differences in the levels of sequence variation across the phylogeny and between genera, indicating that it is an evolutionarily constrained trait in AM fungi. These consistent patterns of sequence variation across both phylogenetic and taxonomic groups pose challenges to interpreting operational taxonomic units (OTUs) as approximations of species-level groups of AM fungi. We demonstrate that the OTUs produced by five sequence clustering methods using 97% or equivalent sequence similarity thresholds failed to match the expected species of AM fungi, although OTUs from AbundantOTU, CD-HIT-OTU, and CROP corresponded better to species than did OTUs from mothur or UPARSE. This lack of OTU-to-species correspondence resulted both from sequences of one species being split into multiple OTUs and from sequences of multiple species being lumped into the same OTU. The OTU richness therefore will not reliably correspond to the AM fungal species richness in environmental samples. Conservatively, this error can overestimate species richness by 4-fold or underestimate richness by one-half, and the direction of this error will depend on the genera represented in the sample. IMPORTANCE Arbuscular mycorrhizal (AM) fungi form important mutualisms with the roots of most plant species. Individual AM fungi are genetically diverse, but it is unclear whether the level of this diversity differs among evolutionary lineages. 
We found that the amount of sequence variation in an rRNA gene that is commonly used to identify AM fungal species varied significantly between evolutionary groups that correspond to different genera, with the exception of two genera that are genetically indistinguishable from each other. When we clustered groups of similar sequences into operational taxonomic units (OTUs) using five different clustering methods, these patterns of sequence variation caused the number of OTUs to either over- or underestimate the actual number of AM fungal species, depending on the genus. Our results indicate that OTU-based inferences about AM fungal species composition from environmental sequences can be improved if they take these taxonomically structured patterns of sequence variation into account. PMID:27260357
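To make the 97% threshold discussed in this abstract concrete, here is a minimal greedy clustering sketch. It is not an implementation of AbundantOTU, CD-HIT-OTU, CROP, mothur or UPARSE; it assumes the reads are already aligned to equal length, and the names `identity`, `greedy_otus`, `ref`, `near` and `far` are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold."""
    centroids = []
    otus = []
    for s in seqs:
        for k, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                otus[k].append(s)
                break
        else:
            # No centroid is close enough, so this sequence founds a new OTU.
            centroids.append(s)
            otus.append([s])
    return otus

# Hypothetical 100-base reads: one mismatch (99% identity) and four mismatches (96%).
ref = "ACGT" * 25
near = "T" + ref[1:]
far = "TTTTT" + ref[5:]

clusters = greedy_otus([ref, near, far])
```

Because the cut-off is fixed, a lineage with high within-species variation would be split across OTUs while a lineage with low between-species variation would be lumped, which is the kind of mismatch the abstract reports.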
Roux-Rouquie, M; Marilley, M
2000-09-15
We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X. laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed.
Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y
2016-11-01
High-throughput sequencing has been used to study posttranscriptional regulation, where the identification of protein-RNA binding is a major and fast-developing sub-area, which in turn benefits from sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alters the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulation.
Structural determinants of nuclear export signal orientation in binding to exportin CRM1
Fung, Ho Yee Joyce; Fu, Szu-Chin; Brautigam, Chad A.; ...
2015-09-08
The Chromosome Region of Maintenance 1 (CRM1) protein mediates nuclear export of hundreds of proteins through recognition of their nuclear export signals (NESs), which are highly variable in sequence and structure. The plasticity of the CRM1-NES interaction is not well understood, as there are many NES sequences that seem incompatible with structures of the NES-bound CRM1 groove. Crystal structures of CRM1 bound to two different NESs with unusual sequences showed the NES peptides binding the CRM1 groove in the opposite orientation (minus) to that of previously studied NESs (plus). A comparison of minus and plus NESs identified structural and sequence determinants for NES orientation. The binding of NESs to CRM1 in both orientations results in a large expansion in NES consensus patterns and therefore a corresponding expansion of potential NESs in the proteome.
Hyder, S M; Stancel, G M; Nawaz, Z; McDonnell, D P; Loose-Mitchell, D S
1992-09-05
We have used transient transfection assays with reporter plasmids expressing chloramphenicol acetyltransferase, linked to regions of mouse c-fos, to identify a specific estrogen response element (ERE) in this protooncogene. This element is located in the untranslated 3'-flanking region of the c-fos gene, 5 kilobases (kb) downstream from the c-fos promoter and 1.5 kb downstream of the poly(A) signal. This element confers estrogen responsiveness to chloramphenicol acetyltransferase reporters linked to both the herpes simplex virus thymidine kinase promoter and the homologous c-fos promoter. Deletion analysis localized the response element to a 200-base pair fragment which contains the element GGTCACCACAGCC that resembles the consensus ERE sequence GGTCACAGTGACC originally identified in Xenopus vitellogenin A2 gene. A synthetic 36-base pair oligodeoxynucleotide containing this c-fos sequence conferred estrogen inducibility to the thymidine kinase promoter. The corresponding sequence also induced reporter activity when present in the c-fos gene fragment 3 kb from the thymidine kinase promoter. Gel-shift experiments demonstrated that synthetic oligonucleotides containing either the consensus ERE or the c-fos element bind human estrogen receptor obtained from a yeast expression system. However, the mobility of the shifted band is faster for the fos-ERE-complex than the consensus ERE complex suggesting that the three-dimensional structure of the protein-DNA complexes is different or that other factors are differentially involved in the two reactions. When the 5'-GGTCA sequence present in the c-fos ERE is mutated to 5'-TTTCA, transcriptional activation and receptor binding activities are both lost. Mutation of the CAGCC-3' element corresponding to the second half-site of the c-fos sequence also led to the loss of receptor binding activity, suggesting that both half-sites of this element are involved in this function. 
The estrogen induction mediated by either the c-fos or the consensus ERE was blunted by the antiestrogen tamoxifen. Based on these studies, we believe the 3'-fos ERE sequence we have identified may be a major cis-acting element involved in the physiological regulation of the gene by estrogens in vivo.
SFESA: a web server for pairwise alignment refinement by secondary structure shifts.
Tong, Jing; Pei, Jimin; Grishin, Nick V
2015-09-03
Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate. We developed the SFESA web server to refine pairwise protein sequence alignments. Compared to the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server will search a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software. SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.
Encoding the structure of many-body localization with matrix product operators
NASA Astrophysics Data System (ADS)
Pekker, David; Clark, Bryan K.
2017-01-01
Anderson insulators are noninteracting disordered systems which have localized single-particle eigenstates. The interacting analogs of Anderson insulators are the many-body localized (MBL) phases. The spectrum of the many-body eigenstates of an Anderson insulator is efficiently represented as a set of product states over the single-particle modes. We show that product states over matrix product operators of small bond dimension are the corresponding efficient description of the spectrum of an MBL insulator. In this language all of the many-body eigenstates are encoded by matrix product states (i.e., density matrix renormalization group wave functions) consisting of only two sets of low bond dimension matrices per site: the G_i matrices corresponding to the local ground state on site i and the E_i matrices corresponding to the local excited state. All 2^n eigenstates can be generated from all possible combinations of these sets of matrices.
Self-similar pyramidal structures and signal reconstruction
NASA Astrophysics Data System (ADS)
Benedetto, John J.; Leon, Manuel; Saliani, Sandra
1998-03-01
Pyramidal structures are defined which are locally a combination of lowpass and highpass filtering. The structures are analogous to but different from wavelet packet structures. In particular, new frequency decompositions are obtained; and these decompositions can be parameterized to establish a correspondence with a large class of Cantor sets. Further correspondences are then established to relate such frequency decompositions with more general self-similarities. The role of the filters in defining these pyramidal structures gives rise to signal reconstruction algorithms, and these, in turn, are used in the analysis of speech data.
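The idea of a pyramid built from lowpass and highpass filtering, followed by reconstruction, can be sketched with the simplest possible filter pair. The example below uses ordinary Haar filters, which is a stand-in chosen for brevity and not the self-similar construction of the abstract; the function names and the test signal are hypothetical.

```python
import math

def analyze(signal):
    """One pyramid level: split an even-length signal into lowpass and highpass halves (Haar)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return low, high

def synthesize(low, high):
    """Invert analyze(): rebuild the signal from its two bands."""
    s = math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.extend([(l + h) / s, (l - h) / s])
    return out

def pyramid(signal, levels):
    """Repeatedly split the lowpass band; returns (final_low, high bands, finest first)."""
    highs = []
    low = list(signal)
    for _ in range(levels):
        low, high = analyze(low)
        highs.append(high)
    return low, highs

def reconstruct(low, highs):
    """Undo pyramid() by merging bands from the coarsest level back to the finest."""
    for high in reversed(highs):
        low = synthesize(low, high)
    return low

# Hypothetical 8-sample signal decomposed over three levels and rebuilt.
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coarse, details = pyramid(x, 3)
rebuilt = reconstruct(coarse, details)
```

Splitting only the lowpass band gives the standard wavelet pyramid; splitting the highpass band as well would give a wavelet packet tree, the structure the abstract contrasts its construction with.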
Sequencing proteins with transverse ionic transport in nanochannels.
Boynton, Paul; Di Ventra, Massimiliano
2016-05-03
De novo protein sequencing is essential for understanding cellular processes that govern the function of living organisms and all sequence modifications that occur after a protein has been constructed from its corresponding DNA code. By obtaining the order of the amino acids that compose a given protein one can then determine both its secondary and tertiary structures through structure prediction, which is used to create models for protein aggregation diseases such as Alzheimer's Disease. Here, we propose a new technique for de novo protein sequencing that involves translocating a polypeptide through a synthetic nanochannel and measuring the ionic current of each amino acid through an intersecting perpendicular nanochannel. We find that the distribution of ionic currents for each of the 20 proteinogenic amino acids encoded by eukaryotic genes is statistically distinct, showing this technique's potential for de novo protein sequencing.
Tao, Junjie; Feng, Chao; Ai, Bin; Kang, Ming
2016-01-01
Background and Aims Limestone karst areas possess high floral diversity and endemism. The genus Primulina, which contributes to the unique calcicole flora, has high species richness and exhibits specific soil-based habitat associations that are mainly distributed on calcareous karst soils. The adaptive molecular evolutionary mechanism of the genus to karst calcium-rich environments is still not well understood. The Ca2+-permeable channel TPC1 was used in this study to test whether its gene is involved in the local adaptation of Primulina to karst high-calcium soil environments. Methods Specific amplification and sequencing primers were designed and used to amplify the full-length coding sequences of TPC1 from cDNA of 76 Primulina species. The sequence alignment without recombination and the corresponding reconstructed phylogenetic tree were used in molecular evolutionary analyses at the nucleic acid level and amino acid level, respectively. Finally, the identified sites under positive selection were labelled on the predicted secondary structure of TPC1. Key Results Seventy-six full-length coding sequences of Primulina TPC1 were obtained. The length of the sequences varied between 2220 and 2286 bp and the insertion/deletion was located at the 5′ end of the sequences. No signal of substitution saturation was detected in the sequences, while significant recombination breakpoints were detected. The molecular evolutionary analyses showed that TPC1 was dominated by purifying selection and the selective pressures were not significantly different among species lineages. However, significant signals of positive selection were detected at both TPC1 codon level and amino acid level, and five sites under positive selective pressure were identified by at least three different methods. Conclusions The Ca2+-permeable channel TPC1 may be involved in the local adaptation of Primulina to karst Ca2+-rich environments.
Different species lineages suffered similar selective pressure associated with calcium in karst environments, and episodic diversifying selection at a few sites may play a major role in the molecular evolution of Primulina TPC1. PMID:27582362
Li, Ying; Shi, Xiaohu; Liang, Yanchun; Xie, Juan; Zhang, Yu; Ma, Qin
2017-01-21
RNAs have been found to carry diverse functionalities in nature. Inferring the similarity between two given RNAs is a fundamental step to understand and interpret their functional relationship. The majority of functional RNAs show conserved secondary structures, rather than sequence conservation. Algorithms relying on sequence-based features therefore usually have limitations in their prediction performance. Hence, integrating RNA structure features is critical for RNA analysis. Existing algorithms mainly fall into two categories: alignment-based and alignment-free. The alignment-free algorithms of RNA comparison usually have lower time complexity than alignment-based algorithms. An alignment-free RNA comparison algorithm was proposed, in which a novel numerical representation, RNA-TVcurve (triple vector curve representation), of RNA sequence and corresponding secondary structure features is provided. Then a multi-scale similarity score of two given RNAs was designed based on wavelet decomposition of their numerical representation. In support of RNA mutation and phylogenetic analysis, a web server (RNA-TVcurve) was designed based on this alignment-free RNA comparison algorithm. It provides three functional modules: 1) visualization of numerical representation of RNA secondary structure; 2) detection of single-point mutation based on secondary structure; and 3) comparison of pairwise and multiple RNA secondary structures. The web server requires RNA primary sequences as input, while corresponding secondary structures are optional. For primary sequences alone, the web server can compute the secondary structures using the free energy minimization algorithm of the RNAfold tool from the Vienna RNA package. RNA-TVcurve is the first integrated web server, based on an alignment-free method, to deliver a suite of RNA analysis functions, including visualization, mutation analysis and multiple RNAs structure comparison. 
The comparison results with two popular RNA comparison tools, RNApdist and RNAdistance, showcased that RNA-TVcurve can efficiently capture subtle relationships among RNAs for mutation detection and non-coding RNA classification. All the relevant results were shown in an intuitive graphical manner, and can be freely downloaded from this server. RNA-TVcurve, along with test examples and detailed documents, are available at: http://ml.jlu.edu.cn/tvcurve/ .
Lin, H; Rao, V B; Black, L W
1999-06-04
Bacteriophage DNA packaging results from an ATP-driven translocation of concatemeric DNA into the prohead by the phage terminase complexed with the portal vertex dodecamer of the prohead. Functional domains of the bacteriophage T4 terminase and portal gene 20 product (gp20) were determined by mutant analysis and sequence localization within the structural genes. Interaction regions of the portal vertex and large terminase subunit (gp17) were determined by genetic (terminase-portal intergenic suppressor mutations), biochemical (column retention of gp17 and inhibition of in vitro DNA packaging by gp20 peptides), and immunological (co-immunoprecipitation of polymerized gp20 peptide and gp17) studies. The specificity of the interaction was tested by means of a phage T4 HOC (highly antigenic outer capsid protein) display system in which wild-type, cs20, and scrambled portal peptide sequences were displayed on the HOC protein of phage T4. Binding affinities of these recombinant phages as determined by the retention of these phages by a His-tag immobilized gp17 column, and by co-immunoprecipitation with purified terminase supported the specific nature of the portal protein and terminase interaction sites. In further support of specificity, a gp20 peptide corresponding to a portion of the identified site inhibited packaging whereas the scrambled sequence peptide did not block DNA packaging in vitro. The portal interaction site is localized to 28 residues in the central portion of the linear sequence of gp20 (524 residues). As judged by two pairs of intergenic portal-terminase suppressor mutations, two separate regions of the terminase large subunit gp17 (central and COOH-terminal) interact through hydrophobic contacts at the portal site. 
Although the terminase apparently interacts with this gp20 portal peptide, polyclonal antibody against the portal peptide appears unable to access it in the native structure, suggesting intimate association of gp20 and gp17 possibly internalizes terminase regions within the portal in the packasome complex. Both similarities and differences are seen in comparison to analogous sites which have been identified in phages T3 and lambda. Copyright 1999 Academic Press.
Placental fetal stem segmentation in a sequence of histology images
NASA Astrophysics Data System (ADS)
Athavale, Prashant; Vese, Luminita A.
2012-02-01
Recent research in perinatal pathology argues that analyzing properties of the placenta may reveal important information on how certain diseases progress. One important property is the structure of the placental fetal stems. Analysis of the fetal stems in a placenta could be useful in the study and diagnosis of some diseases like autism. To study the fetal stem structure effectively, we need to automatically and accurately track fetal stems through a sequence of digitized hematoxylin and eosin (H&E) stained histology slides. Several problems stand in the way of this goal, among them the large size of the images, misalignment of consecutive H&E slides, unpredictable inaccuracies of manual tracing, and very complicated texture patterns of various tissue types without clear characteristics. In this paper we propose a novel algorithm to achieve automatic tracing of the fetal stem in a sequence of H&E images, based on an inaccurate manual segmentation of a fetal stem in one of the images. This algorithm combines global affine registration, local non-affine registration and a novel 'dynamic' version of the active contours model without edges. We first use global affine image registration of all the images based on displacement, scaling and rotation. This gives us an approximate location of the corresponding fetal stem in the image that needs to be traced. We then use the affine registration algorithm "locally" near this location. At this point, we use a fast non-affine registration based on an L2-similarity measure and diffusion regularization to get a better location of the fetal stem. Finally, we have to take into account inaccuracies in the initial tracing. This is achieved through a novel dynamic version of the active contours model without edges where the coefficients of the fitting terms are computed iteratively to ensure that we obtain a unique stem in the segmentation. 
The segmentation thus obtained can then be used as an initial guess to obtain segmentation in the rest of the images in the sequence. This constitutes an important step in the extraction and understanding of the fetal stem vasculature.
Wang, L; Eriksson, S
2000-01-01
The subcellular localization of mitochondrial thymidine kinase (TK2) has been questioned, since no mitochondrial targeting sequences have been found in cloned human TK2 cDNAs. Here we report the cloning of mouse TK2 cDNA from a mouse full-length enriched cDNA library. The mouse TK2 cDNA codes for a protein of 270 amino acids, with a 40-amino-acid presumed N-terminal mitochondrial targeting signal. In vitro translation and translocation experiments with purified rat mitochondria confirmed that the N-terminal sequence directed import of the precursor TK2 into the mitochondrial matrix. A single 2.4 kb mRNA transcript was detected in most tissues examined, except in liver, where an additional shorter (1.0 kb) transcript was also observed. There was no correlation between the tissue distribution of TK2 activity and the expression of TK2 mRNA. Full-length mouse TK2 protein and two N-terminally truncated forms, one of which corresponds to the mitochondrial form of TK2 and a shorter form corresponding to the previously characterized recombinant human TK2, were expressed in Escherichia coli and affinity purified. All three forms of TK2 phosphorylated thymidine, deoxycytidine and 2'-deoxyuridine, but with different kinetic efficiencies. A number of cytostatic pyrimidine nucleoside analogues were also tested and shown to be good substrates for the various forms of TK2. The active form of full-length mouse TK2 was a dimer, as judged by Superdex 200 chromatography. These results enhance our understanding of the structure and function of TK2, and may help to explain the mitochondrial disorder, mitochondrial neurogastrointestinal encephalomyopathy. PMID:11023833
NASA Astrophysics Data System (ADS)
Benito, S.; Ferrer, A.; Benabou, S.; Aviñó, A.; Eritja, R.; Gargallo, R.
2018-05-01
Guanine-rich sequences may fold into highly ordered structures known as G-quadruplexes. Apart from the monomeric G-quadruplex, these sequences may form multimeric structures that are not usually considered when studying interaction with ligands. This work studies the interaction of a ligand, crystal violet, with three guanine-rich DNA sequences with the capacity to form multimeric structures. These sequences correspond to short stretches found near the promoter regions of c-kit and SMARCA4 genes. Instrumental techniques (circular dichroism, molecular fluorescence, size-exclusion chromatography and electrospray ionization mass spectrometry) and multivariate data analysis were used for this purpose. The polymorphism of G-quadruplexes was characterized prior to the interaction studies. The ligand was shown to interact preferentially with the monomeric G-quadruplex; the binding stoichiometry was 1:1 and the binding constant was on the order of 10^5 M^-1 for all three sequences. The results highlight the importance of DNA treatment prior to interaction studies.
A Method for WD40 Repeat Detection and Secondary Structure Prediction
Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong
2013-01-01
WD40-repeat proteins (WD40s), as one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diversified sequences. A program, WDSP (WD40 repeat protein Structure Predictor), has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves a better prediction in identifying multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in a jack-knife test reaches 93.7%. A disease-related protein, LRRK2, was used as a representative example to demonstrate the structure prediction. PMID:23776530
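The Q3 figure quoted above is the standard three-state accuracy: the fraction of residues whose helix, strand or coil label is predicted correctly. A minimal sketch of that calculation follows; the function name and the toy label strings are illustrative assumptions and are not taken from WDSP.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H, E or C) matches the reference."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and reference must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Toy example (hypothetical labels): one of 16 residues is mispredicted.
observed = "CCEEEECCCEEEECCC"
predicted = "CCEEEECCEEEEECCC"
print(f"Q3 = {q3_accuracy(predicted, observed):.4f}")
```

In a jack-knife test, each protein would be scored this way with a model trained on the remaining proteins, and the per-protein values averaged.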
Ferrocene-oligonucleotide conjugates for electrochemical probing of DNA.
Ihara, T; Maruo, Y; Takenaka, S; Takagi, M
1996-01-01
Toward the development of a universal, sensitive and convenient method of DNA (or RNA) detection, electrochemically active oligonucleotides were prepared by covalent linkage of a ferrocenyl group to the 5'-aminohexyl-terminated synthetic oligonucleotides. Using these electrochemically active probes, we have been able to demonstrate the detection of DNA and RNA at femtomole levels by HPLC equipped with an ordinary electrochemical detector (ECD) [Takenaka,S., Uto,Y., Kondo,H., Ihara,T. and Takagi,M. (1994) Anal. Biochem., 218, 436-443]. Thermodynamic and electrochemical studies of the interaction between the probes and the targets are presented here. The thermodynamic data revealed that the conjugation stabilizes the triple-helix complexes by 2-3 kcal mol^-1 (an increase of 1-2 orders of magnitude in binding constant) at 298 K, which corresponds to the effect of elongation by several additional base triplets. The main cause of this thermodynamic stabilization by the conjugation is likely to be the overall conformational change of the whole structure of the conjugate rather than an additional local interaction. The redox potential of the probe was independent of the target structure, whether single- or double-stranded. However, the potential is slightly dependent (with a 10-30 mV negative shift on complexation) on the extra sequence in the target, probably because each individual sequence is capable of contacting or interacting with the ferrocenyl group in a slightly different way. This small potential shift itself, however, does not cause any inconvenience in practical applications when detecting the probes by ECD. These results lead to the conclusion that the redox-active probes are very useful for the microanalysis of nucleic acids due to the stability of the complexes, high detection sensitivity and wide applicability to the target structures (DNA and RNA; single- and double strands) and the sequences. PMID:8932383
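The link between a 2-3 kcal mol^-1 stabilization and a 1-2 order increase in binding constant follows from the standard relation ΔG = -RT ln K, so the ratio of binding constants is exp(ΔΔG/RT). The sketch below checks that arithmetic at 298 K; the function name is an illustrative assumption and the calculation is a consistency check, not the authors' analysis.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def binding_constant_ratio(ddg_kcal: float, temperature_k: float = 298.0) -> float:
    """Fold increase in binding constant for a stabilization of ddg_kcal (positive = more stable)."""
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))


for ddg in (2.0, 3.0):
    ratio = binding_constant_ratio(ddg)
    print(f"{ddg:.1f} kcal/mol -> {ratio:.0f}-fold ({math.log10(ratio):.2f} orders of magnitude)")
```

At 298 K this gives roughly a 30-fold to 160-fold increase, about 1.5 to 2.2 orders of magnitude, in line with the range stated in the abstract.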
The predicted secondary structures of class I fructose-bisphosphate aldolases.
Sawyer, L; Fothergill-Gilmore, L A; Freemont, P S
1988-01-01
The results of several secondary-structure prediction programs were combined to produce an estimate of the regions of alpha-helix, beta-sheet and reverse turns for fructose-bisphosphate aldolases from human and rat muscle and liver, from Trypanosoma brucei and from Drosophila melanogaster. All the aldolase sequences gave essentially the same pattern of secondary-structure predictions despite having sequences up to 50% different. One exception to this pattern was an additional strongly predicted helix in the rat liver and Drosophila enzymes. Regions of relatively high sequence variation generally were predicted as reverse turns, and probably occur as surface loops. Most of the positions corresponding to exon boundaries are located between regions predicted to have secondary-structural elements consistent with a compact structure. The predominantly alternating alpha/beta structure predicted is consistent with the alpha/beta-barrel structure indicated by preliminary high-resolution X-ray diffraction studies on rabbit muscle aldolase [Sygusch, Beaudry & Allaire (1986) Biophys. J. 49, 287a]. PMID:3128269
NASA Astrophysics Data System (ADS)
Latinovic, T. S.; Kalabic, S. B.; Barz, C. R.; Petrica, P. Paul; Pop-Vădean, A.
2018-01-01
This paper analyzes the influence of the Doppler effect on the time needed to establish synchronization of pseudorandom sequences in radio communication systems with an expanded (spread) spectrum. It also explores the possibility of using secure wireless communication for modular robots, where wireless links could serve both local and global communication. We analyzed a radio communication system integrator, including the two effects of the Doppler signal on the time needed to synchronize the received and locally generated pseudorandom sequences. We analyzed the effect of phase variability between these sequences, and the correspondence of the phases of these signals with the acquisition time interval of the received sequences. An analysis of these effects is essential for signal transmission and for protecting the transfer of information in communication systems with an expanded range (telecommunications, mobile telephony, Global Navigation Satellite System GNSS, and wireless communication). Results show that wireless communication can provide a safe means of communication with mobile robots.
Molecular dynamics study of some non-hydrogen-bonding base pair DNA strands
NASA Astrophysics Data System (ADS)
Tiwari, Rakesh K.; Ojha, Rajendra P.; Tiwari, Gargi; Pandey, Vishnudatt; Mall, Vijaysree
2018-05-01
In order to elucidate the structural activity of hydrophobic modified DNA, the DMMO2-D5SICS base pair is introduced as a constituent in different sets of 12-mer and 14-mer DNA sequences for molecular dynamics (MD) simulation in explicit water solvent. The AMBER 14 force field was employed for each duplex during the 200 ns production simulation in an orthogonal water box, using the Particle-Mesh-Ewald (PME) method under periodic boundary conditions (PBC), to determine conformational parameters of the complex. The force-field parameters of the modified base pair were calculated with the Gaussian code using the ab initio Hartree-Fock method. RMSD results reveal that the conformation of the duplex is sequence dependent and that the binding energy of the complex depends on the position of the modified base pair in the nucleic acid strand. We found that non-bonding energy contributed significantly to stabilising this type of duplex in comparison with electrostatic energy. The distortion produced within the strands by this type of base pair was local and destabilised the duplex near the substitution. Moreover, the binding energy of the duplex depends on the position of the hydrophobic base-pair substitution and on the DNA sequence, a finding that strongly supports the corresponding experimental study.
Protein linguistics - a grammar for modular protein assembly?
Gimona, Mario
2006-01-01
The correspondence between biology and linguistics at the level of sequence and lexical inventories, and of structure and syntax, has fuelled attempts to describe genome structure by the rules of formal linguistics. But how can we define protein linguistic rules? And how could compositional semantics improve our understanding of protein organization and functional plasticity?
Pressure-induced structural transformations and polymerization in ThC2
Guo, Yongliang; Yu, Cun; Lin, Jun; Wang, Changying; Ren, Cuilan; Sun, Baoxing; Huai, Ping; Xie, Ruobing; Ke, Xuezhi; Zhu, Zhiyuan; Xu, Hongjie
2017-01-01
Thorium-carbon systems have been thought as promising nuclear fuel for Generation IV reactors which require high-burnup and safe nuclear fuel. Existing knowledge on thorium carbides under extreme condition remains insufficient and some is controversial due to limited studies. Here we systematically predict all stable structures of thorium dicarbide (ThC2) under the pressure ranging from ambient to 300 GPa by merging ab initio total energy calculations and unbiased structure searching method, which are in sequence of C2/c, C2/m, Cmmm, Immm and P6/mmm phases. Among these phases, the C2/m is successfully observed for the first time via in situ synchrotron XRD measurements, which exhibits an excellent structural correspondence to our theoretical predictions. The transition sequence and the critical pressures are predicted. The calculated results also reveal the polymerization behaviors of the carbon atoms and the corresponding characteristic C-C bonding under various pressures. Our work provides key information on the fundamental material behavior and insights into the underlying mechanisms that lay the foundation for further exploration and application of ThC2. PMID:28383571
Pressure-induced structural transformations and polymerization in ThC2
NASA Astrophysics Data System (ADS)
Guo, Yongliang; Yu, Cun; Lin, Jun; Wang, Changying; Ren, Cuilan; Sun, Baoxing; Huai, Ping; Xie, Ruobing; Ke, Xuezhi; Zhu, Zhiyuan; Xu, Hongjie
2017-04-01
Thorium-carbon systems have been thought as promising nuclear fuel for Generation IV reactors which require high-burnup and safe nuclear fuel. Existing knowledge on thorium carbides under extreme condition remains insufficient and some is controversial due to limited studies. Here we systematically predict all stable structures of thorium dicarbide (ThC2) under the pressure ranging from ambient to 300 GPa by merging ab initio total energy calculations and unbiased structure searching method, which are in sequence of C2/c, C2/m, Cmmm, Immm and P6/mmm phases. Among these phases, the C2/m is successfully observed for the first time via in situ synchrotron XRD measurements, which exhibits an excellent structural correspondence to our theoretical predictions. The transition sequence and the critical pressures are predicted. The calculated results also reveal the polymerization behaviors of the carbon atoms and the corresponding characteristic C-C bonding under various pressures. Our work provides key information on the fundamental material behavior and insights into the underlying mechanisms that lay the foundation for further exploration and application of ThC2.
Pressure-induced structural transformations and polymerization in ThC2.
Guo, Yongliang; Yu, Cun; Lin, Jun; Wang, Changying; Ren, Cuilan; Sun, Baoxing; Huai, Ping; Xie, Ruobing; Ke, Xuezhi; Zhu, Zhiyuan; Xu, Hongjie
2017-04-06
Thorium-carbon systems have been thought as promising nuclear fuel for Generation IV reactors which require high-burnup and safe nuclear fuel. Existing knowledge on thorium carbides under extreme condition remains insufficient and some is controversial due to limited studies. Here we systematically predict all stable structures of thorium dicarbide (ThC2) under the pressure ranging from ambient to 300 GPa by merging ab initio total energy calculations and unbiased structure searching method, which are in sequence of C2/c, C2/m, Cmmm, Immm and P6/mmm phases. Among these phases, the C2/m is successfully observed for the first time via in situ synchrotron XRD measurements, which exhibits an excellent structural correspondence to our theoretical predictions. The transition sequence and the critical pressures are predicted. The calculated results also reveal the polymerization behaviors of the carbon atoms and the corresponding characteristic C-C bonding under various pressures. Our work provides key information on the fundamental material behavior and insights into the underlying mechanisms that lay the foundation for further exploration and application of ThC2.
PASS2: an automated database of protein alignments organised as structural superfamilies.
Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan
2004-04-02
The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database provides continuously updated structural alignments for evolutionarily related, sequentially distant proteins. An automated and updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single-member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalences have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position-specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members based on both sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. 
The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html
Baurens, Franc-Christophe; Bocs, Stéphanie; Rouard, Mathieu; Matsumoto, Takashi; Miller, Robert N G; Rodier-Goud, Marguerite; MBéguié-A-MBéguié, Didier; Yahiaoui, Nabila
2010-07-16
Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. 
A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana.
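A divergence time derived from synonymous substitutions is conventionally obtained as T = Ks / (2r), where Ks is the number of synonymous substitutions per synonymous site between the two sequences and r is the substitution rate per site per year. The sketch below only illustrates that relation; the Ks and rate values are placeholders chosen to give a result of about one million years and are not reported in the abstract.

```python
def divergence_time_years(ks: float, rate_per_site_per_year: float) -> float:
    """Divergence time T = Ks / (2 r); the factor 2 counts substitutions on both lineages."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    if ks < 0:
        raise ValueError("Ks cannot be negative")
    return ks / (2.0 * rate_per_site_per_year)


# Placeholder inputs (assumptions for illustration only).
example_ks = 0.009
example_rate = 4.5e-9
print(f"Estimated divergence: {divergence_time_years(example_ks, example_rate):.3g} years")
```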
The right inferior frontal gyrus processes nested non-local dependencies in music.
Cheung, Vincent K M; Meyer, Lars; Friederici, Angela D; Koelsch, Stefan
2018-02-28
Complex auditory sequences known as music have often been described as hierarchically structured. This permits the existence of non-local dependencies, which relate elements of a sequence beyond their temporal sequential order. Previous studies in music have reported differential activity in the inferior frontal gyrus (IFG) when comparing regular and irregular chord-transitions based on theories in Western tonal harmony. However, it is unclear if the observed activity reflects the interpretation of hierarchical structure as the effects are confounded by local irregularity. Using functional magnetic resonance imaging (fMRI), we found that violations to non-local dependencies in nested sequences of three-tone musical motifs in musicians elicited increased activity in the right IFG. This is in contrast to similar studies in language which typically report the left IFG in processing grammatical syntax. Effects of increasing auditory working demands are moreover reflected by distributed activity in frontal and parietal regions. Our study therefore demonstrates the role of the right IFG in processing non-local dependencies in music, and suggests that hierarchical processing in different cognitive domains relies on similar mechanisms that are subserved by domain-selective neuronal subpopulations.
Parallel algorithm for determining motion vectors in ice floe images by matching edge features
NASA Technical Reports Server (NTRS)
Manohar, M.; Ramapriyan, H. K.; Strong, J. P.
1988-01-01
A parallel algorithm is described to determine motion vectors of ice floes using time sequences of images of the Arctic Ocean obtained from the Synthetic Aperture Radar (SAR) instrument flown on board the SEASAT spacecraft. The algorithm, implemented on the MPP, locates corresponding objects based on their translationally and rotationally invariant features. It first approximates the edges in the images by polygons or sets of connected straight-line segments. Each such edge structure is then reduced to a seed point. Associated with each seed point are the descriptions (lengths, orientations and sequence numbers) of the lines constituting the corresponding edge structure. A parallel matching algorithm is used to match packed arrays of such descriptions to identify corresponding seed points in the two images. The matching algorithm is designed such that fragmentation and merging of ice floes are taken into account by accepting partial matches. The technique has been demonstrated to work on synthetic test patterns and real image pairs from SEASAT in times ranging from 0.5 to 0.7 seconds for 128 x 128 images.
Identification of the genomic locus for the human Rieske Fe-S Protein gene on Chromosome 19q12
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pennacchio, L.A.
1994-05-06
We have identified the chromosomal location of the human Rieske Iron-Sulfur Protein (UQCRFS1) gene. Mapping by hybridization to a panel of monochromosomal hybrid cell lines indicated that the gene was either on chromosome 19 or 22. By screening a human chromosome 19 specific genomic cosmid library with an oligonucleotide probe made from the published Rieske cDNA sequence, we identified a corresponding cosmid. Portions of this cosmid were sequenced directly. The exon, exon:intron junction, and flanking sequences verified that this cosmid contains the genomic locus. Fluorescent in situ hybridization (FISH) was performed to localize this cosmid to chromosome band 19q12.
Brunak, S; Engelbrecht, J
1996-06-01
A direct comparison of experimentally determined protein structures and their corresponding protein coding mRNA sequences has been performed. We examine whether real world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting protein. The degeneracy of the genetic code allows for a biased selection of codons which may control the translational rate of the ribosome, and may thus in vivo have a catalyzing effect on the folding of the polypeptide chain. A complete search for GenBank nucleotide sequences coding for structural entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment. By neural network analysis, we found strong signals in mRNA sequence regions surrounding helices and sheets. These signals do not originate from the clustering of rare codons, but from the similarity of codons coding for very abundant amino acid residues at the N- and C-termini of helices and sheets. No correlation between the positioning of rare codons and the location of structural units was found. The mRNA signals were also compared with conserved nucleotide features of 16S-like ribosomal RNA sequences and related to mechanisms for maintaining the correct reading frame by the ribosome.
Description of 3D digital curves using the theory of free groups
NASA Astrophysics Data System (ADS)
Imiya, Atsushi; Oosawa, Muneaki
1999-09-01
In this paper, we propose a new descriptor for two- and three-dimensional digital curves using the theory of free groups. A spatial digital curve is expressed as a word, that is, an element of the free group generated by three elements. These three symbols correspond to the directions of the orthogonal coordinate axes. Since a digital curve is treated as a word, i.e., a sequence of alphabetical symbols, this expression permits us to describe any geometric operation as a rewriting rule for words. Furthermore, the symbolic derivative of words yields geometric invariants of digital curves under digital Euclidean motion. These invariants enable us to design algorithms for matching and searching partial structures of digital curves. Moreover, these symbolic descriptors define global and local distances between digital curves as an edit distance.
Pan, Xiaoyong; Shen, Hong-Bin
2018-05-02
RNA-binding proteins (RBPs) account for 5-10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-consuming and costly. Computational prediction of RBP binding sites using patterns learned from existing annotations is a fast alternative. From a biological point of view, the local structure context derived from local sequences is recognized by specific RBPs. However, in computational modeling using deep learning, to the best of our knowledge, only global representations of entire RNA sequences are employed; so far, local sequence information has been ignored in the deep model construction process. In this study, we present a computational method, iDeepE, to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences to the same length. For the local CNN, we split an RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs on the multiple subsequences and on the padded sequences, respectively, to learn high-level features. Finally, the outputs from the local and global CNNs are combined to improve the prediction. iDeepE demonstrates better performance than state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN runs 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE has captured experimentally verified binding motifs. Availability: https://github.com/xypan1232/iDeepE. Contact: xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
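The two input preparations mentioned in this abstract (padding for the global CNN, overlapping fixed-length windows for the local CNN) can be sketched as follows. This is an illustrative sketch only, not the iDeepE code: the function names, the padding symbol `N`, the window and overlap values, and the all-zero encoding of `N` are assumptions.

```python
ALPHABET = "ACGU"


def pad_sequence(seq, length, pad="N"):
    """Truncate or right-pad a sequence to exactly `length` symbols."""
    return seq[:length].ljust(length, pad)


def split_overlapping(seq, window, overlap, pad="N"):
    """Split a sequence into fixed-length windows that overlap by `overlap`.

    The final window is padded so that every window has the same length.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    chunks = []
    for start in range(0, max(len(seq) - overlap, 1), step):
        chunks.append(pad_sequence(seq[start:start + window], window, pad))
    return chunks


def one_hot(seq):
    """One-hot encode over A, C, G, U; any other symbol becomes all zeros."""
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq]
```

With a window of 4 and an overlap of 2, the 10-nucleotide sequence `ACGUACGUAC` is split into four windows that each share two positions with their neighbor. Each window could then be encoded with `one_hot` and treated as one input channel.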
Can natural proteins designed with 'inverted' peptide sequences adopt native-like protein folds?
Sridhar, Settu; Guruprasad, Kunchur
2014-01-01
We have carried out a systematic computational analysis on a representative dataset of proteins of known three-dimensional structure, in order to evaluate whether it would be possible to 'swap' certain short peptide sequences in naturally occurring proteins with their corresponding 'inverted' peptides and generate 'artificial' proteins that are predicted to retain a native-like protein fold. The analysis of 3,967 representative proteins from the Protein Data Bank revealed 102,677 unique identical inverted peptide sequence pairs that vary in sequence length between 5-12 and 18 amino acid residues. Our analysis illustrates with examples that such 'artificial' proteins may be generated by identifying peptides with a 'similar structural environment' and by using comparative protein modeling and validation studies. Our analysis suggests that natural proteins may be tolerant to accommodating such peptides.
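The search for inverted peptide pairs can be illustrated with a short Python sketch. It assumes that an "inverted" peptide is simply the reversed amino acid string and that palindromic peptides are excluded; the function name and the toy sequences are hypothetical, and the structural-environment filtering described in the abstract is not modeled.

```python
def inverted_peptide_pairs(proteins, k):
    """Return sorted (peptide, reversed_peptide) pairs of length k.

    `proteins` maps a protein name to its amino acid sequence. A pair is
    reported when both the peptide and its reversal occur somewhere in the
    dataset; palindromes are skipped and each pair is listed once.
    """
    kmers = set()
    for seq in proteins.values():
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    pairs = set()
    for peptide in kmers:
        reverse = peptide[::-1]
        # `peptide < reverse` drops palindromes and the mirrored duplicate.
        if reverse in kmers and peptide < reverse:
            pairs.add((peptide, reverse))
    return sorted(pairs)


toy_proteins = {"p1": "MKTAYIAK", "p2": "GGKAIYATQ"}  # made-up sequences
pairs = inverted_peptide_pairs(toy_proteins, 5)
```

Storing all k-mers in a set makes each reverse lookup constant time on average, so the scan is linear in the total sequence length for a fixed k.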
Scop3D: three-dimensional visualization of sequence conservation.
Vermeire, Tessa; Vermaere, Stijn; Schepens, Bert; Saelens, Xavier; Van Gucht, Steven; Martens, Lennart; Vandermarliere, Elien
2015-04-01
The integration of a protein's structure with its known sequence variation provides insight on how that protein evolves, for instance in terms of (changing) function or immunogenicity. Yet, collating the corresponding sequence variants into a multiple sequence alignment, calculating each position's conservation, and mapping this information back onto a relevant structure is not straightforward. We therefore built the Sequence Conservation on Protein 3D structure (scop3D) tool to perform these tasks automatically. The output consists of two modified PDB files in which the B-values for each position are replaced by the percentage sequence conservation, or the information entropy for each position, respectively. Furthermore, text files with absolute and relative amino acid occurrences for each position are also provided, along with snapshots of the protein from six distinct directions in space. The visualization provided by scop3D can for instance be used as an aid in vaccine development or to identify antigenic hotspots, which we here demonstrate based on an analysis of the fusion proteins of human respiratory syncytial virus and mumps virus. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
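The two per-position scores named in this abstract (percentage sequence conservation and information entropy) can be computed from an alignment roughly as sketched below. This is not the scop3D implementation: the function name is hypothetical, conservation is assumed to mean the frequency of the most common residue, entropy is assumed to be Shannon entropy in bits, and gap characters are counted like any other symbol.

```python
import math
from collections import Counter


def column_stats(alignment):
    """Return (percent_conservation, entropy_bits) for each alignment column.

    `alignment` is a list of equal-length aligned sequences.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    stats = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column)
        most_common_count = counts.most_common(1)[0][1]
        conservation = 100.0 * most_common_count / total
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        stats.append((conservation, entropy))
    return stats


stats = column_stats(["ACD", "ACE", "ACE", "AGE"])
```

A fully conserved column has 100% conservation and zero entropy. Values like these could then be written into the B-factor field of a structure file for visualization, which is the kind of output the abstract describes.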
Hartmann, Fanny E; Rodríguez de la Vega, Ricardo C; Brandenburg, Jean-Tristan; Carpentier, Fantin; Giraud, Tatiana
2018-04-01
Gene presence-absence polymorphisms segregating within species are a significant source of genetic variation but have been little investigated to date in natural populations. In plant pathogens, the gain or loss of genes encoding proteins interacting directly with the host, such as secreted proteins, probably plays an important role in coevolution and local adaptation. We investigated gene presence-absence polymorphism in populations of two closely related species of castrating anther-smut fungi, Microbotryum lychnidis-dioicae (MvSl) and M. silenes-dioicae (MvSd), from across Europe, on the basis of Illumina genome sequencing data and high-quality genome references. We observed presence-absence polymorphism for 186 autosomal genes (2% of all genes) in MvSl, and only 51 autosomal genes in MvSd. Distinct genes displayed presence-absence polymorphism in the two species. Genes displaying presence-absence polymorphism were frequently located in subtelomeric and centromeric regions and close to repetitive elements, and comparison with outgroups indicated that most were present in a single species, being recently acquired through duplications in multiple-gene families. Gene presence-absence polymorphism in MvSl showed a phylogeographic structure corresponding to clusters detected based on SNPs. In addition, gene absence alleles were rare within species and skewed toward low-frequency variants. These findings are consistent with a deleterious or neutral effect for most gene presence-absence polymorphism. Some of the observed gene loss and gain events may however be adaptive, as suggested by the putative functions of the corresponding encoded proteins (e.g., secreted proteins) or their localization within previously identified selective sweeps. The adaptive roles in plant and anther-smut fungi interactions of candidate genes however need to be experimentally tested in future studies.
New insights into chromatin folding and dynamics from multi-scale modeling
NASA Astrophysics Data System (ADS)
Olson, Wilma
The dynamic organization of chromatin plays an essential role in the regulation of gene expression and in other fundamental cellular processes. The underlying physical basis of these activities lies in the sequential positioning, chemical composition, and intermolecular interactions of the nucleosomes (the familiar assemblies of roughly 150 DNA base pairs and eight histone proteins) found on chromatin fibers. We have developed a mesoscale model of short nucleosomal arrays and a computational framework that make it possible to incorporate detailed structural features of DNA and histones in simulations of short chromatin constructs with 3-25 evenly spaced nucleosomes. The correspondence between the predicted and observed effects of nucleosome composition, spacing, and numbers on long-range communication between regulatory proteins bound to the ends of designed nucleosome arrays lends credence to the model and to the molecular insights gleaned from the simulated structures. We have extracted effective nucleosome-nucleosome potentials from the mesoscale simulations and introduced the potentials in a larger scale computational treatment of regularly repeating chromatin fibers. Our results reveal a remarkable influence of nucleosome spacing on chromatin flexibility. Small changes in the length of the DNA fragments linking successive nucleosomes introduce marked changes in the local interactions of the nucleosomes and in the spatial configurations of the fiber as a whole. The changes in nucleosome positioning influence the statistical properties of longer chromatin constructs with 100-10,000 nucleosomes. We are investigating the extent to which the 'local' interactions of regularly spaced nucleosomes contribute to the corresponding interactions in chains with mixed spacings as a step toward the treatment of fibers with nucleosomes positioned at the sites mapped at base-pair resolution on genomic sequences. Support of the work by USPHS R01 GM 34809 is gratefully acknowledged.
Rodríguez de la Vega, Ricardo C; Brandenburg, Jean-Tristan; Carpentier, Fantin; Giraud, Tatiana
2018-01-01
Gene presence–absence polymorphisms segregating within species are a significant source of genetic variation but have been little investigated to date in natural populations. In plant pathogens, the gain or loss of genes encoding proteins interacting directly with the host, such as secreted proteins, probably plays an important role in coevolution and local adaptation. We investigated gene presence–absence polymorphism in populations of two closely related species of castrating anther-smut fungi, Microbotryum lychnidis-dioicae (MvSl) and M. silenes-dioicae (MvSd), from across Europe, on the basis of Illumina genome sequencing data and high-quality genome references. We observed presence–absence polymorphism for 186 autosomal genes (2% of all genes) in MvSl, and only 51 autosomal genes in MvSd. Distinct genes displayed presence–absence polymorphism in the two species. Genes displaying presence–absence polymorphism were frequently located in subtelomeric and centromeric regions and close to repetitive elements, and comparison with outgroups indicated that most were present in a single species, being recently acquired through duplications in multiple-gene families. Gene presence–absence polymorphism in MvSl showed a phylogeographic structure corresponding to clusters detected based on SNPs. In addition, gene absence alleles were rare within species and skewed toward low-frequency variants. These findings are consistent with a deleterious or neutral effect for most gene presence–absence polymorphism. Some of the observed gene loss and gain events may however be adaptive, as suggested by the putative functions of the corresponding encoded proteins (e.g., secreted proteins) or their localization within previously identified selective sweeps. The adaptive roles in plant and anther-smut fungi interactions of candidate genes however need to be experimentally tested in future studies. PMID:29722826
Oldham, William M.; Van Eps, Ned; Preininger, Anita M.; Hubbell, Wayne L.; Hamm, Heidi E.
2007-01-01
Heterotrimeric G proteins function as molecular relays that mediate signal transduction from heptahelical receptors in the cell membrane to intracellular effector proteins. Crystallographic studies have demonstrated that guanine nucleotide exchange on the Gα subunit causes specific conformational changes in three key “switch” regions of the protein, which regulate binding to Gβγ subunits, receptors, and effector proteins. In the present study, nitroxide side chains were introduced at sites within the switch I region of Gαi to explore the structure and dynamics of this region throughout the G protein cycle. EPR spectra obtained for each of the Gα(GDP), Gα(GDP)βγ heterotrimer and Gα(GTPγS) conformations are consistent with the local environment observed in the corresponding crystal structures. Binding of the heterotrimer to activated rhodopsin to form the nucleotide-free (empty) complex, for which there is no crystal structure, causes prominent changes relative to the heterotrimer in the structure of switch I and contiguous sequences. The data identify a putative pathway of allosteric changes triggered by receptor binding and, together with previously published data, suggest elements of a mechanism for receptor-catalyzed nucleotide exchange. PMID:17463080
Markov and semi-Markov switching linear mixed models used to identify forest tree growth components.
Chaubert-Pereira, Florence; Guédon, Yann; Lavergne, Christian; Trottier, Catherine
2010-09-01
Tree growth is assumed to be mainly the result of three components: (i) an endogenous component assumed to be structured as a succession of roughly stationary phases separated by marked change points that are asynchronous among individuals, (ii) a time-varying environmental component assumed to take the form of synchronous fluctuations among individuals, and (iii) an individual component corresponding mainly to the local environment of each tree. To identify and characterize these three components, we propose to use semi-Markov switching linear mixed models, i.e., models that combine linear mixed models in a semi-Markovian manner. The underlying semi-Markov chain represents the succession of growth phases and their lengths (endogenous component), whereas the linear mixed models attached to each state of the underlying semi-Markov chain represent, in the corresponding growth phase, both the influence of time-varying climatic covariates (environmental component) as fixed effects, and interindividual heterogeneity (individual component) as random effects. In this article, we address the estimation of Markov and semi-Markov switching linear mixed models in a general framework. We propose a Monte Carlo expectation-maximization-like algorithm whose iterations decompose into three steps: (i) sampling of state sequences given random effects, (ii) prediction of random effects given state sequences, and (iii) maximization. The proposed statistical modeling approach is illustrated by the analysis of successive annual shoots along Corsican pine trunks influenced by climatic covariates. © 2009, The International Biometric Society.
Structure optimisation by thermal cycling for the hydrophobic-polar lattice model of protein folding
NASA Astrophysics Data System (ADS)
Günther, Florian; Möbius, Arnulf; Schreiber, Michael
2017-03-01
The function of a protein depends strongly on its spatial structure. Therefore the transition from an unfolded stage to the functional fold is one of the most important problems in computational molecular biology. Since the corresponding free energy landscapes exhibit huge numbers of local minima, the search for the lowest-energy configurations is very demanding. Because of that, efficient heuristic algorithms are of high value. In the present work, we investigate whether and how the thermal cycling (TC) approach can be applied to the hydrophobic-polar (HP) lattice model of protein folding. Evaluating the efficiency of TC for a set of two- and three-dimensional examples, we compare the performance of this strategy with that of multi-start local search (MSLS) procedures and that of simulated annealing (SA). For this aim, we incorporated several simple but rather efficient modifications into the standard procedures: in particular, a strong improvement was achieved by also allowing energy conserving state modifications. Furthermore, the consideration of ensembles instead of single samples was found to greatly improve the efficiency of TC. In the framework of different benchmarks, for all considered HP sequences, we found TC to be far superior to SA, and to be faster than Wang-Landau sampling.
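For readers unfamiliar with the HP lattice model that this abstract optimizes, the following minimal Python sketch evaluates the energy of a conformation on a two-dimensional square lattice. It assumes the standard HP convention of -1 per contact between hydrophobic (H) residues that are lattice neighbors but not chain neighbors; the function name is hypothetical, and the thermal cycling search itself is not shown.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain placed on a 2D square lattice.

    `sequence` is a string of 'H' and 'P'; `coords` lists the (x, y)
    lattice site of each residue in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        if abs(x1 - x0) + abs(y1 - y0) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    site_to_index = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = site_to_index.get(neighbor)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
```

A heuristic such as thermal cycling, simulated annealing, or multi-start local search would call an energy function like this repeatedly while proposing moves that change `coords`.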
Anin, M F; Leng, M
1990-01-01
Conformational changes induced in double-stranded oligonucleotides by the binding of trans- or cis-diamminedichloro platinum(II) to the d(GTG) sequence have been characterized by means of melting temperatures, electrophoretic migrations in non-denaturing polyacrylamide gels, reactivities with the artificial nuclease Phenanthroline-copper and with chemical probes. The cis-platinum adduct behaves more as a centre of directed bend than as a hinge joint, the induced bend angle being of the order of 25-30 degrees. The double helix is locally denatured over 2 base pairs (corresponding to the platinated 5'G residue and the central T residue) and is distorted over 4-5 base pairs. The trans-platinum adduct behaves also more as a centre of directed bend than as a hinge joint, the induced bend angle being of the order of 60 degrees. The double helix is locally denatured over 4 base pairs (corresponding to the immediately 5'T residue adjacent to the adduct and to the three base residues of the adduct). Both the cis- and trans-platinum adducts decrease the thermal stability of the double helix. PMID:2388824
Roux-Rouquie, Magali; Marilley, Monique
2000-01-01
We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, provide circumstantial evidence for structurally preorganized motifs in relation to transcription regulation and promotion. Using well-documented X. laevis rDNA regulatory sequences, we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed. PMID:10982860
Kuwabara, Tomoko; Warashina, Masaki; Koseki, Shiori; Sano, Masayuki; Ohkawa, Jun; Nakayama, Kazuhisa; Taira, Kazunari
2001-01-01
Hammerhead ribozymes were expressed under the control of similar tRNA promoters, localizing transcripts either in the cytoplasm or the nucleus. The tRNAVal-driven ribozyme (tRNA-Rz; tRNA with extra sequences at the 3′ end) that has been used in our ribozyme studies was exported efficiently into the cytoplasm and ribozyme activity was detected only in the cytoplasmic fraction. Both ends of the transported tRNA-Rz were characterized comprehensively and the results confirmed that tRNA-Rz had unprocessed 5′ and 3′ ends. Furthermore, it was also demonstrated that the activity of the exported ribozyme was significantly higher than that of the ribozyme which remained in the nucleus. We suggest that it is possible to engineer tRNA-Rz, which can be exported to the cytoplasm based on an understanding of secondary structures, and then tRNA-driven ribozymes may be co-localized with their target mRNAs in the cytoplasm of mammalian cells. PMID:11433023
Conlon, J M; Davis, M S; Falkmer, S; Thim, L
1987-11-02
The primary structures of three peptides from extracts from the pancreatic islets of the daddy sculpin (Cottus scorpius) and three analogous peptides from the islets of the flounder (Platichthys flesus), two species of teleostean fish, have been determined by automated Edman degradation. The structures of the flounder peptides were confirmed by fast-atom bombardment mass spectrometry. The peptides show strong homology to residues (49-60), (63-96) and (98-125) of the predicted sequence of preprosomatostatin II from the anglerfish (Lophius americanus). The amino acid sequences of the peptides suggest that, in the sculpin, prosomatostatin II is cleaved at a dibasic amino acid residue processing site (corresponding to Lys61-Arg62 in anglerfish preprosomatostatin II). The resulting fragments are further cleaved at monobasic residue processing sites (corresponding to Arg48 and Arg97 in anglerfish preprosomatostatin II). In the flounder the same dibasic residue processing site is utilised but cleavage at different monobasic sites takes place (corresponding to Arg50 and Arg97 in anglerfish preprosomatostatin II). A peptide identical to mammalian somatostatin-14 was also isolated from the islets of both species and is presumed to represent a cleavage product of prosomatostatin I.
Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan
2009-01-01
We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon–intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/. PMID:18984624
SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.
Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael
2018-05-25
Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
Ikemoto, Tadahiro; Park, Min Kyun
2003-10-16
To elucidate the molecular phylogeny and evolution of a particular peptide, one must analyze not the limited primary amino acid sequences of the low molecular weight mature polypeptide, but rather the sequences of the corresponding precursors from various species. Of all the structural variants of gonadotropin-releasing hormone (GnRH), GnRH-II (chicken GnRH-II, or cGnRH-II) is remarkably conserved without any sequence substitutions among vertebrates, but its precursor sequences vary considerably. We have identified and characterized the full-length complementary DNA (cDNA) encoding the GnRH-II precursor and determined its genomic structure, consisting of four exons and three introns, in a reptilian species, the leopard gecko Eublepharis macularius. This is the first report about the GnRH-II precursor cDNA/gene from reptiles. The deduced leopard gecko prepro-GnRH-II polypeptide had the highest identities with the corresponding polypeptides of amphibians. The GnRH-II precursor mRNA was detected in more than half of the tissues and organs examined. This widespread expression is consistent with the previous findings in several species, though the roles of GnRH outside the hypothalamus-pituitary-gonadal axis remain largely unknown. Molecular phylogenetic analysis combined with sequence comparison showed that the leopard gecko is more similar to fishes and amphibians than to eutherian mammals with respect to the GnRH-II precursor sequence. These results strongly suggest that the divergence of the GnRH-II precursor sequences seen in eutherian mammals may have occurred along with amniote evolution.
E-MSD: an integrated data resource for bioinformatics.
Velankar, S; McNeil, P; Mittard-Runte, V; Suarez, A; Barrell, D; Apweiler, R; Henrick, K
2005-01-01
The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the 'Structure Integration with Function, Taxonomy and Sequences (SIFTS)' initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group.
Klaus, James S; Janse, Ingmar; Heikoop, Jeffrey M; Sanford, Robert A; Fouke, Bruce W
2007-05-01
The high incidence of coral disease in shallow coastal marine environments suggests seawater depth and coastal pollution have an impact on the microbial communities inhabiting healthy coral tissues. A study was undertaken to determine how bacterial communities inhabiting tissues of the coral Montastraea annularis change at 5 m, 10 m and 20 m water depth in varying proximity to the urban centre and seaport of Willemstad, Curaçao, Netherlands Antilles. Analyses of terminal restriction fragment length polymorphisms (TRFLP) of 16S rRNA gene sequences show significant differences in bacterial communities of polluted and control localities only at the shallowest seawater depth. Furthermore, distinct differences in bacterial communities were found with increasing water depth. Comparisons of TRFLP peaks with sequenced clone libraries indicate the black band disease cyanobacterium clone CD1C11 is common and most abundant on healthy corals in less than 10 m water depth. Similarly, sequences belonging to a previously unrecognized group of likely phototrophic bacteria, herein referred to as CAB-I, were also more common in shallow water. To assess the influence of environmental and physiologic factors on bacterial community structure, canonical correspondence analysis was performed using explanatory variables associated with: (i) light availability; (ii) seawater pollution; (iii) coral mucus composition; (iv) the community structure of symbiotic algae; and (v) the photosynthetic activity of symbiotic algae. Eleven per cent of the variation in bacterial communities was accounted for by covariation with these variables; the most important being photosynthetically active radiation (sunlight) and the coral uptake of sewage-derived compounds as recorded by the delta(15)N of coral tissue.
A generative, probabilistic model of local protein structure.
Boomsma, Wouter; Mardia, Kanti V; Taylor, Charles C; Ferkinghoff-Borg, Jesper; Krogh, Anders; Hamelryck, Thomas
2008-07-01
Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence-structure correlations in the native state. Our method represents a significant theoretical and practical improvement over the widely used fragment assembly technique by avoiding the drawbacks associated with a discrete and nonprobabilistic approach.
Mapping of aldose reductase gene sequences to human chromosomes 1, 3, 7, 9, 11, and 13
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bateman, J.B.; Kojis, T.; Heinzmann, C.
1993-09-01
Aldose reductase (alditol:NAD(P)+ 1-oxidoreductase; EC 1.1.1.21) (AR) catalyzes the reduction of several aldehydes, including that of glucose, to the corresponding sugar alcohol. Using a complementary DNA clone encoding human AR, the authors mapped the gene sequences to human chromosomes 1, 3, 7, 9, 11, 13, 14, and 18 by somatic cell hybridization. By in situ hybridization analysis, sequences were localized to human chromosomes 1q32-q43, 3p12, 7q31-q35, 9q22, 11p14-p15, and 13q14-q21. As a putative functional AR gene has been mapped to chromosome 7 and a putative pseudogene to chromosome 3, the sequences on the other seven chromosomes may represent other active genes, non-aldose reductase homologous sequences, or pseudogenes. 24 refs., 3 figs., 2 tabs.
NASA Technical Reports Server (NTRS)
Cai, X.; Henry, R. L.; Takemoto, L. J.; Guikema, J. A.; Wong, P. P.; Spooner, B. S. (Principal Investigator)
1992-01-01
The amino acid sequences of the beta and gamma subunit polypeptides of glutamine synthetase from bean (Phaseolus vulgaris L.) root nodules are very similar. However, there are small regions within the sequences that are significantly different between the two polypeptides. The sequences between amino acids 2 and 9 and between 264 and 274 are examples. Three peptides (gamma 2-9, gamma 264-274, and beta 264-274) corresponding to these sequences were synthesized. Antibodies against these peptides were raised in rabbits and purified with corresponding peptide-Sepharose affinity chromatography. Western blot analysis of polyacrylamide gel electrophoresis of bean nodule proteins demonstrated that the anti-beta 264-274 antibodies reacted specifically with the beta polypeptide and the anti-gamma 264-274 and anti-gamma 2-9 antibodies reacted specifically with the gamma polypeptide of the native and denatured glutamine synthetase. These results showed the feasibility of using synthetic peptides in developing antibodies that are capable of distinguishing proteins with similar primary structures.
Elimination of motion and pulsation artifacts using BLADE sequences in knee MR imaging.
Lavdas, Eleftherios; Mavroidis, Panayiotis; Hatzigeorgiou, Vasiliki; Roka, Violeta; Arikidis, Nikos; Oikonomou, Georgia; Andrianopoulos, Konstantinos; Notaras, Ioannis
2012-10-01
The purpose of this study is to evaluate the ability of proton density (PD)-BLADE sequences in reducing or even eliminating motion and pulsatile flow artifacts in knee magnetic resonance imaging examinations. Eighty consecutive patients, who had been routinely scanned for knee examination, participated in the study. The following pairs of sequences with and without BLADE were compared: (a) PD turbo spin echo (TSE) sagittal (SAG) fat saturation (FS) in 35 patients, (b) PD TSE coronal (COR) FS in 19 patients, (c) T2 TSE axial in 13 patients and (d) PD TSE SAG in 13 patients. Both qualitative and quantitative analyses were performed based on the signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR) and relative contrast (ReCon) measures of normal anatomic structures. The qualitative analysis was performed by experienced radiologists. Also, the presence of image motion and pulsation artifacts was evaluated. Based on the results of the SNR, CNR and ReCon for the different sequences and anatomical structures, the BLADE sequences were significantly superior in 19 cases, whereas the corresponding conventional sequences were significantly superior in only 6 cases. BLADE sequences eliminated motion artifacts in all the cases. However, motion artifacts were shown in (a) six PD TSE SAG FS, (b) three PD TSE COR FS, (c) three PD TSE SAG and (d) two T2 TSE axial conventional sequences. In our results, it was found that, in PD FS sequences (sagittal and coronal), the differences between the BLADE and conventional sequences regarding the elimination of motion and pulsatile flow artifacts were statistically significant. In all the comparisons, the PD FS BLADE sequences (coronal and sagittal) were significantly superior to the corresponding conventional sequences regarding the classification of their image quality. In conclusion, this technique appears to be capable of potentially eliminating motion and pulsatile flow artifacts in MR images. Copyright © 2012 Elsevier Inc. All rights reserved.
Pesteie, Mehran; Abolmaesumi, Purang; Ashab, Hussam Al-Deen; Lessoway, Victoria A; Massey, Simon; Gunka, Vit; Rohling, Robert N
2015-06-01
Injection therapy is a commonly used solution for back pain management. This procedure typically involves percutaneous insertion of a needle between or around the vertebrae, to deliver anesthetics near nerve bundles. Most frequently, spinal injections are performed either blindly using palpation or under the guidance of fluoroscopy or computed tomography. Recently, due to the drawbacks of the ionizing radiation of such imaging modalities, there has been a growing interest in using ultrasound imaging as an alternative. However, the complex spinal anatomy with different wave-like structures, affected by speckle noise, makes the accurate identification of the appropriate injection plane difficult. The aim of this study was to propose an automated system that can identify the optimal plane for epidural steroid injections and facet joint injections. A multi-scale and multi-directional feature extraction system to provide automated identification of the appropriate plane is proposed. Local Hadamard coefficients are obtained using the sequency-ordered Hadamard transform at multiple scales. Directional features are extracted from local coefficients which correspond to different regions in the ultrasound images. An artificial neural network is trained based on the local directional Hadamard features for classification. The proposed method yields distinctive features for classification which successfully classified 1032 images out of 1090 for epidural steroid injection and 990 images out of 1052 for facet joint injection. In order to validate the proposed method, a leave-one-out cross-validation was performed. The average classification accuracy for leave-one-out validation was 94 % for epidural and 90 % for facet joint targets. Also, the feature extraction time for the proposed method was 20 ms for a native 2D ultrasound image. 
A real-time machine learning system based on the local directional Hadamard features extracted by the sequency-ordered Hadamard transform for detecting the laminae and facet joints in ultrasound images has been proposed. The system has the potential to assist the anesthesiologists in quickly finding the target plane for epidural steroid injections and facet joint injections.
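The sequency-ordered Hadamard transform named in this abstract can be illustrated with a minimal one-dimensional sketch. It assumes a Sylvester-constructed Hadamard matrix whose rows are re-sorted by their number of sign changes (the sequency); the function names are hypothetical and this is not the authors' implementation, which works on two-dimensional image patches at several scales.

```python
def hadamard(n):
    """Sylvester (natural-order) Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # H_2k = [[H_k, H_k], [H_k, -H_k]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def sequency(row):
    """Number of sign changes along a row of +1/-1 entries."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def sequency_ordered_hadamard(n):
    """Hadamard rows sorted from lowest to highest sequency."""
    return sorted(hadamard(n), key=sequency)


def transform_1d(signal):
    """Normalized sequency-ordered Hadamard coefficients of a 1D signal."""
    n = len(signal)
    rows = sequency_ordered_hadamard(n)
    return [sum(w * x for w, x in zip(row, signal)) / n for row in rows]
```

For a two-dimensional patch, the same transform would be applied along the rows and then along the columns. Low-sequency coefficients capture slowly varying structure and high-sequency coefficients capture rapid alternations.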
Vences, Miguel; Rasoloariniaina, Jean R; Riemann, Jana C
2018-02-08
The genus Typhleotris contains three poorly known blind fish species, inhabiting aquifers in the limestone plateau of south-western Madagascar. Until recently these species were known from only a few localities, and their pattern of genetic differentiation remains poorly studied. In this study we analyse 122 Typhleotris tissue samples collected from 12 localities, spanning the entire known range of the genus, and use DNA sequences to assign these samples to the three species known. The phylogeny based on the mitochondrial marker cox1 revealed three main clades corresponding to the three species: Typhleotris madagascariensis, T. mararybe and T. pauliani, differing by uncorrected pairwise sequence divergences of 6.3-9.8%. The distribution ranges of the three species overlapped widely: T. mararybe was collected only in a southern group of localities, T. madagascariensis was found in both the southern and the central group of localities, and T. pauliani occurred from the northernmost site to the southern group of localities; yet the three species did not share haplotypes in two nuclear genes, except for three individuals that we hypothesize are hybrids of T. pauliani with T. madagascariensis and T. mararybe. This pattern of concordant mitochondrial and nuclear divergence despite sympatry strongly supports the status of all three taxa as separate species. Phylogeographic structure was obvious in T. madagascariensis, with two separate shallow mitochondrial clades occupying (1) the central vs. (2) the southern group of populations, and in T. pauliani, with separate mitochondrial clades for (1) the northern vs. (2) the central/southern populations. The widespread occurrence of these three cave fish species suggests that the aquifers in south-western Madagascar have at least in the past allowed episodic dispersal and gene flow of subterraneous organisms, whereas the phylogeographic pattern of T. madagascariensis and T. pauliani provides evidence for isolation and loss of connectivity in the more recent past.
Tsoumani, Konstantina T.; Drosopoulou, Elena; Bourtzis, Kostas; Gariou-Papalexiou, Aggeliki; Mavragani-Tsipidou, Penelope; Zacharopoulou, Antigone; Mathiopoulos, Kostas D.
2015-01-01
Sex chromosomes have many unusual features relative to autosomes. The in-depth exploration of their structure will improve our understanding of their origin and divergence (degeneration) as well as the evolution of genetic sex determination pathways, which are most often attributed to them. In Tephritids, the structure of the Y chromosome, where the male-determining factor M is localized, is largely unexplored and limited data concerning its sequence content and evolution are available. In order to get insight into the structure and organization of the Y chromosome of the major olive insect pest, the olive fly Bactrocera oleae, we characterized sequences from a Pulsed-Field Gel Electrophoresis (PFGE)-isolated Y chromosome. Here, we report the discovery of the first olive fly LTR retrotransposon with increased presence on the Y chromosome. The element belongs to the BEL-Pao superfamily; however, its sequence comparison with the other members of the superfamily suggests that it constitutes a new family that we termed Achilles. Its ~7.5 kb sequence consists of the 5’LTR, the 5’non-coding sequence and the open reading frame (ORF), which encodes the polyprotein Gag-Pol. In situ hybridization to the B. oleae polytene chromosomes showed that Achilles is distributed in discrete bands dispersed on all five autosomes, in all centromeric regions and in the granular heterochromatic network corresponding to the mitotic sex chromosomes. The between-sex comparison revealed a variation in Achilles copy number, with male flies possessing 5–10 copies more than females (CI range: 18–38 and 12–33 copies respectively per genome). The examination of its transcriptional activity demonstrated the presence of at least one intact active copy in the genome, showing a differential level of expression between sexes as well as during embryonic development. The higher expression was detected in male germline tissues (testes). 
Moreover, the presence of Achilles-like elements in different species of the Tephritidae family suggests an ancient origin of this element. PMID:26398504
MRI Guided Brain Stimulation without the Use of a Neuronavigation System
Vaghefi, Ehsan; Byblow, Winston D.; Stinear, Cathy M.; Thompson, Benjamin
2015-01-01
A key issue in the field of noninvasive brain stimulation (NIBS) is the accurate localization of scalp positions that correspond to targeted cortical areas. The current gold standard is to combine structural and functional brain imaging with a commercially available “neuronavigation” system. However, neuronavigation systems are not commonplace outside of specialized research environments. Here we describe a technique that allows for the use of participant-specific functional and structural MRI data to guide NIBS without a neuronavigation system. Surface mesh representations of the head were generated using Brain Voyager and vectors linking key anatomical landmarks were drawn on the mesh. Our technique was then used to calculate the precise distances on the scalp corresponding to these vectors. These calculations were verified using actual measurements of the head and the technique was used to identify a scalp position corresponding to a brain area localized using functional MRI. PMID:26413537
Relationships between residue Voronoi volume and sequence conservation in proteins.
Liu, Jen-Wei; Cheng, Chih-Wen; Lin, Yu-Feng; Chen, Shao-Yu; Hwang, Jenn-Kang; Yen, Shih-Chung
2018-02-01
Functional and biophysical constraints can cause different levels of sequence conservation in proteins. Previously, structural properties, e.g., relative solvent accessibility (RSA) and packing density of the weighted contact number (WCN), have been found to be related to protein sequence conservation (CS). The Voronoi volume has recently been recognized as a new structural property of the local protein structural environment reflecting CS. However, for surface residues, it is sensitive to water molecules surrounding the protein structure. Herein, we present a simple structural determinant termed the relative space of Voronoi volume (RSV); it uses the Voronoi volume and the van der Waals volume of particular residues to quantify the local structural environment. RSV (range, 0-1) is defined as (Voronoi volume-van der Waals volume)/Voronoi volume of the target residue. The concept of RSV describes the extent of available space for every protein residue. RSV and Voronoi profiles with and without water molecules (RSVw, RSV, VOw, and VO) were compared for 554 non-homologous proteins. RSV (without water) showed better Pearson's correlations with CS than did RSVw, VO, or VOw values. The mean correlation coefficient between RSV and CS was 0.51, which is comparable to the correlation between RSA and CS (0.49) and that between WCN and CS (0.56). RSV is a robust structural descriptor with and without water molecules and can quantitatively reflect evolutionary information in a single protein structure. Therefore, it may represent a practical structural determinant to study protein sequence, structure, and function relationships. Copyright © 2017 Elsevier B.V. All rights reserved.
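The RSV definition given in this abstract, (Voronoi volume - van der Waals volume)/Voronoi volume, can be written out as a short sketch. The function names and the example volumes are hypothetical; the Pearson helper only illustrates how a per-residue RSV profile might be correlated with a conservation profile and is not taken from the paper.

```python
import math


def relative_space_of_voronoi_volume(voronoi_volume, vdw_volume):
    """RSV = (Voronoi volume - van der Waals volume) / Voronoi volume."""
    if voronoi_volume <= 0:
        raise ValueError("Voronoi volume must be positive")
    return (voronoi_volume - vdw_volume) / voronoi_volume


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical residue: Voronoi volume 200 A^3, van der Waals volume 150 A^3.
example_rsv = relative_space_of_voronoi_volume(200.0, 150.0)  # 0.25
```

Because the van der Waals volume cannot exceed the Voronoi cell that contains the residue, the value falls in the 0-1 range stated in the abstract.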
Common 5S rRNA variants are likely to be accepted in many sequence contexts
NASA Technical Reports Server (NTRS)
Zhang, Zhengdong; D'Souza, Lisa M.; Lee, Youn-Hyung; Fox, George E.
2003-01-01
Over evolutionary time RNA sequences which are successfully fixed in a population are selected from among those that satisfy the structural and chemical requirements imposed by the function of the RNA. These sequences together comprise the structure space of the RNA. In principle, a comprehensive understanding of RNA structure and function would make it possible to enumerate which specific RNA sequences belong to a particular structure space and which do not. We are using bacterial 5S rRNA as a model system to attempt to identify principles that can be used to predict which sequences do or do not belong to the 5S rRNA structure space. One promising idea is the very intuitive notion that frequently seen sequence changes in an aligned data set of naturally occurring 5S rRNAs would be widely accepted in many other 5S rRNA sequence contexts. To test this hypothesis, we first developed well-defined operational definitions for a Vibrio region of the 5S rRNA structure space and what is meant by a highly variable position. Fourteen sequence variants (10 point changes and 4 base-pair changes) were identified in this way, which, by the hypothesis, would be expected to incorporate successfully in any of the known sequences in the Vibrio region. All 14 of these changes were constructed and separately introduced into the Vibrio proteolyticus 5S rRNA sequence where they are not normally found. Each variant was evaluated for its ability to function as a valid 5S rRNA in an E. coli cellular context. It was found that 93% (13/14) of the variants tested are likely valid 5S rRNAs in this context. In addition, seven variants were constructed that, although present in the Vibrio region, did not meet the stringent criteria for a highly variable position. In this case, 86% (6/7) are likely valid. As a control we also examined seven variants that are seldom or never seen in the Vibrio region of 5S rRNA sequence space. In this case only two of seven were found to be potentially valid. 
The results demonstrate that changes that occur multiple times in a local region of RNA sequence space in fact usually will be accepted in any sequence context in that same local region.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gauthier, F.J.; Boudjema, A.; Lounis, R.
1995-08-01
The Ghadames and Illizi basins cover the majority of the eastern Sahara of Algeria. Geologically, this part of the Central Saharan platform has been influenced by a series of structural arches and "moles" (continental highs) which controlled sedimentation and structure through geologic time. These features, resulting from and having been affected by nine major tectonic phases ranging from pre-Cambrian to Tertiary, completely bound the Ghadames and Illizi Basins. During the Paleozoic both basins formed one continuous depositional entity with the Ghadames basin being the distal portion of the continental sag basin where facies and thickness variations are observed over large distances. It is during the Mesozoic-Cenozoic that the Ghadames basin starts to evolve differently from the Illizi Basin. Eustatic low-stand periods resulted in continental deposition yielding the major petroleum-bearing reservoir horizons (Cambrian, Ordovician, Siluro-Devonian and Carboniferous). High-stand periods correspond to the major marine transgressions covering the majority of the Saharan platform. These transgressions deposited the principal source rock intervals of the Silurian and Middle to Upper Devonian. The main reservoirs of the Mesozoic and Cenozoic are Triassic sandstone sequences which are covered by a thick evaporite succession forming a super-seal. Structurally, the principal phases affecting this sequence are the extensional events related to the breakup of Pangea and the Alpine compressional events. The Ghadames and Illizi basins, therefore, have been controlled by a polyphase tectonic history influenced by Pan African brittle basement fracturing which resulted in complex structures localized along the major basin bounding trends as well as several subsidiary trends within the basin. These trends, as demonstrated with key seismic data, have been found to contain the majority of hydrocarbons trapped.
Guo, Kang-kang; Tang, Qing-hai; Zhang, Yan-ming; Kang, Kai; He, Lei
2011-05-18
The membrane topology and molecular mechanisms for endoplasmic reticulum (ER) localization of the classical swine fever virus (CSFV) non-structural 2 (NS2) protein are unclear. In this study we attempted to elucidate the subcellular localization of this protein and the molecular mechanisms responsible for it. The NS2 gene was amplified by reverse transcription polymerase chain reaction, and the transmembrane regions and hydrophilicity of the NS2 protein were predicted by bioinformatics analysis. Twelve cDNAs of the NS2 gene were amplified by the PCR deletion method and cloned into a eukaryotic expression vector, which was transfected into a swine umbilical vein endothelial cell line (SUVEC). Subcellular localization of the NS2 protein was characterized by confocal microscopy, and western blots were carried out to analyze protein expression. Our results showed that the -NH2 terminal of the CSFV NS2 protein was highly hydrophobic and the protein localized in the ER. At least four transmembrane regions and two internal signal peptide sequences (amino acids 103-138 and 220-262) were identified and thought to be critical for its trans-localization to the ER. This is the first study to identify the internal signal peptide sequences of the CSFV NS2 protein and its subcellular localization, providing the foundation for further exploration of this protein's function and its role in CSFV pathogenesis.
Deresiewicz, R L; Flaxenburg, J; Leng, K; Kasper, D L
1996-01-01
To explore whether a novel staphylococcal clone or structural variant of toxic shock syndrome toxin 1 is associated with Kawasaki syndrome, six toxigenic strains of Staphylococcus aureus from Kawasaki syndrome patients were studied. The strains were divisible into two groups based on phenotypic and genotypic characteristics and are therefore unequivocally not clonal. Portions of the tstH genes of each strain were sequenced. Three were sequenced in their entirety, while the remainder were sequenced from codon 66 to codon 137 of the mature protein only. Two of the former group differed slightly in the sequences of their signal peptides relative to the sequence published for the tstH signal peptide. Those differences did not affect toxin processing or secretion. The sequenced portions of the regions encoding mature toxic shock syndrome toxin 1 were identical in all six strains and corresponded exactly to the published sequence of tstH. No evidence was found for the existence of a structural variant of tstH uniquely associated with Kawasaki syndrome. PMID:8757881
Zardus, John D; Etter, Ron J; Chase, Michael R; Rex, Michael A; Boyle, Elizabeth E
2006-03-01
The deep-sea soft-sediment environment hosts a diverse and highly endemic fauna of uncertain origin. We know little about how this fauna evolved because geographic patterns of genetic variation, the essential information for inferring patterns of population differentiation and speciation, are poorly understood. Using formalin-fixed specimens from archival collections, we quantify patterns of genetic variation in the protobranch bivalve Deminucula atacellana, a species widespread throughout the Atlantic Ocean at bathyal and abyssal depths. Samples were taken from 18 localities in the North American, West European and Argentine basins. A hypervariable region of mitochondrial 16S rDNA was amplified by polymerase chain reaction (PCR) and sequenced from 130 individuals revealing 21 haplotypes. With several important exceptions, haplotypes are unique to each basin. Overall gene diversity is high (h = 0.73) with pronounced population structure (Phi(ST) = 0.877) and highly significant geographic associations (P < 0.0001). Sequences cluster into four major clades corresponding to differences in geography and depth. Genetic divergence was much greater among populations at different depths within the same basin than among those at similar depths but separated by thousands of kilometres. Isolation by distance probably explains much of the interbasin variation. Depth-related divergence may reflect historical patterns of colonization or strong environmental selective gradients. Broadly distributed deep-sea organisms can possess highly genetically divergent populations, despite the lack of any morphological divergence.
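The gene diversity statistic h reported above is commonly computed from haplotype counts. The sketch below assumes Nei's unbiased estimator, h = n/(n-1) * (1 - sum of squared haplotype frequencies); the abstract does not state which estimator was used, and the counts in the comment are invented purely for illustration.

```python
def gene_diversity(haplotype_counts):
    """Nei's unbiased gene (haplotype) diversity from per-haplotype counts.

    Assumed estimator: h = n / (n - 1) * (1 - sum(p_i ** 2)),
    where p_i is the sample frequency of haplotype i.
    """
    n = sum(haplotype_counts)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    sum_squared_freqs = sum((count / n) ** 2 for count in haplotype_counts)
    return n / (n - 1) * (1.0 - sum_squared_freqs)


# Invented example: four individuals split evenly between two haplotypes.
# gene_diversity([2, 2]) is 4/3 * (1 - 0.5), i.e. two thirds.
```

A sample carrying a single haplotype gives h = 0, and h approaches 1 as every sampled individual carries a different haplotype.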
Audit, Benjamin; Zaghloul, Lamia; Baker, Antoine; Arneodo, Alain; Chen, Chun-Long; d'Aubenton-Carafa, Yves; Thermes, Claude
2013-01-01
In higher eukaryotes, the absence of specific sequence motifs marking the origins of replication has been a serious hindrance to the understanding of (i) the mechanisms that regulate the spatio-temporal replication program, and (ii) the links between origin activation, chromatin structure and transcription. In this chapter, we review the partitioning of the human genome into megabase-sized replication domains delineated as N-shaped motifs in the strand compositional asymmetry profiles. They collectively span 28.3% of the genome and are bordered by more than 1,000 putative replication origins. We recapitulate the comparison of this partition of the human genome with high-resolution experimental data that confirms that replication domain borders are likely to be preferential replication initiation zones in the germline. In addition, we highlight the specific distribution of experimental and numerical chromatin marks along replication domains. Domain borders correspond to particular open chromatin regions, possibly encoded in the DNA sequence, and around which replication and transcription are highly coordinated. These regions also present a high evolutionary breakpoint density, suggesting that susceptibility to breakage might be linked to local open chromatin fiber state. Altogether, this chapter presents a compartmentalization of the human genome into replication domains that are landmarks of the human genome organization and are likely to play a key role in genome dynamics during evolution and in pathological situations.
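A strand compositional asymmetry profile of the kind mentioned above can be sketched as follows. The abstract does not define the skew, so this assumes the widely used form S = (T - A)/(T + A) + (G - C)/(G + C), computed in non-overlapping windows; the function names and window scheme are hypothetical.

```python
def skew(window):
    """Compositional skew S = (T-A)/(T+A) + (G-C)/(G+C) of one window.

    A term whose denominator is zero contributes 0 (an assumption made here
    so that windows lacking a base pair type do not raise an error).
    """
    w = window.upper()
    a, c, g, t = (w.count(base) for base in "ACGT")
    s_ta = (t - a) / (t + a) if t + a else 0.0
    s_gc = (g - c) / (g + c) if g + c else 0.0
    return s_ta + s_gc


def skew_profile(sequence, window_size):
    """Skew values along a sequence in non-overlapping windows."""
    last_start = len(sequence) - window_size
    return [
        skew(sequence[i:i + window_size])
        for i in range(0, last_start + 1, window_size)
    ]
```

Plotted along a chromosome with windows of a few kilobases, such a profile is the kind of signal in which the N-shaped motifs described in the abstract would be sought.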
Modeling Of Object- And Scene-Prototypes With Hierarchically Structured Classes
NASA Astrophysics Data System (ADS)
Ren, Z.; Jensch, P.; Ameling, W.
1989-03-01
The success of knowledge-based image analysis methodology and implementation tools depends largely on an appropriately and efficiently built model wherein the domain-specific context information about and the inherent structure of the observed image scene have been encoded. For identifying an object in an application environment a computer vision system needs to know firstly the description of the object to be found in an image or in an image sequence, secondly the corresponding relationships between object descriptions within the image sequence. This paper presents models of image objects and scenes by means of hierarchically structured classes. Using the topovisual formalism of graph and higraph, we are currently studying principally the relational aspect and data abstraction of the modeling in order to visualize the structural nature resident in image objects and scenes, and to formalize their descriptions. The goal is to expose the structure of the image scene and the correspondence of image objects in the low-level image interpretation process. The object-based system design approach has been applied to build the model base. We utilize the object-oriented programming language C++ for designing, testing and implementing the abstracted entity classes and the operation structures which have been modeled topovisually. The reference images used for modeling prototypes of objects and scenes are from industrial environments as well as medical applications.
Epitope mapping of the domains of human angiotensin converting enzyme.
Kugaevskaya, Elena V; Kolesanova, Ekaterina F; Kozin, Sergey A; Veselovsky, Alexander V; Dedinsky, Ilya R; Elisseeva, Yulia E
2006-06-01
Somatic angiotensin converting enzyme (sACE) contains in its single chain two homologous domains (called N- and C-domains), each bearing a functional zinc-dependent active site. The present study aims to define the differences between the two sACE domains and to localize experimentally revealed antigenic determinants (B-epitopes) in the recently determined three-dimensional structure of testicular tACE. The predicted linear antigenic determinants of human sACE were determined by the peptide scanning ("PEPSCAN") approach. An essential difference was demonstrated between the locations of the epitopes in the N- and C-domains. Comparison of the arrangement of epitopes in the human domains with the corresponding sequences of some mammalian sACEs made it possible to classify the revealed antigenic determinants as variable or conserved areas. The location of antigenic determinants with respect to various structural elements and to functionally important sites of the human sACE C-domain was estimated. The majority of antigenic sites of the C-domain were located at the irregular elements and at the boundaries of secondary structure elements. The data show structural differences between the sACE domains. The experimentally revealed antigenic determinants were in agreement with the recently determined crystal tACE structure. These results open new potential applications for producing mono-specific and group-specific antipeptide antibodies.
NASA Astrophysics Data System (ADS)
Kim, Duckhoe; Sahin, Ozgur
2015-03-01
Scanning probe microscopes can be used to image and chemically characterize surfaces down to the atomic scale. However, the localized tip-sample interactions in scanning probe microscopes limit high-resolution images to the topmost atomic layer of surfaces, and characterizing the inner structures of materials and biomolecules is a challenge for such instruments. Here, we show that an atomic force microscope can be used to image and three-dimensionally reconstruct chemical groups inside a protein complex. We use short single-stranded DNAs as imaging labels that are linked to target regions inside a protein complex, and T-shaped atomic force microscope cantilevers functionalized with complementary probe DNAs allow the labels to be located with sequence specificity and subnanometre resolution. After measuring pairwise distances between labels, we reconstruct the three-dimensional structure formed by the target chemical groups within the protein complex using simple geometric calculations. Experiments with the biotin-streptavidin complex show that the predicted three-dimensional loci of the carboxylic acid groups of biotins are within 2 Å of their respective loci in the corresponding crystal structure, suggesting that scanning probe microscopes could complement existing structural biological techniques in solving structures that are difficult to study due to their size and complexity.
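The step of reconstructing a three-dimensional arrangement from measured pairwise distances can be illustrated for four labelled sites, which need six distances. The sketch below is a generic trilateration, not the authors' procedure: it places the first point at the origin, the second on the x-axis and the third in the xy-plane, then solves for the fourth. The function name is hypothetical, and the result is only defined up to rotation, translation and mirror reflection.

```python
import math


def place_four_points(d12, d13, d14, d23, d24, d34):
    """Coordinates of four points consistent with their six pairwise distances."""
    p1 = (0.0, 0.0, 0.0)
    p2 = (d12, 0.0, 0.0)
    # Point 3 lies in the xy-plane; x3 follows from the law of cosines.
    x3 = (d13 ** 2 - d23 ** 2 + d12 ** 2) / (2.0 * d12)
    y3 = math.sqrt(max(d13 ** 2 - x3 ** 2, 0.0))
    if y3 == 0.0:
        raise ValueError("points 1-3 are collinear; the frame is undefined")
    p3 = (x3, y3, 0.0)
    # Point 4: intersect the three spheres centred on points 1, 2 and 3.
    x4 = (d14 ** 2 - d24 ** 2 + d12 ** 2) / (2.0 * d12)
    y4 = (d14 ** 2 - d34 ** 2 + x3 ** 2 + y3 ** 2 - 2.0 * x3 * x4) / (2.0 * y3)
    # The positive root is chosen; the negative root gives the mirror image.
    z4 = math.sqrt(max(d14 ** 2 - x4 ** 2 - y4 ** 2, 0.0))
    return p1, p2, p3, (x4, y4, z4)
```

With noisy measurements the six distances may not be exactly compatible, in which case the clamped square roots give an approximate placement and a least-squares fit would be a more robust choice.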
Direct Calculation of Protein Fitness Landscapes through Computational Protein Design
Au, Loretta; Green, David F.
2016-01-01
Naturally selected amino-acid sequences or experimentally derived ones are often the basis for understanding how protein three-dimensional conformation and function are determined by primary structure. Such sequences for a protein family comprise only a small fraction of all possible variants, however, representing the fitness landscape with limited scope. Explicitly sampling and characterizing alternative, unexplored protein sequences would directly identify fundamental reasons for sequence robustness (or variability), and we demonstrate that computational methods offer an efficient mechanism toward this end, on a large scale. The dead-end elimination and A∗ search algorithms were used here to find all low-energy single mutant variants, and corresponding structures of a G-protein heterotrimer, to measure changes in structural stability and binding interactions to define a protein fitness landscape. We established consistency between these algorithms with known biophysical and evolutionary trends for amino-acid substitutions, and could thus recapitulate known protein side-chain interactions and predict novel ones. PMID:26745411
NASA Astrophysics Data System (ADS)
Yang, C. H.; Shen, G. Z.; Ao, Z. M.; Xu, Y. W.
2016-09-01
Using the transfer matrix method, the carrier tunneling properties in graphene superlattices generated by the Thue-Morse sequence and the Kolakoski sequence are investigated. The positions and strength of the transmission can be modulated by the barrier structures, the incident energy and angle, and the height and width of the potential. These carrier tunneling characteristics can be understood from the energy band structures in the corresponding superlattice systems and the carrier states in the wells and barriers. The transmission peaks above the critical incident angle rely on the carrier’s resonance in the well regions. The structural diversity can modulate the electronic and transport properties, thus expanding the applications of such superlattices.
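The two aperiodic sequences that define the superlattices can be generated as follows. This is a minimal sketch of the sequences themselves only; how each symbol maps to a barrier or well layer is not stated in the abstract, so no mapping is assumed here, and the function names are hypothetical.

```python
def thue_morse(n):
    """First n terms of the Thue-Morse sequence (0, 1, 1, 0, 1, 0, 0, 1, ...).

    Term k is the parity of the number of 1 bits in the binary form of k.
    """
    return [bin(k).count("1") % 2 for k in range(n)]


def kolakoski(n):
    """First n terms of the Kolakoski sequence (1, 2, 2, 1, 1, 2, 1, ...).

    The sequence describes its own run lengths: term i gives the length of
    the i-th run, and runs alternate between the symbols 1 and 2.
    """
    seq = [1, 2, 2]
    i = 2  # index of the term giving the length of the next run
    while len(seq) < n:
        next_symbol = 1 if seq[-1] == 2 else 2
        seq.extend([next_symbol] * seq[i])
        i += 1
    return seq[:n]
```

Both sequences are deterministic but non-periodic, which is why superlattices built from them sit between periodic and random layer arrangements.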
NASA Astrophysics Data System (ADS)
Sitaula, R. P.; Aschoff, J.
2013-12-01
Regional-scale sequence stratigraphic correlation, well log analysis, syntectonic unconformity mapping, isopach maps, and depositional environment maps of the upper Mesaverde Group (UMG) in Uinta basin, Utah suggest higher accommodation in northeastern part (Natural Buttes area) and local development of lacustrine facies due to increased subsidence caused by uplift of San Rafael Swell (SRS) in southern and Uinta Uplift in northern parts. Recently discovered lacustrine facies in Natural Buttes area are completely different than the dominant fluvial facies in outcrops along Book Cliffs and could have implications for significant amount of tight-gas sand production from this area. Data used for sequence stratigraphic correlation, isopach maps and depositional environmental maps include > 100 well logs, 20 stratigraphic profiles, 35 sandstone thin sections and 10 outcrop-based gamma ray profiles. Seven 4th order depositional sequences (~0.5 my duration) are identified and correlated within UMG. Correlation was constructed using a combination of fluvial facies and stacking patterns in outcrops, chert-pebble conglomerates and tidally influenced strata. These surfaces were extrapolated into subsurface by matching GR profiles. GR well logs and core log of Natural Buttes area show intervals of coarsening upward patterns suggesting possible lacustrine intervals that might contain high TOC. Locally, younger sequences are completely truncated across SRS whereas older sequences are truncated and thinned toward SRS. The cycles of truncation and thinning represent phases of SRS uplift. Thinning possibly related with the Uinta Uplift is also observed in northwestern part. Paleocurrents are consistent with interpretation of periodic segmentation and deflection of sedimentation. Regional paleocurrents are generally E-NE-directed in Sequences 1-4, and N-directed in Sequences 5-7. 
From the isopach maps and paleocurrent directions it can be interpreted that uplift of the SRS changed the route of sediment supply from west to southwest. Locally, paleocurrents are highly variable near the SRS, further suggesting that the UMG basin fill was partitioned by uplift of the SRS. Sandstone composition analysis also suggests that uplift of the SRS caused source rocks to differ between the upper and lower sequences. In conclusion, we suggest that the Uinta basin was episodically partitioned during deposition of the UMG due to uplift of Laramide structures in the basin, and that accommodation was localized in the northeastern part. Understanding of structural controls on accommodation, sedimentation patterns and depositional environments will aid prediction of the best-producing gas reservoirs.
Absence of auditory 'global interference' in autism.
Foxton, Jessica M; Stewart, Mary E; Barnard, Louise; Rodgers, Jacqui; Young, Allan H; O'Brien, Gregory; Griffiths, Timothy D
2003-12-01
There has been considerable recent interest in the cognitive style of individuals with Autism Spectrum Disorder (ASD). One theory, that of weak central coherence, concerns an inability to combine stimulus details into a coherent whole. Here we test this theory in the case of sound patterns, using a new definition of the details (local structure) and the coherent whole (global structure). Thirteen individuals with a diagnosis of autism or Asperger's syndrome and 15 control participants were administered auditory tests, where they were required to match local pitch direction changes between two auditory sequences. When the other local features of the sequence pairs were altered (the actual pitches and relative time points of pitch direction change), the control participants obtained lower scores compared with when these details were left unchanged. This can be attributed to interference from the global structure, defined as the combination of the local auditory details. In contrast, the participants with ASD did not obtain lower scores in the presence of such mismatches. This was attributed to the absence of interference from an auditory coherent whole. The results are consistent with the presence of abnormal interactions between local and global auditory perception in ASD.
Prediction of β-turns in proteins from multiple alignment using neural network
Kaur, Harpreet; Raghava, Gajendra Pal Singh
2003-01-01
A neural network-based method has been developed for the prediction of β-turns in proteins by using multiple sequence alignment. Two feed-forward back-propagation networks with a single hidden layer are used, where the first, sequence-structure network is trained with the multiple sequence alignment in the form of PSI-BLAST–generated position-specific scoring matrices. The initial predictions from the first network and PSIPRED-predicted secondary structure are used as input to the second, structure-structure network to refine the predictions obtained from the first net. A significant improvement in prediction accuracy has been achieved by using evolutionary information contained in the multiple sequence alignment. The final network yields an overall prediction accuracy of 75.5% when tested by sevenfold cross-validation on a set of 426 nonhomologous protein chains. The corresponding Qpred, Qobs, and Matthews correlation coefficient values are 49.8%, 72.3%, and 0.43, respectively, and are the best among all the previously published β-turn prediction methods. The Web server BetaTPred2 (http://www.imtech.res.in/raghava/betatpred2/) has been developed based on this approach. PMID:12592033
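The accuracy measures quoted in this abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in the β-turn prediction literature: overall accuracy as the share of correctly classified residues, Qpred as the share of predicted turn residues that are correct, Qobs as the share of observed turn residues that are recovered, and the standard Matthews correlation coefficient. The function name and the counts in the usage note are hypothetical and are not taken from the paper.

```python
import math

def turn_prediction_metrics(tp, fp, tn, fn):
    """Return (Qtotal, Qpred, Qobs, MCC) from residue-level counts.

    tp/fp/tn/fn are counts of true-positive, false-positive, true-negative
    and false-negative residues. The Q values are percentages. These
    definitions are assumed, not quoted from the abstract.
    """
    total = tp + fp + tn + fn
    q_total = 100.0 * (tp + tn) / total if total else 0.0
    q_pred = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    q_obs = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q_total, q_pred, q_obs, mcc
```

For example, calling `turn_prediction_metrics(40, 10, 40, 10)` on these made-up counts gives 80% for each Q value and an MCC of 0.6. Reporting Qpred and Qobs alongside overall accuracy matters because turn residues are a minority class, so overall accuracy alone can look high for a predictor that rarely predicts turns.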
NASA Astrophysics Data System (ADS)
Horn, B. L. D.; Melo, T. M.; Schultz, C. L.; Philipp, R. P.; Kloss, H. P.; Goldberg, K.
2014-11-01
The Santacruzodon assemblage zone was originally defined as a vertebrate fossil assemblage composed basically of non-mammalian cynodonts found in Santa Cruz do Sul and Venâncio Aires municipalities in Southern Brazil. This assemblage zone was positioned at the top of the Sequence I, in the Triassic Santa Maria Supersequence, Paraná Basin. However, the Santacruzodon assemblage zone does not occur across the entire area of the Santa Maria Supersequence. Based on new paleontological, structural and sedimentological data, we propose the existence of a new third-order sequence (Santa Cruz Sequence) between Sequences I and II in the Santa Maria Supersequence. Satellite image analysis was used to identify regional, NW- and NE-oriented lineaments that limit the occurrence zone. Outcrop data allowed the identification of a regional, angular unconformity that bounds the new sequence. The faunal content allowed the correlation of the new Santa Cruz Sequence with Madagascar's Isalo II fauna, corresponding to the Ladinian (Middle Triassic). New names were suggested for the sequences in the Santa Maria Supersequence, since the Santa Cruz Sequence was deposited between the former Sequences I and II. This unit was deposited or preserved exclusively on the hanging wall of normal faults, being absent from the adjacent structural blocks.
Key Role of CRF in the Skin Stress Response System
Zmijewski, Michal A.; Zbytek, Blazej; Tobin, Desmond J.; Theoharides, Theoharis C.; Rivier, Jean
2013-01-01
The discovery of corticotropin-releasing factor (CRF) or CRH defining the upper regulatory arm of the hypothalamic-pituitary-adrenal (HPA) axis, along with the identification of the corresponding receptors (CRFRs 1 and 2), represents a milestone in our understanding of central mechanisms regulating body and local homeostasis. We focused on the CRF-led signaling systems in the skin and offer a model for regulation of peripheral homeostasis based on the interaction of CRF and the structurally related urocortins with corresponding receptors and the resulting direct or indirect phenotypic effects that include regulation of epidermal barrier function, skin immune, pigmentary, adnexal, and dermal functions necessary to maintain local and systemic homeostasis. The regulatory modes of action include the classical CRF-led cutaneous equivalent of the central HPA axis, the expression and function of CRF and related peptides, and the stimulation of pro-opiomelanocortin peptides or cytokines. The key regulatory role is assigned to the CRFR-1α receptor, with other isoforms having modulatory effects. CRF can be released from sensory nerves and immune cells in response to emotional and environmental stressors. The expression sequence of peptides includes urocortin/CRF→pro-opiomelanocortin→ACTH, MSH, and β-endorphin. Expression of these peptides and of CRFR-1α is environmentally regulated, and their dysfunction can lead to skin and systemic diseases. Environmentally stressed skin can activate both the central and local HPA axis through either sensory nerves or humoral factors to turn on homeostatic responses counteracting cutaneous and systemic environmental damage. CRF and CRFR-1 may constitute novel targets through the use of specific agonists or antagonists, especially for therapy of skin diseases that worsen with stress, such as atopic dermatitis and psoriasis. PMID:23939821
Evolutionary conservation of codon optimality reveals hidden signatures of cotranslational folding.
Pechmann, Sebastian; Frydman, Judith
2013-02-01
The choice of codons can influence local translation kinetics during protein synthesis. Whether codon preference is linked to cotranslational regulation of polypeptide folding remains unclear. Here, we derive a revised translational efficiency scale that incorporates the competition between tRNA supply and demand. Applying this scale to ten closely related yeast species, we uncover the evolutionary conservation of codon optimality in eukaryotes. This analysis reveals universal patterns of conserved optimal and nonoptimal codons, often in clusters, which associate with the secondary structure of the translated polypeptides independent of the levels of expression. Our analysis suggests an evolved function for codon optimality in regulating the rhythm of elongation to facilitate cotranslational polypeptide folding, beyond its previously proposed role of adapting to the cost of expression. These findings establish how mRNA sequences are generally under selection to optimize the cotranslational folding of corresponding polypeptides.
Chu, Chien-Hsin; Chang, Lung-Chun; Hsu, Hong-Ming; Wei, Shu-Yi; Liu, Hsing-Wei; Lee, Yu; Kuo, Chung-Chi; Indra, Dharmu; Chen, Chinpan; Ong, Shiou-Jeng; Tai, Jung-Hsiang
2011-01-01
Nuclear proteins usually contain specific peptide sequences, referred to as nuclear localization signals (NLSs), for nuclear import. These signals remain unexplored in the protozoan pathogen, Trichomonas vaginalis. The nuclear import of a Myb2 transcription factor was studied here using immunodetection of a hemagglutinin-tagged Myb2 overexpressed in the parasite. The tagged Myb2 was localized to the nucleus as punctate signals. With mutations of its polybasic sequences, 48KKQK51 and 61KR62, Myb2 was localized to the nucleus, but the signal was diffusive. When fused to a C-terminal non-nuclear protein, the Myb2 sequence spanning amino acid (aa) residues 48 to 143, which is embedded within the R2R3 DNA-binding domain (aa 40 to 156), was essential and sufficient for efficient nuclear import of a bacterial tetracycline repressor (TetR), and yet the transport efficiency was reduced with an additional fusion of a firefly luciferase to TetR, while classical NLSs from the simian virus 40 T-antigen had no function in this assay system. Myb2 nuclear import and DNA-binding activity were substantially perturbed with mutation of a conserved isoleucine (I74) in helix 2 to proline that altered secondary structure and ternary folding of the R2R3 domain. Disruption of DNA-binding activity alone by point mutation of a lysine residue, K51, preceding the structural domain had little effect on Myb2 nuclear localization, suggesting that nuclear translocation of Myb2, which requires an ordered structural domain, is independent of its DNA binding activity. These findings provide useful information for testing whether myriad Mybs in the parasite use a common module to regulate nuclear import. PMID:22021237
Billoud, B; Kontic, M; Viari, A
1996-01-01
At the DNA/RNA level, biological signals are defined by a combination of spatial structures and sequence motifs. Until now, few attempts had been made in writing general purpose search programs that take into account both sequence and structure criteria. Indeed, the most successful structure scanning programs are usually dedicated to particular structures and are written using general purpose programming languages through a complex and time consuming process where the biological problem of defining the structure and the computer engineering problem of looking for it are intimately intertwined. In this paper, we describe a general representation of structures, suitable for database scanning, together with a programming language, Palingol, designed to manipulate it. Palingol has specific data types, corresponding to structural elements (basically helices), that can be arranged in any way to form a complex structure. As a consequence of the declarative approach used in Palingol, the user should only focus on 'what to search for' while the language engine takes care of 'how to look for it'. Therefore, it becomes simpler to write a scanning program and the structural constraints that define the required structure are more clearly identified. PMID:8628670
A state space based approach to localizing single molecules from multi-emitter images.
Vahid, Milad R; Chao, Jerry; Ward, E Sally; Ober, Raimund J
2017-01-28
Single molecule super-resolution microscopy is a powerful tool that enables imaging at sub-diffraction-limit resolution. In this technique, subsets of stochastically photoactivated fluorophores are imaged over a sequence of frames and accurately localized, and the estimated locations are used to construct a high-resolution image of the cellular structures labeled by the fluorophores. Available localization methods typically first determine the regions of the image that contain emitting fluorophores through a process referred to as detection. Then, the locations of the fluorophores are estimated accurately in an estimation step. We propose a novel localization method which combines the detection and estimation steps. The method models the given image as the frequency response of a multi-order system obtained with a balanced state space realization algorithm based on the singular value decomposition of a Hankel matrix, and determines the locations of intensity peaks in the image as the pole locations of the resulting system. The locations of the most significant peaks correspond to the locations of single molecules in the original image. Although the accuracy of the location estimates is reasonably good, we demonstrate that, by using the estimates as the initial conditions for a maximum likelihood estimator, refined estimates can be obtained that have a standard deviation close to the Cramér-Rao lower bound-based limit of accuracy. We validate our method using both simulated and experimental multi-emitter images.
Exhaustive comparison and classification of ligand-binding surfaces in proteins
Murakami, Yoichi; Kinoshita, Kengo; Kinjo, Akira R; Nakamura, Haruki
2013-01-01
Many proteins function by interacting with other small molecules (ligands). Identification of ligand-binding sites (LBS) in proteins can therefore help to infer their molecular functions. A comprehensive comparison among local structures of LBSs was previously performed, in order to understand their relationships and to classify their structural motifs. However, a similar exhaustive comparison among local surfaces of LBSs (patches) has never been performed, due to computational complexity. To enhance our understanding of LBSs, it is worth performing such comparisons among patches and classifying them based on similarities of their surface configurations and electrostatic potentials. In this study, we first developed a rapid method to compare two patches. We then clustered patches corresponding to the same PDB chemical component identifier for a ligand, and selected a representative patch from each cluster. We subsequently compared the representative patches exhaustively and clustered them using a similarity score, PatSim. Finally, the resultant PatSim scores were compared with similarities of atomic structures of the LBSs and those of the ligand-binding protein sequences and functions. Consequently, we classified the patches into ∼2000 well-characterized clusters. We found that about 63% of these clusters are used in identical protein folds, although about 25% of the clusters are conserved in distantly related proteins and even in proteins with cross-fold similarity. Furthermore, we showed that patches with higher PatSim scores have the potential to be involved in similar biological processes. PMID:23934772
“One code to find them all”: a perl tool to conveniently parse RepeatMasker output files
2014-01-01
Background Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Different possibilities can lead to multiple hits corresponding to a unique copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus corresponding to a single full-length sequence (like for long terminal repeat (LTR)-retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results We have developed a perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element. 
In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.
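The parsing step described in this abstract can be sketched in a few lines. The sketch below is not the authors' perl tool and does not reproduce its rules for merging hits. It only reads the whitespace-separated columns of a RepeatMasker .out annotation line and groups hits that share the same query sequence and the same value in the final ID column, which is a much simpler criterion than the one the abstract describes. The function names are hypothetical.

```python
from collections import defaultdict

def parse_rm_out(lines):
    """Yield one dict per annotation line of a RepeatMasker .out file."""
    for line in lines:
        fields = line.split()
        # Skip blank lines and header lines: annotation lines have at
        # least 15 columns and start with an integer Smith-Waterman score.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        yield {
            "score": int(fields[0]),
            "query": fields[4],
            "start": int(fields[5]),
            "end": int(fields[6]),
            # RepeatMasker writes "C" for hits on the complementary strand.
            "strand": "-" if fields[8] == "C" else "+",
            "repeat": fields[9],
            "class_family": fields[10],
            "id": fields[14],
        }

def group_hits_by_id(hits):
    """Group hits by (query, ID); a simplification, not the tool's logic."""
    copies = defaultdict(list)
    for hit in hits:
        copies[(hit["query"], hit["id"])].append(hit)
    return copies
```

A typical use would be `group_hits_by_id(parse_rm_out(open("genome.fa.out")))`, where the file name is a placeholder. Counting the resulting groups instead of raw hits gives a first approximation of copy number when one element is reported as several fragments.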
Study of the Structural Stability in Intermetallics Using Displacive Transformation Paths
NASA Astrophysics Data System (ADS)
Sob, M.; Wang, L. G.; Vitek, V.
1997-03-01
Relative structural stability of TiAl, FeAl, NiAl and NiTi is studied by investigating displacive phase transformation paths. These include the well known tetragonal (Bain's) and trigonal deformation paths which correspond to large homogeneous straining, and also more complex paths that include the shuffling of atomic planes. The results of full-potential APW total energy calculations show that all higher-energy cubic structures studied are locally unstable with respect to some deformation modes. There may or may not be symmetry-dictated energy extrema corresponding to cubic lattices depending on the atomic ordering. However, other energy extrema that are not imposed by symmetry requirements occur along the transformation paths. Configurations corresponding to energy minima may represent metastable structures that can play an important role in interfaces and other extended defects.
Superstatistical model of bacterial DNA architecture
NASA Astrophysics Data System (ADS)
Bogachev, Mikhail I.; Markelov, Oleg A.; Kayumov, Airat R.; Bunde, Armin
2017-02-01
Understanding the physical principles that govern the complex DNA structural organization as well as its mechanical and thermodynamical properties is essential for the advancement in both life sciences and genetic engineering. Recently we have discovered that the complex DNA organization is explicitly reflected in the arrangement of nucleotides depicted by the universal power law tailed internucleotide interval distribution that is valid for complete genomes of various prokaryotic and eukaryotic organisms. Here we suggest a superstatistical model that represents a long DNA molecule by a series of consecutive ~150 bp DNA segments with the alternation of the local nucleotide composition between segments exhibiting long-range correlations. We show that the superstatistical model and the corresponding DNA generation algorithm explicitly reproduce the laws governing the empirical nucleotide arrangement properties of the DNA sequences for various global GC contents and optimal living temperatures. Finally, we discuss the relevance of our model in terms of the DNA mechanical properties. As an outlook, we focus on finding the DNA sequences that encode a given protein while simultaneously reproducing the nucleotide arrangement laws observed from empirical genomes, that may be of interest in the optimization of genetic engineering of long DNA molecules.
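Two quantities mentioned in this abstract can be illustrated with a short sketch: the intervals between nucleotides and the local composition of consecutive segments. It assumes that an internucleotide interval is the distance between consecutive occurrences of the same nucleotide, and it uses a fixed window as a stand-in for the ~150 bp segments. Both function names are hypothetical, and the sketch computes empirical statistics only; it is not the superstatistical model or the DNA generation algorithm.

```python
def internucleotide_intervals(seq, nucleotide):
    """Distances between consecutive occurrences of one nucleotide.

    The interval definition is an assumption made for illustration.
    """
    positions = [i for i, base in enumerate(seq.upper()) if base == nucleotide]
    return [b - a for a, b in zip(positions, positions[1:])]

def segment_gc(seq, size=150):
    """GC fraction of consecutive, non-overlapping segments of `size` bases.

    A shorter final segment, if any, is included.
    """
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq), size):
        segment = seq[start:start + size]
        fractions.append(sum(base in "GC" for base in segment) / len(segment))
    return fractions
```

A histogram of the values returned by `internucleotide_intervals` over a whole genome is the kind of empirical distribution whose tail the abstract describes, while the series returned by `segment_gc` shows how local composition varies from one segment to the next.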
Translocation of an 89-kDa periplasmic protein is associated with Holospora infection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Iwatani, Koichi; Dohra, Hideo; Lang, B. Franz
2005-12-02
The symbiotic bacterium Holospora obtusa infects the macronucleus of the ciliate Paramecium caudatum. After ingestion by its host, an infectious form of Holospora with an electron-translucent tip passes through the host digestive vacuole and penetrates the macronuclear envelope with this tip. To investigate the underlying molecular mechanism of this process, we raised a monoclonal antibody against the tip-specific 89-kDa protein, sequenced this partially, and identified the corresponding complete gene. The deduced protein sequence carries two actin-binding motifs. Indirect immunofluorescence microscopy shows that during escape from the host digestive vacuole, the 89-kDa protein translocates from the inside to the outside of the tip. When the bacterium invades the macronucleus, the 89-kDa protein is left behind at the entry point of the nuclear envelope. Transmission electron microscopy shows the formation of fine fibrous structures that co-localize with the antibody-labeled regions of the bacterium. Our findings suggest that the 89-kDa protein plays a role in Holospora's escape from the host digestive vacuole, the migration through the host cytoplasm, and the invasion into the macronucleus.
VKCDB: voltage-gated K+ channel database updated and upgraded.
Gallin, Warren J; Boutet, Patrick A
2011-01-01
The Voltage-gated K(+) Channel DataBase (VKCDB) (http://vkcdb.biology.ualberta.ca) makes a comprehensive set of sequence data readily available for phylogenetic and comparative analysis. The current update contains 2063 entries for full-length or nearly full-length unique channel sequences from Bacteria (477), Archaea (18) and Eukaryotes (1568), an increase from 346 solely eukaryotic entries in the original release. In addition to protein sequences for channels, corresponding nucleotide sequences of the open reading frames corresponding to the amino acid sequences are now available and can be extracted in parallel with sets of protein sequences. Channels are categorized into subfamilies by phylogenetic analysis and by using hidden Markov model analyses. Although the raw database contains a number of fragmentary, duplicated, obsolete and non-channel sequences that were collected in early steps of data collection, the web interface will only return entries that have been validated as likely K(+) channels. The retrieval function of the web interface allows retrieval of entries that contain a substantial fraction of the core structural elements of VKCs, fragmentary entries, or both. The full database can be downloaded as either a MySQL dump or as an XML dump from the web site. We have now implemented automated updates at quarterly intervals.
Identification of a Herbal Powder by Deoxyribonucleic Acid Barcoding and Structural Analyses.
Sheth, Bhavisha P; Thaker, Vrinda S
2015-10-01
Authentic identification of plants is essential for exploiting their medicinal properties as well as for stopping adulteration and malpractice in their trade. The aim was to identify a herbal powder obtained from a herbalist in the local vicinity of Rajkot, Gujarat, using deoxyribonucleic acid (DNA) barcoding and molecular tools. The DNA was extracted from the herbal powder and selected Cassia species, followed by polymerase chain reaction (PCR) and sequencing of the rbcL barcode locus. Thereafter the sequences were subjected to National Center for Biotechnology Information (NCBI) basic local alignment search tool (BLAST) analysis, followed by three-dimensional structure determination of the rbcL protein from the herbal powder and the Cassia species, namely Cassia fistula, Cassia tora and Cassia javanica (sequences obtained in the present study), and Cassia roxburghii and Cassia abbreviata (sequences retrieved from Genbank). Further, multiple and pairwise structural alignments were carried out in order to identify the herbal powder. The nucleotide sequences obtained from the selected species of Cassia were submitted to Genbank (Accession No. JX141397, JX141405, JX141420). The NCBI BLAST analysis of the rbcL protein from the herbal powder showed an equal sequence similarity (with reference to different parameters like E value, maximum identity, total score, query coverage) to C. javanica and C. roxburghii. In order to resolve the ambiguities of the BLAST result, a protein structural approach was implemented. The protein homology models obtained in the present study were submitted to the protein model database (PM0079748-PM0079753). The pairwise structural alignment of the herbal powder (as template) and C. javanica and C. roxburghii (as targets individually) revealed a close similarity of the herbal powder with C. javanica.
A strategy such as the one used here, integrating DNA barcoding and protein structural analyses, could be adopted as a novel, rapid and economical procedure, especially in cases where protein-coding loci are considered. A herbal powder was obtained from a herbalist in the local vicinity of Rajkot, Gujarat. An integrated approach using DNA barcoding and structural analyses was carried out to identify the herbal powder. The herbal powder was identified as Cassia javanica L.
Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude
2011-06-20
One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins. PMID:21689388
Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E
2015-01-01
Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".
Seligmann, Hervé
2016-01-01
In mitochondria, secondary structures punctuate post-transcriptional RNA processing. Recently described transcripts, called delRNAs, match the human mitogenome after systematic deletion of every 4th nucleotide, or of every 4th and 5th nucleotides. Here I explore predicted stem-loop hairpin formation by delRNAs, and their associations with delRNA transcription and detected peptides matching their translation. Despite missing 25% or 40% of the nucleotides in the original sequence, respectively, del-transformed sequences form significantly more secondary structures than corresponding randomly shuffled sequences, indicating biological function, independently of, and in combination with, previously detected delRNAs and the peptides translated from them. Self-hybridization decreases delRNA abundances, indicating downregulation. Systematic deletions of the human mitogenome reveal new, unsuspected coding and structural information. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
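The del-transformation and the shuffled control mentioned in this abstract can be sketched as follows. The sketch assumes that "every 4th nucleotide" means dropping the last base of each block of four (25% removed), and that "every 4th and 5th" means dropping the last two bases of each block of five (40% removed), which matches the percentages quoted. The function names and the `offset` parameter for choosing the frame are hypothetical. Secondary structure prediction itself is not shown.

```python
import random

def delete_periodic(seq, period, n_deleted, offset=0):
    """Drop the last `n_deleted` bases of every block of `period` bases.

    `offset` shifts where the first block starts (an assumed way of
    selecting the frame, not a parameter named in the abstract).
    """
    keep = period - n_deleted
    return "".join(
        base for i, base in enumerate(seq) if (i - offset) % period < keep
    )

def shuffled_control(seq, seed=None):
    """Return a random permutation of `seq` with the same base composition."""
    rng = random.Random(seed)
    return "".join(rng.sample(seq, len(seq)))
```

Calling `delete_periodic(seq, 4, 1)` and `delete_periodic(seq, 5, 2)` gives the two transformed sequences, and `shuffled_control` applied to each provides the composition-matched baseline against which predicted hairpin counts could be compared.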
Aquatic environmental DNA detects seasonal fish abundance and habitat preference in an urban estuary
Soboleva, Lyubov; Charlop-Powers, Zachary
2017-01-01
The difficulty of censusing marine animal populations hampers effective ocean management. Analyzing water for DNA traces shed by organisms may aid assessment. Here we tested aquatic environmental DNA (eDNA) as an indicator of fish presence in the lower Hudson River estuary. A checklist of local marine fish and their relative abundance was prepared by compiling 12 traditional surveys conducted between 1988–2015. To improve eDNA identification success, 31 specimens representing 18 marine fish species were sequenced for two mitochondrial gene regions, boosting coverage of the 12S eDNA target sequence to 80% of local taxa. We collected 76 one-liter shoreline surface water samples at two contrasting estuary locations over six months beginning in January 2016. eDNA was amplified with vertebrate-specific 12S primers. Bioinformatic analysis of amplified DNA, using a reference library of GenBank and our newly generated 12S sequences, detected most (81%) locally abundant or common species and relatively few (23%) uncommon taxa, and corresponded to seasonal presence and habitat preference as determined by traditional surveys. Approximately 2% of fish reads were commonly consumed species that are rare or absent in local waters, consistent with wastewater input. Freshwater species were rarely detected despite Hudson River inflow. These results support further exploration and suggest eDNA will facilitate fine-scale geographic and temporal mapping of marine fish populations at relatively low cost. PMID:28403183
2010-01-01
Background Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Results Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. Conclusions A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana. PMID:20637079
Ehrhardt, J; Säring, D; Handels, H
2007-01-01
Modern tomographic imaging devices enable the acquisition of spatial and temporal image sequences. However, the spatial and temporal resolution of such devices is limited, and therefore image interpolation techniques are needed to represent images at a desired level of discretization. This paper presents a method for structure-preserving interpolation between neighboring slices in temporal or spatial image sequences. In a first step, the spatiotemporal velocity field between image slices is determined using an optical flow-based registration method in order to establish spatial correspondence between adjacent slices. An iterative algorithm is applied using the spatial and temporal image derivatives and a spatiotemporal smoothing step. Afterwards, the calculated velocity field is used to generate an interpolated image at the desired time by averaging intensities between corresponding points. Three quantitative measures are defined to evaluate the performance of the interpolation method. The behavior and capability of the algorithm are demonstrated on synthetic images. A population of 17 temporal and spatial image sequences is used to compare the optical flow-based interpolation method to linear and shape-based interpolation. The quantitative results show that the optical flow-based method outperforms linear and shape-based interpolation with statistical significance. The interpolation method presented is able to generate image sequences with the appropriate spatial or temporal resolution needed for image comparison, analysis or visualization tasks. Quantitative and qualitative measures extracted from synthetic phantoms and medical image data show that the new method has advantages over linear and shape-based interpolation.
iPARTS2: an improved tool for pairwise alignment of RNA tertiary structures, version 2.
Yang, Chung-Han; Shih, Cheng-Ting; Chen, Kun-Tze; Lee, Po-Han; Tsai, Ping-Han; Lin, Jian-Cheng; Yen, Ching-Yu; Lin, Tiao-Yin; Lu, Chin Lung
2016-07-08
Since its first release in 2010, iPARTS has become a valuable tool for globally or locally aligning two RNA 3D structures. It was implemented by a structural alphabet (SA)-based approach, which uses an SA of 23 letters to reduce RNA 3D structures into 1D sequences of SA letters and applies traditional sequence alignment to these SA-encoded sequences for determining their global or local similarity. In this version, we have re-implemented iPARTS into a new web server iPARTS2 by constructing a totally new SA, which consists of 92 elements with each carrying both information of base and backbone geometry for a representative nucleotide. This SA is significantly different from the one used in iPARTS, because the latter consists of only 23 elements with each carrying only the backbone geometry information of a representative nucleotide. Our experimental results have shown that iPARTS2 outperforms its previous version iPARTS and also achieves better accuracy than other popular tools, such as SARA, SETTER and RASS, in RNA alignment quality and function prediction. iPARTS2 takes as input two RNA 3D structures in the PDB format and outputs their global or local alignments with graphical display. iPARTS2 is now available online at http://genome.cs.nthu.edu.tw/iPARTS2/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
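The idea of reducing 3D structures to 1D strings over a structural alphabet and then applying ordinary sequence alignment can be sketched as follows. This is a generic Needleman-Wunsch global alignment score, not the iPARTS2 implementation: the letter strings, the function name `global_alignment_score`, and the match/mismatch/gap values are hypothetical, and the real 92-element alphabet and its scoring scheme are not given in the abstract.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Hypothetical structural-alphabet encodings of two RNA structures.
sa_first = "ABCD"
sa_second = "ABD"

print(global_alignment_score(sa_first, sa_second))
```

A production tool would normally replace the flat match/mismatch values with a substitution matrix over the structural alphabet and would also offer local alignment.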
Seismic tomography of the area of the 2010 Beni-Ilmane earthquake sequence, north-central Algeria.
Abacha, Issam; Koulakov, Ivan; Semmane, Fethi; Yelles-Chaouche, Abd Karim
2014-01-01
The region of Beni-Ilmane (District of M'sila, north-central Algeria) was the site of an earthquake sequence that started on 14 May 2010. This sequence, which lasted several months, was triggered by conjugate E-W reverse and N-S dextral faulting. To image the crustal structure of these active faults, we used a set of 1406 well-located aftershock events and applied the local tomography software (LOTOS) algorithm, which includes absolute source location, optimization of the initial 1D velocity model, and iterative tomographic inversion for 3D seismic P- and S-wave velocities (and the Vp/Vs ratio), and source parameters. The patterns of P-wave low-velocity anomalies correspond to the alignments of faults determined from geological evidence, and the P-wave high-velocity anomalies may represent rigid blocks of the upper crust that are not deformed by regional stresses. The S-wave low-velocity anomalies coincide with the aftershock area, where relatively high values of Vp/Vs ratio (1.78) are observed compared with values in the surrounding areas (1.62-1.66). These high values may indicate high fluid contents in the aftershock area. These fluids could have been released from deeper levels by fault movements during earthquakes and migrated rapidly upwards. This hypothesis is supported by vertical sections across the study area, which show that the major Vp/Vs anomalies are located above the seismicity clusters.
E-MSD: an integrated data resource for bioinformatics
Velankar, S.; McNeil, P.; Mittard-Runte, V.; Suarez, A.; Barrell, D.; Apweiler, R.; Henrick, K.
2005-01-01
The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the ‘Structure Integration with Function, Taxonomy and Sequences (SIFTS)’ initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group. PMID:15608192
Structures and Binding Energies of the Naphthalene Dimer in Its Ground and Excited States.
Dubinets, N O; Safonov, A A; Bagaturyants, A A
2016-05-05
Possible structures of the naphthalene dimer corresponding to local energy minima in the ground and excited (excimer) electronic states are comprehensively investigated using DFT-D and TDDFT-D methods, with special emphasis on the excimer structures. The corresponding binding and electronic transition energies are calculated, and the nature of the electronic states in different structures is analyzed. Several parallel (stacked) and T-shaped structures were found in both the ground and excited (excimer) states in a rather narrow energy range. The T-shaped structure with the lowest energy in the excited state exhibits a marked charge transfer from the upright molecule to the base one.
Meza-Aguilar, J. Domingo; Fromme, Petra; Torres-Larios, Alfredo; Mendoza-Hernández, Guillermo; Hernandez-Chiñas, Ulises; Monteros, Roberto A. Arreguin-Espinosa de los; Campos, Carlos A. Eslava; Fromme, Raimund
2014-01-01
Autotransporters (ATs) represent a superfamily of proteins produced by a variety of pathogenic bacteria, which include the pathogenic groups of Escherichia coli (E. coli) associated with gastrointestinal and urinary tract infections. We present the first X-ray structure, at 2.3 Å resolution, of the passenger domain from the plasmid-encoded toxin (Pet), a 100 kDa protein that is a cause of acute diarrhea in both developing and industrialized countries. Pet is a cytoskeleton-altering toxin that induces loss of actin stress fibers. While Pet (pdb code: 4OM9) shows a sequence identity of only 50% compared to the closest related protein sequence, extracellular serine protease plasmid (EspP), the structural features of both proteins are conserved. A closer structural look reveals that Pet contains a β-pleated sheet at the sequence region of residues 181-190, whereas the corresponding structural domain in EspP consists of a coiled loop. Second, the Pet passenger domain features a more pronounced beta sheet between residues 135-143 compared to the structure of EspP. PMID:24530907
Taskinen, Jukka P; Kiema, Tiila R; Hiltunen, J Kalervo; Wierenga, Rik K
2006-01-27
The 1.9 Å structure of the C-terminal dehydrogenase part of the rat peroxisomal monomeric multifunctional enzyme type 1 (MFE-1) has been determined. In this construct (residues 260-722 and referred to as MFE1-DH) the N-terminal hydratase part of MFE-1 has been deleted. The structure of MFE1-DH shows that it consists of an N-terminal helix, followed by a Rossmann-fold domain (domain C), followed by two tightly associated helical domains (domains D and E), which have similar topology. The structure of MFE1-DH is compared with the two known homologous structures: human mitochondrial 3-hydroxyacyl-CoA dehydrogenase (HAD; sequence identity is 33%) (which is dimeric and monofunctional) and with the dimeric multifunctional alpha-chain (alphaFOM; sequence identity is 28%) of the bacterial fatty acid beta-oxidation alpha2beta2-multienzyme complex. Like MFE-1, alphaFOM has an N-terminal hydratase part and a C-terminal dehydrogenase part, and the structure comparisons show that the N-terminal helix of MFE1-DH corresponds to the alphaFOM linker helix, located between its hydratase and dehydrogenase part. It is also shown that this helix corresponds to the C-terminal helix-10 of the hydratase/isomerase superfamily, suggesting that functionally it belongs to the N-terminal hydratase part of MFE-1.
Phylodynamics of vampire bat-transmitted rabies in Argentina
DOHMEN, F. GURY; BELTRAN, F.; NOVARO, L.; RUSSO, S.; FREIRE, M. C.; VELASCO-VILLA, A.; MBAYED, V. A.; CISTERNA, D. M.
2016-01-01
Common vampire bat populations distributed from Mexico to Argentina are important rabies reservoir hosts in Latin America. The aim of this work was to analyse the population structure of the rabies virus (RABV) variants associated with vampire bats in the Americas and to study their phylodynamic pattern within Argentina. The phylogenetic analysis based on all available vampire bat-related N gene sequences showed both a geographical and a temporal structure. The two largest groups of RABV variants from Argentina were isolated from northwestern Argentina and from the central western zone of northeastern Argentina, corresponding to livestock areas with different climatic, topographic and biogeographical conditions, which determined their dissemination and evolutionary patterns. In addition, multiple introductions of the infection into Argentina, possibly from Brazil, were detected. The phylodynamic analysis suggests that RABV transmission dynamics is characterized by initial epizootic waves followed by local enzootic cycles with variable persistence. Anthropogenic interventions in the ecosystem should be assessed taking into account not only the environmental impact but also the potential risk of disease spreading through dissemination of current RABV lineages or the emergence of novel ones associated with vampire bats. PMID:24661865
Swaney, Danielle L; Wenger, Craig D; Thomson, James A; Coon, Joshua J
2009-01-27
Protein phosphorylation is central to the understanding of cellular signaling, and cellular signaling is suggested to play a major role in the regulation of human embryonic stem (ES) cell pluripotency. Here, we describe the use of conventional tandem mass spectrometry-based sequencing technology--collision-activated dissociation (CAD)--and the more recently developed method electron transfer dissociation (ETD) to characterize the human ES cell phosphoproteome. In total, these experiments resulted in the identification of 11,995 unique phosphopeptides, corresponding to 10,844 nonredundant phosphorylation sites, at a 1% false discovery rate (FDR). Among these phosphorylation sites are 5 localized to 2 pluripotency critical transcription factors--OCT4 and SOX2. From these experiments, we conclude that ETD identifies a larger number of unique phosphopeptides than CAD (8,087 to 3,868), more frequently localizes the phosphorylation site to a specific residue (49.8% compared with 29.6%), and sequences whole classes of phosphopeptides previously unobserved.
Poltev, V I; Anisimov, V M; Sanchez, C; Deriabina, A; Gonzalez, E; Garcia, D; Rivas, F; Polteva, N A
2016-01-01
It is generally accepted that the important characteristic features of the Watson-Crick duplex originate from the molecular structure of its subunits. However, it still remains to elucidate what properties of each subunit are responsible for the significant characteristic features of the DNA structure. The computations of desoxydinucleoside monophosphates complexes with Na-ions using density functional theory revealed a pivotal role of DNA conformational properties of single-chain minimal fragments in the development of unique features of the Watson-Crick duplex. We found that directionality of the sugar-phosphate backbone and the preferable ranges of its torsion angles, combined with the difference between purines and pyrimidines in ring bases, define the dependence of three-dimensional structure of the Watson-Crick duplex on nucleotide base sequence. In this work, we extended these density functional theory computations to the minimal fragments of DNA duplex, complementary desoxydinucleoside monophosphates complexes with Na-ions. Using several computational methods and various functionals, we performed a search for energy minima of BI-conformation for complementary desoxydinucleoside monophosphates complexes with different nucleoside sequences. Two sequences are optimized using ab initio method at the MP2/6-31++G** level of theory. The analysis of torsion angles, sugar ring puckering and mutual base positions of optimized structures demonstrates that the conformational characteristic features of complementary desoxydinucleoside monophosphates complexes with Na-ions remain within BI ranges and become closer to the corresponding characteristic features of the Watson-Crick duplex crystals. 
Qualitatively, the main characteristic features of each studied complementary desoxydinucleoside monophosphates complex remain invariant when different computational methods are used, although the quantitative values of some conformational parameters could vary lying within the limits typical for the corresponding family. We observe that popular functionals in density functional theory calculations lead to the overestimated distances between base pairs, while MP2 computations and the newer complex functionals produce the structures that have too close atom-atom contacts. A detailed study of some complementary desoxydinucleoside monophosphate complexes with Na-ions highlights the existence of several energy minima corresponding to BI-conformations, in other words, the complexity of the relief pattern of the potential energy surface of complementary desoxydinucleoside monophosphate complexes. This accounts for variability of conformational parameters of duplex fragments with the same base sequence. Popular molecular mechanics force fields AMBER and CHARMM reproduce most of the conformational characteristics of desoxydinucleoside monophosphates and their complementary complexes with Na-ions but fail to reproduce some details of the dependence of the Watson-Crick duplex conformation on the nucleotide sequence.
2013-01-01
Background Accurate and complete identification of mobile elements is a challenging task in the current era of sequencing, given their large numbers and frequent truncations. Group II intron retroelements, which consist of a ribozyme and an intron-encoded protein (IEP), are usually identified in bacterial genomes through their IEP; however, the RNA component that defines the intron boundaries is often difficult to identify because of a lack of strong sequence conservation corresponding to the RNA structure. Compounding the problem of boundary definition is the fact that a majority of group II intron copies in bacteria are truncated. Results Here we present a pipeline of 11 programs that collect and analyze group II intron sequences from GenBank. The pipeline begins with a BLAST search of GenBank using a set of representative group II IEPs as queries. Subsequent steps download the corresponding genomic sequences and flanks, filter out non-group II introns, assign introns to phylogenetic subclasses, filter out incomplete and/or non-functional introns, and assign IEP sequences and RNA boundaries to the full-length introns. In the final step, the redundancy in the data set is reduced by grouping introns into sets of ≥95% identity, with one example sequence chosen to be the representative. Conclusions These programs should be useful for comprehensive identification of group II introns in sequence databases as data continue to rapidly accumulate. PMID:24359548
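The final redundancy-reduction step described above (grouping introns into sets of ≥95% identity with one representative each) can be sketched as a greedy clustering. This is not the published pipeline: the names `identity` and `cluster_by_identity`, the assumption that sequences are already aligned to equal length, the "first sequence seen becomes the representative" rule, and the toy sequences are all illustrative choices.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions in two equal-length, pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_by_identity(seqs: dict[str, str],
                        threshold: float = 0.95) -> dict[str, list[str]]:
    """Greedy clustering: map each representative name to its member names."""
    clusters: dict[str, list[str]] = {}
    for name, seq in seqs.items():
        for rep in clusters:
            if identity(seq, seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new set.
            clusters[name] = [name]
    return clusters


# Toy 40-nt sequences (hypothetical, not real intron data).
base = "ACGT" * 10
toy = {
    "intron_a": base,
    "intron_b": "T" + base[1:],   # one mismatch: 39/40 = 97.5% identity
    "intron_c": "TTTT" * 10,      # 25% identity to intron_a
}
print(cluster_by_identity(toy))
```

Greedy clustering depends on input order, so a real pipeline might sort by length or quality first and would compute identity from a proper pairwise alignment.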
Topological Structure of the Space of Phenotypes: The Case of RNA Neutral Networks
Aguirre, Jacobo; Buldú, Javier M.; Stich, Michael; Manrubia, Susanna C.
2011-01-01
The evolution and adaptation of molecular populations is constrained by the diversity accessible through mutational processes. RNA is a paradigmatic example of biopolymer where genotype (sequence) and phenotype (approximated by the secondary structure fold) are identified in a single molecule. The extreme redundancy of the genotype-phenotype map leads to large ensembles of RNA sequences that fold into the same secondary structure and can be connected through single-point mutations. These ensembles define neutral networks of phenotypes in sequence space. Here we analyze the topological properties of neutral networks formed by 12-nucleotide RNA sequences, obtained through the exhaustive folding of sequence space. A total of 4^12 sequences fragments into 645 subnetworks that correspond to 57 different secondary structures. The topological analysis reveals that each subnetwork is far from being random: it has a degree distribution with a well-defined average and a small dispersion, a high clustering coefficient, and an average shortest path between nodes close to its minimum possible value, i.e. the Hamming distance between sequences. RNA neutral networks are assortative due to the correlation in the composition of neighboring sequences, a feature that together with the symmetries inherent to the folding process explains the existence of communities. Several topological relationships can be analytically derived attending to structural restrictions and generic properties of the folding process. The average degree of these phenotypic networks grows logarithmically with their size, such that abundant phenotypes have the additional advantage of being more robust to mutations. This property prevents fragmentation of neutral networks and thus enhances the navigability of sequence space. In summary, RNA neutral networks show unique topological properties, unknown to other networks previously described. PMID:22028856
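The sequence-space picture in this abstract can be made concrete with a small sketch. It enumerates the single-point mutants of one 12-nucleotide sequence (the potential neighbors in a neutral network) and counts the size of the full sequence space. The example sequence and the helper names are hypothetical, and no folding is performed, so the sketch does not decide which neighbors actually share a secondary structure.

```python
ALPHABET = "ACGU"


def point_mutants(seq: str) -> list[str]:
    """All sequences at Hamming distance 1 from `seq`."""
    return [
        seq[:i] + base + seq[i + 1:]
        for i in range(len(seq))
        for base in ALPHABET
        if base != seq[i]
    ]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


seq = "GGGAAAUCCCAA"  # arbitrary 12-nt example, not taken from the paper
neighbors = point_mutants(seq)

# Size of the full sequence space for length 12: 4^12 = 16,777,216.
space_size = len(ALPHABET) ** len(seq)

print(space_size)
print(len(neighbors))  # 3 alternatives at each of 12 positions
```

In a neutral network, two sequences are linked only if they are point mutants of each other and fold into the same secondary structure, so each node has at most 36 neighbors for length 12.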
LenVarDB: database of length-variant protein domains.
Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan
2014-01-01
Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2 730 625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.
A benchmark testing ground for integrating homology modeling and protein docking.
Bohnuud, Tanggis; Luo, Lingqi; Wodak, Shoshana J; Bonvin, Alexandre M J J; Weng, Zhiping; Vajda, Sandor; Schueler-Furman, Ora; Kozakov, Dima
2017-01-01
Protein docking procedures carry out the task of predicting the structure of a protein-protein complex starting from the known structures of the individual protein components. More often than not, however, the structure of one or both components is not known, but can be derived by homology modeling on the basis of known structures of related proteins deposited in the Protein Data Bank (PDB). Thus, the problem is to develop methods that optimally integrate homology modeling and docking with the goal of predicting the structure of a complex directly from the amino acid sequences of its component proteins. One possibility is to use the best available homology modeling and docking methods. However, the models built for the individual subunits often differ to a significant degree from the bound conformation in the complex, often much more so than the differences observed between free and bound structures of the same protein, and therefore additional conformational adjustments, both at the backbone and side chain levels need to be modeled to achieve an accurate docking prediction. In particular, even homology models of overall good accuracy frequently include localized errors that unfavorably impact docking results. The predicted reliability of the different regions in the model can also serve as a useful input for the docking calculations. Here we present a benchmark dataset that should help to explore and solve combined modeling and docking problems. This dataset comprises a subset of the experimentally solved 'target' complexes from the widely used Docking Benchmark from the Weng Lab (excluding antibody-antigen complexes). This subset is extended to include the structures from the PDB related to those of the individual components of each complex, and hence represent potential templates for investigating and benchmarking integrated homology modeling and docking approaches. 
Template sets can be dynamically customized by specifying ranges in sequence similarity and in PDB release dates, or using other filtering options, such as excluding sets of specific structures from the template list. Multiple sequence alignments, as well as structural alignments of the templates to their corresponding subunits in the target are also provided. The resource is accessible online or can be downloaded at http://cluspro.org/benchmark, and is updated on a weekly basis in synchrony with new PDB releases. Proteins 2016; 85:10-16. © 2016 Wiley Periodicals, Inc.
Hodge Numbers from Picard-Fuchs Equations
NASA Astrophysics Data System (ADS)
Doran, Charles F.; Harder, Andrew; Thompson, Alan
2017-06-01
Given a variation of Hodge structure over P^1 with Hodge numbers (1,1,...,1), we show how to compute the degrees of the Deligne extension of its Hodge bundles, following Eskin-Kontsevich-Möller-Zorich, by using the local exponents of the corresponding Picard-Fuchs equation. This allows us to compute the Hodge numbers of Zucker's Hodge structure on the corresponding parabolic cohomology groups. We also apply this to families of elliptic curves, K3 surfaces and Calabi-Yau threefolds.
Automatic Tool for Local Assembly Structures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whole community shotgun sequencing of total DNA (i.e. metagenomics) and total RNA (i.e. metatranscriptomics) has provided a wealth of information on microbial community structure, predicted functions, and metabolic networks, and can even reconstruct complete genomes directly. Here we present ATLAS (Automatic Tool for Local Assembly Structures), a comprehensive pipeline for assembly, annotation, and genomic binning of metagenomic and metatranscriptomic data with an integrated framework for Multi-Omics. This will provide an open source tool for the Multi-Omic community at large.
Making the Bend: DNA Tertiary Structure and Protein-DNA Interactions
Harteis, Sabrina; Schneider, Sabine
2014-01-01
DNA structure functions as an overlapping code to the DNA sequence. Rapid progress in understanding the role of DNA structure in gene regulation, DNA damage recognition and genome stability has been made. The three dimensional structure of both proteins and DNA plays a crucial role for their specific interaction, and proteins can recognise the chemical signature of DNA sequence (“base readout”) as well as the intrinsic DNA structure (“shape recognition”). These recognition mechanisms do not exist in isolation but, depending on the individual interaction partners, are combined to various extents. The driving force for the interaction between protein and DNA remains the unique thermodynamics of each individual DNA-protein pair. In this review we focus on the structures and conformations adopted by DNA, both influenced by and influencing the specific interaction with the corresponding protein binding partner, as well as their underlying thermodynamics. PMID:25026169
Emergent high-spin state above 7 GPa in superconducting FeSe
NASA Astrophysics Data System (ADS)
Lebert, B. W.; Balédent, V.; Toulemonde, P.; Ablett, J. M.; Rueff, J.-P.
2018-05-01
The local electronic and magnetic properties of superconducting FeSe have been investigated by Kβ x-ray emission and simultaneous x-ray absorption spectroscopy (XAS) at the Fe K edge at high pressure and low temperature. Our results indicate a sluggish decrease of the local Fe spin moment under pressure up to 7 GPa, in line with previous reports, followed by a sudden increase at higher pressure. The magnetic surge is preceded by an abrupt change of the Fe local structure as observed by the decrease of the XAS preedge region intensity and corroborated by ab initio simulations. This pressure corresponds to a structural transition from the Cmma form to the denser Pbnm form with octahedral coordination of iron. Finally, the near-edge region of the XAS spectra shows a change before this transition at 5 GPa, corresponding well with the onset pressure of the sudden enhancement of Tc. Our results emphasize the delicate interplay between structural, magnetic, and superconducting properties in FeSe under pressure.
Antunes, Jacob; Viswanath, Satish; Brady, Justin T; Crawshaw, Benjamin; Ros, Pablo; Steele, Scott; Delaney, Conor P; Paspulati, Raj; Willis, Joseph; Madabhushi, Anant
2018-07-01
The objective of this study was to develop and quantitatively evaluate a radiology-pathology fusion method for spatially mapping tissue regions corresponding to different chemoradiation therapy-related effects from surgically excised whole-mount rectal cancer histopathology onto preoperative magnetic resonance imaging (MRI). This study included six subjects with rectal cancer treated with chemoradiation therapy who were then imaged with a 3-T T2-weighted MRI sequence, before undergoing mesorectal excision surgery. Excised rectal specimens were sectioned, stained, and digitized as two-dimensional (2D) whole-mount slides. Annotations of residual disease, ulceration, fibrosis, muscularis propria, mucosa, fat, inflammation, and pools of mucin were made by an expert pathologist on digitized slide images. An expert radiologist and pathologist jointly established corresponding 2D sections between MRI and pathology images, as well as identified a total of 10 corresponding landmarks per case (based on visually similar structures) on both modalities (five for driving registration and five for evaluating alignment). We spatially fused the in vivo MRI and ex vivo pathology images using landmark-based registration. This allowed us to spatially map detailed annotations from 2D pathology slides onto corresponding 2D MRI sections. Quantitative assessment of coregistered pathology and MRI sections revealed excellent structural alignment, with an overall deviation of 1.50 ± 0.63 mm across five expert-selected anatomic landmarks (in-plane misalignment of two to three pixels at 0.67- to 1.00-mm spatial resolution). Moreover, the T2-weighted intensity distributions were distinctly different when comparing fibrotic tissue to perirectal fat (as expected), but showed a marked overlap when comparing fibrotic tissue and residual rectal cancer. Our fusion methodology enabled successful and accurate localization of post-treatment effects on in vivo MRI. 
Copyright © 2018 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
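The landmark-driven fusion and its evaluation on held-out landmarks can be illustrated with a minimal sketch. The abstract does not state which transformation model was used, so the least-squares 2D similarity transform below (rotation, uniform scale, translation) is an assumption, as are the function names `fit_similarity` and `mean_landmark_error`. The idea is to fit on the "driving" landmarks and report the mean distance on the separate "evaluation" landmarks.

```python
from math import hypot

def fit_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src landmarks onto dst.

    Points are treated as complex numbers, so the map is w = a*z + b,
    where a encodes rotation and uniform scale and b the translation.
    (Assumed model; the study's actual transform is not specified.)
    """
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    # Closed-form solution of the complex linear regression.
    num = sum((wi - w_mean) * (zi - z_mean).conjugate() for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    a = num / den
    b = w_mean - a * z_mean
    return a, b

def mean_landmark_error(a, b, src, dst):
    """Mean Euclidean distance between transformed src and dst landmarks."""
    errors = []
    for (x, y), (u, v) in zip(src, dst):
        p = a * complex(x, y) + b
        errors.append(hypot(p.real - u, p.imag - v))
    return sum(errors) / len(errors)
```

In use, `fit_similarity` would be called with the five driving landmark pairs and `mean_landmark_error` with the five evaluation pairs, giving a deviation in the same units as the input coordinates.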
Multi-subject Manifold Alignment of Functional Network Structures via Joint Diagonalization.
Nenning, Karl-Heinz; Kollndorfer, Kathrin; Schöpf, Veronika; Prayer, Daniela; Langs, Georg
2015-01-01
Functional magnetic resonance imaging group studies rely on the ability to establish correspondence across individuals. This enables location specific comparison of functional brain characteristics. Registration is often based on morphology and does not take variability of functional localization into account. This can lead to a loss of specificity, or confounds when studying diseases. In this paper we propose multi-subject functional registration by manifold alignment via coupled joint diagonalization. The functional network structure of each subject is encoded in a diffusion map, where functional relationships are decoupled from spatial position. Two-step manifold alignment estimates initial correspondences between functionally equivalent regions. Then, coupled joint diagonalization establishes common eigenbases across all individuals, and refines the functional correspondences. We evaluate our approach on fMRI data acquired during a language paradigm. Experiments demonstrate the benefits in matching accuracy achieved by coupled joint diagonalization compared to previously proposed functional alignment approaches, or alignment based on structural correspondences.
The Cotton Centromere Contains a Ty3-gypsy-like LTR Retroelement
Luo, Song; Mach, Jennifer; Abramson, Bradley; Ramirez, Rolando; Schurr, Robert; Barone, Pierluigi; Copenhaver, Gregory; Folkerts, Otto
2012-01-01
The centromere is a repeat-rich structure essential for chromosome segregation; with the long-term aim of understanding centromere structure and function, we set out to identify cotton centromere sequences. To isolate centromere-associated sequences from cotton (Gossypium hirsutum), we surveyed tandem and dispersed repetitive DNA in the genus. Centromere-associated elements in other plants include tandem repeats and, in some cases, centromere-specific retroelements. Examination of cotton genomic survey sequences for tandem repeats yielded sequences that did not localize to the centromere. However, among the repetitive sequences we also identified a gypsy-like LTR retrotransposon (Centromere Retroelement Gossypium, CRG) that localizes to the centromere region of all chromosomes in domestic upland cotton, Gossypium hirsutum, the major commercially grown cotton. The location of the functional centromere was confirmed by immunostaining with antiserum to the centromere-specific histone CENH3, which co-localizes with CRG hybridization on metaphase mitotic chromosomes. G. hirsutum is an allotetraploid composed of A and D genomes, and CRG is also present in the centromere regions of other AD cotton species. Furthermore, FISH and genomic dot blot hybridization revealed that CRG is found in D-genome diploid cotton species, but not in A-genome diploid species, indicating that this retroelement may have invaded the A-genome centromeres during allopolyploid formation and amplified during evolutionary history. CRG is also found in other diploid Gossypium species, including B and E2 genome species, but not in the C, E1, F, and G genome species tested. Isolation of this centromere-specific retrotransposon from Gossypium provides a probe for further understanding of centromere structure, and a tool for future engineering of centromere mini-chromosomes in this important crop species. PMID:22536361
A critical analysis of computational protein design with sparse residue interaction graphs
Georgiev, Ivelin S.
2017-01-01
Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or the full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. 
This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies. PMID:28358804
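To make the notion of a sparse residue interaction graph concrete, the sketch below keeps only residue pairs within a distance cutoff and compares the pairwise energy summed over the sparse graph with the sum over all pairs. It is a toy illustration under stated assumptions: one representative coordinate per residue, a distance cutoff only (the energy cutoff mentioned in the abstract is omitted), and hypothetical function names. It does not search for a GMEC.

```python
from itertools import combinations
from math import dist

def full_pairs(n_residues):
    """All unordered pairs of residue indices (the full interaction graph)."""
    return set(combinations(range(n_residues), 2))

def sparse_pairs(coords, cutoff):
    """Residue pairs whose representative points lie within `cutoff`."""
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if dist(coords[i], coords[j]) <= cutoff
    }

def total_pair_energy(pair_energy, pairs):
    """Sum a precomputed pairwise energy table over the chosen edges."""
    return sum(pair_energy[p] for p in pairs)

if __name__ == "__main__":
    # Hypothetical toy system: four residues spaced 4 units apart on a line.
    coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
    # Toy energy that decays with distance (illustrative only).
    energy = {(i, j): -1.0 / dist(coords[i], coords[j]) for i, j in full_pairs(4)}
    sparse = sparse_pairs(coords, cutoff=5.0)
    print(len(full_pairs(4)), len(sparse))
    print(total_pair_energy(energy, full_pairs(4)) - total_pair_energy(energy, sparse))
```

The nonzero difference printed at the end is the neglected long-range contribution, which is the quantity whose effect on the computed optimum the study examines.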
Genomic diversity in switchgrass (Panicum virgatum): from the continental scale to a dune landscape
Morris, Geoffrey P.; Grabowski, Paul; Borevitz, Justin O.
2011-01-01
Connecting broad-scale patterns of genetic variation and population structure to genetic diversity on a landscape is a key step towards understanding historical processes of migration and adaptation. New genomic approaches can be used to increase the resolution of phylogeographic studies while reducing locus sampling effects and circumventing ascertainment bias. Here, we use a novel approach based on high-throughput sequencing to characterize genetic diversity in complete chloroplast genomes and >10,000 nuclear loci in switchgrass, across a continental and landscape scale. Switchgrass is a North American tallgrass species, which is widely used in conservation and perennial biomass production, and shows strong ecotypic adaptation and population structure across the continental range. We sequenced 40.9 billion base pairs from 24 individuals from across the species’ range and 20 individuals from the Indiana Dunes. Analysis of plastome sequence revealed 203 variable SNP sites that define eight haplogroups, which are differentiated by 4 to 127 SNPs and confirmed by patterns of indel variation. These include three deeply divergent haplogroups, which correspond to the previously described lowland-upland ecotypic split and a novel upland haplogroup split that dates to the mid-Pleistocene. Most of the plastome haplogroup diversity present in the northern switchgrass range, including in the Indiana Dunes, originated in the mid- or upper-Pleistocene prior to the most recent postglacial recolonization. Furthermore, a recently colonized landscape feature (~150 ya) in the Indiana Dunes contains several deeply divergent upland haplogroups. Nuclear markers also support a deep lowland-upland split, followed by limited gene flow, and show extensive gene flow in the local population of the Indiana Dunes. PMID:22060816
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.
Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H
2017-04-15
Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification
Sinclair, Robert M.; Ravantti, Janne J.
2017-01-01
Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979
NASA Astrophysics Data System (ADS)
Eigen, Manfred
1988-12-01
The Darwinian concept of evolution through natural selection has been revised and put on a solid physical basis, in a form which applies to self-replicable macromolecules. Two new concepts are introduced: sequence space and quasi-species. Evolutionary change in the DNA- or RNA-sequence of a gene can be mapped as a trajectory in a sequence space of dimension ν, where ν corresponds to the number of changeable positions in the genomic sequence. Emphasis, however, is shifted from the single surviving wild-type, a single point in the sequence space, to the complex structure of the mutant distribution that constitutes the quasi-species. Selection is equivalent to an establishment of the quasi-species in a localized region of sequence space, subject to threshold conditions for the error rate and sequence length. Arrival of a new mutant may violate the local threshold condition and thereby lead to a displacement of the quasi-species into a different region of sequence space. This transformation is similar to a phase transition; the dynamical equations that describe the quasi-species have been shown to be analogous to those of the two-dimensional Ising model of ferromagnetism. The occurrence of a selectively advantageous mutant is biased by the particulars of the quasi-species distribution, whose mutants are populated according to their fitness relative to that of the wild-type. Inasmuch as fitness regions are connected (like mountain ridges) the evolutionary trajectory is guided to regions of optimal fitness. Evolution experiments in test tubes confirm this modification of the simple chance and law nature of the Darwinian concept. The results of the theory can also be applied to the construction of a machine that provides optimal conditions for a rapid evolution of functionally active macromolecules. 
An introduction to the physics of molecular evolution by the author has appeared recently.1 Detailed studies of the kinetics and mechanisms of replication of RNA, the most likely candidate for early evolution2,3, and of the implications on natural selection have been given in Refs. 4 and 5. The quasi-species model has been constructed in Refs. 6 and 7 using the concept of sequence space. Subsequently various methods have been invented to elucidate this concept and to relate it to the theory of critical phenomena 8-19. The instability of the quasi-species at the error threshold is discussed in Ref. 10. Evolution experiments with RNA strands in test tubes are described in Refs. 21 and 22.
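The quasi-species and its error threshold can be explored numerically with a small sketch. In the quasi-species model the stationary mutant distribution is the dominant eigenvector of the matrix whose entries are mutation probabilities multiplied by replication rates. The code below finds it by power iteration. The binary alphabet, the uniform per-site copying fidelity `q`, the single-peak fitness landscape, and the specific parameter values are simplifying assumptions chosen for illustration, not details taken from the abstract.

```python
def quasispecies(nu, q, fitness, iters=300):
    """Stationary mutant distribution for binary sequences of length nu.

    q        : per-site copying fidelity (1 - q is the error rate)
    fitness  : replication rate for each of the 2**nu sequences
    Returns relative frequencies that sum to 1.
    """
    n = 2 ** nu

    def mutation_prob(i, j):
        # Probability that copying sequence j produces sequence i,
        # determined by their Hamming distance d.
        d = bin(i ^ j).count("1")
        return q ** (nu - d) * (1 - q) ** d

    w = [[mutation_prob(i, j) * fitness[j] for j in range(n)] for i in range(n)]
    x = [1.0 / n] * n
    for _ in range(iters):
        # Power iteration: apply W, then renormalize to frequencies.
        y = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        x = [v / total for v in y]
    return x

if __name__ == "__main__":
    nu = 6
    # Single-peak landscape: sequence 0 is the master (wild-type) sequence.
    fitness = [10.0] + [1.0] * (2 ** nu - 1)
    for q in (0.99, 0.9, 0.7, 0.5):
        print(q, round(quasispecies(nu, q, fitness)[0], 4))
```

Lowering `q` in this toy setting shifts weight from the master sequence to the mutant cloud, and at `q = 0.5` copying is random, so every sequence is equally frequent.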
Torque measurements reveal sequence-specific cooperative transitions in supercoiled DNA
Oberstrass, Florian C.; Fernandes, Louis E.; Bryant, Zev
2012-01-01
B-DNA becomes unstable under superhelical stress and is able to adopt a wide range of alternative conformations including strand-separated DNA and Z-DNA. Localized sequence-dependent structural transitions are important for the regulation of biological processes such as DNA replication and transcription. To directly probe the effect of sequence on structural transitions driven by torque, we have measured the torsional response of a panel of DNA sequences using single molecule assays that employ nanosphere rotational probes to achieve high torque resolution. The responses of Z-forming d(pGpC)n sequences match our predictions based on a theoretical treatment of cooperative transitions in helical polymers. “Bubble” templates containing 50–100 bp mismatch regions show cooperative structural transitions similar to B-DNA, although less torque is required to disrupt strand–strand interactions. Our mechanical measurements, including direct characterization of the torsional rigidity of strand-separated DNA, establish a framework for quantitative predictions of the complex torsional response of arbitrary sequences in their biological context. PMID:22474350
Towards Long-Range RNA Structure Prediction in Eukaryotic Genes.
Pervouchine, Dmitri D
2018-06-15
The ability to form an intramolecular structure plays a fundamental role in eukaryotic RNA biogenesis. Proximate regions in the primary transcripts fold into a local secondary structure, which is then hierarchically assembled into a tertiary structure that is stabilized by RNA-binding proteins and long-range intramolecular base pairings. While the local RNA structure can be predicted reasonably well for short sequences, long-range structure at the scale of eukaryotic genes remains problematic from the computational standpoint. The aim of this review is to list functional examples of long-range RNA structures, to summarize current comparative methods of structure prediction, and to highlight their advances and limitations in the context of long-range RNA structures. Most comparative methods implement the “first-align-then-fold” principle, i.e., they operate on multiple sequence alignments, while functional RNA structures often reside in non-conserved parts of the primary transcripts. The opposite “first-fold-then-align” approach is currently explored to a much lesser extent. Developing novel methods in both directions will improve the performance of comparative RNA structure analysis and help discover novel long-range structures, their higher-order organization, and RNA-RNA interactions across the transcriptome.
Lee, Hui Sun; Im, Wonpil
2016-04-01
Molecular recognition by protein mostly occurs in a local region on the protein surface. Thus, an efficient computational method for accurate characterization of protein local structural conservation is necessary to better understand biology and drug design. We present a novel local structure alignment tool, G-LoSA. G-LoSA aligns protein local structures in a sequence order independent way and provides a GA-score, a chemical feature-based and size-independent structure similarity score. Our benchmark validation shows the robust performance of G-LoSA to the local structures of diverse sizes and characteristics, demonstrating its universal applicability to local structure-centric comparative biology studies. In particular, G-LoSA is highly effective in detecting conserved local regions on the entire surface of a given protein. In addition, the applications of G-LoSA to identifying template ligands and predicting ligand and protein binding sites illustrate its strong potential for computer-aided drug design. We hope that G-LoSA can be a useful computational method for exploring interesting biological problems through large-scale comparison of protein local structures and facilitating drug discovery research and development. G-LoSA is freely available to academic users at http://im.compbio.ku.edu/GLoSA/. © 2016 The Protein Society.
Heras, Sandra; Maltagliati, Ferruccio; Fernández, Maria Victoria; Roldán, María Inés
2016-07-01
With this work we addressed some molecular systematic issues within the Mugil cephalus species complex. Particular attention was paid to the debated situations of: (i) Mugil liza, occurring in partial sympatry with Mugil cephalus in the northwestern Atlantic, and (ii) Mugil platanus, considered by some authors a synonym of the former species and distributed in the southwestern Atlantic. We sequenced 79 individuals of a 465-bp portion of the mitochondrial control region (CR) from 8 western Atlantic and 2 Mediterranean localities. In addition, all CR sequences available from GenBank for the studied taxa were added to our dataset, for a total of 323 individuals. Overall, 229 haplotypes corresponding to 8 divergent monophyletic lineages were detected. Results of phylogenetic analyses were consistent with the occurrence of past speciation events producing the observed lineages. Of these lineages, 7 correspond to cryptic species and one is constituted by M. liza and M. platanus. As a matter of fact, these 2 taxa constitute a single lineage within the M. cephalus species complex. However, individuals of the M. liza/M. platanus lineage analyzed by means of the 18 mitochondrial markers available in GenBank exhibited a degree of genetic diversity consistent with highly divergent populations. Of the 8 lineages detected, the Mediterranean one (type locality) corresponds to M. cephalus; the lineage M. liza/M. platanus should be named M. liza, under the priority principle, and the remaining 6 lineages need formal description. © 2015 International Society of Zoological Sciences, Institute of Zoology/Chinese Academy of Sciences and John Wiley & Sons Australia, Ltd.
Active bacterial community structure along vertical redox gradients in Baltic Sea sediment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jansson, Janet; Edlund, Anna; Hardeman, Fredrik
Community structures of active bacterial populations were investigated along a vertical redox profile in coastal Baltic Sea sediments by terminal-restriction fragment length polymorphism (T-RFLP) and clone library analysis. According to correspondence analysis of T-RFLP results and sequencing of cloned 16S rRNA genes, the microbial community structures at three redox depths (179 mV, -64 mV and -337 mV) differed significantly. The bacterial communities in the community DNA differed from those in bromodeoxyuridine (BrdU)-labeled DNA, indicating that the growing members of the community that incorporated BrdU were not necessarily the most dominant members. The structures of the actively growing bacterial communities were most strongly correlated to organic carbon followed by total nitrogen and redox potentials. Bacterial identification by sequencing of 16S rRNA genes from clones of BrdU-labeled DNA and DNA from reverse transcription PCR (rt-PCR) showed that bacterial taxa involved in nitrogen and sulfur cycling were metabolically active along the redox profiles. Several sequences had low similarities to previously detected sequences indicating that novel lineages of bacteria are present in Baltic Sea sediments. Also, a high number of different 16S rRNA gene sequences representing different phyla were detected at all sampling depths.
Structure and Expression of Genes for Flavivirus Immunogens
1988-02-01
regions, corresponding to amino acid residues 280 to 414 of the E protein, also reacted with 10 monoclonal antibodies (MAbs) generated against antigens... protein sequence. Furthermore, the presentation of these epitopes apparently requires the formation of a disulfide bridge between Cys-304 and Cys-335.
Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.
Neuwald, Andrew F; Altschul, Stephen F
2016-12-01
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes' theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).
Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.
Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F
2017-08-18
Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen-deuterium exchange (HDX) NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.
NASA Technical Reports Server (NTRS)
Saravanos, Dimitris A.; Heyliger, Paul R.; Hopkins, Dale A.
1996-01-01
Recent developments in layerwise mechanics for the analysis of composite laminates and structures with piezoelectric actuators and sensors are reviewed. The mechanics implement layerwise representations of displacements and electric potential, and can model both the global and local electromechanical response of smart composite structures. The corresponding finite-element implementations for the static and dynamic analysis of smart piezoelectric composite structures are also summarized. Selected applications illustrate the accuracy, robustness and capability of the developed mechanics to capture the global and local dynamic response of thin and/or thick laminated piezoelectric plates.
Musinova, Yana R; Kananykhina, Eugenia Y; Potashnikova, Daria M; Lisitsyna, Olga M; Sheval, Eugene V
2015-01-01
The majority of known nucleolar proteins are freely exchanged between the nucleolus and the surrounding nucleoplasm. One way proteins are retained in the nucleoli is by the presence of specific amino acid sequences, namely nucleolar localization signals (NoLSs). The mechanism by which NoLSs retain proteins inside the nucleoli is still unclear. Here, we present data showing that the charge-dependent (electrostatic) interactions of NoLSs with nucleolar components lead to nucleolar accumulation as follows: (i) known NoLSs are enriched in positively charged amino acids, but the NoLS structure is highly heterogeneous, and it is not possible to identify a consensus sequence for this type of signal; (ii) in two analyzed proteins (NF-κB-inducing kinase and HIV-1 Tat), the NoLS corresponds to a region that is enriched for positively charged amino acid residues; substituting charged amino acids with non-charged ones reduced the nucleolar accumulation in proportion to the charge reduction, and nucleolar accumulation efficiency was strongly correlated with the predicted charge of the tested sequences; and (iii) sequences containing only lysine or arginine residues (which were referred to as imitative NoLSs, or iNoLSs) are accumulated in the nucleoli in a charge-dependent manner. The results of experiments with iNoLSs suggested that charge-dependent accumulation inside the nucleoli was dependent on interactions with nucleolar RNAs. The results of this work are consistent with the hypothesis that nucleolar protein accumulation by NoLSs can be determined by the electrostatic interaction of positively charged regions with nucleolar RNAs rather than by any sequence-specific mechanism. Copyright © 2014 Elsevier B.V. All rights reserved.
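The charge-based reasoning above lends itself to a quick calculation. The sketch below estimates the net charge of a peptide by counting lysine and arginine as +1 and aspartate and glutamate as -1. This is a deliberately crude approximation and an assumption of this example: it ignores histidine, the termini, and pH, and it is not necessarily the charge predictor used in the study. The function name and the example peptides are likewise illustrative.

```python
# Side-chain charges at roughly neutral pH (simplified; His and termini ignored).
RESIDUE_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def net_charge(sequence):
    """Approximate net charge of a peptide given in one-letter code."""
    return sum(RESIDUE_CHARGE.get(residue, 0) for residue in sequence.upper())

def rank_by_charge(sequences):
    """Sort peptides from most to least positively charged."""
    return sorted(sequences, key=net_charge, reverse=True)

if __name__ == "__main__":
    # Hypothetical peptides: lysine-only and arginine-only stretches of
    # different lengths, plus a mixed sequence with no net charge.
    for peptide in rank_by_charge(["KKK", "RRRRRRRR", "KKKKK", "DEKR"]):
        print(peptide, net_charge(peptide))
```

Ranking candidate sequences this way gives the charge ordering against which measured nucleolar accumulation could be compared.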
Zhao, Ya-E; Wang, Zheng-Hang; Xu, Yang; Wu, Li-Ping; Hu, Li
2013-10-01
According to base pairing, the rRNA folds into corresponding secondary structures, which contain additional phylogenetic information. On the basis of sequencing for complete rDNA sequences (18S, ITS1, 5.8S, ITS2 and 28S rDNA) of Demodex, we predicted the secondary structure of the complete rDNA sequence (18S, 5.8S, and 28S rDNA) of Demodex folliculorum, which was in concordance with that of the main arthropod lineages in past studies. Together with the sequence data from GenBank, we also predicted the secondary structures of divergent domains in SSU rRNA of 51 species and in LSU rRNA of 43 species from four superfamilies in Acari (Cheyletoidea, Tetranychoidea, Analgoidea and Ixodoidea). The multiple alignment among the four superfamilies in Acari showed that insertions from Tetranychoidea SSU rRNA formed two newly proposed helices, and helix c3-2b of LSU rRNA was absent in Demodex (Cheyletoidea) taxa. Generally speaking, LSU rRNA presented more remarkable differences than SSU rRNA did, mainly in D2, D3, D5, D7a, D7b, D8 and D10. Copyright © 2013 Elsevier Inc. All rights reserved.
Constraint-based stereo matching
NASA Technical Reports Server (NTRS)
Kuan, D. T.
1987-01-01
The major difficulty in stereo vision is the correspondence problem that requires matching features in two stereo images. Researchers describe a constraint-based stereo matching technique using local geometric constraints among edge segments to limit the search space and to resolve matching ambiguity. Edge segments are used as image features for stereo matching. Epipolar constraint and individual edge properties are used to determine possible initial matches between edge segments in a stereo image pair. Local edge geometric attributes such as continuity, junction structure, and edge neighborhood relations are used as constraints to guide the stereo matching process. The result is a locally consistent set of edge segment correspondences between stereo images. These locally consistent matches are used to generate higher-level hypotheses on extended edge segments and junctions to form more global contexts to achieve global consistency.
Soyolt; Galsannorbu; Yongping; Wunenbayar; Liu, Guohou; Khasbagan
2013-04-24
Folk names of plants are the root of traditional plant biodiversity knowledge. With social change and economic development, Mongolian knowledge concerning plant diversity is gradually vanishing. Collection and analysis of Mongolian folk names of plants is therefore extremely important. Between 2008 and 2012, the authors visited the Arhorchin National Nature Reserve area five times. Fieldwork was done in 13 villages, and 56 local Mongol herdsmen were interviewed. This report documents plant folk names, analyzes the relationship between folk names and scientific names, examines the structure and special characteristics of folk names and plant use information, and presents a comparative analysis. The ethnobotanical interviewing methods of free-listing and open-ended questionnaires were used. Ethnobotanical interviews and voucher specimen collection were carried out in two ways: local plant specimens were collected beforehand and then used in interviews, or local Mongol herdsmen were invited to the field and interviewed while voucher specimens were collected. Spoken Mongolian was the working language, and findings were originally recorded in written Mongolian. Scientific names of plants were determined through collection and identification of voucher specimens by the methods of plant taxonomy. A total of 146 folk names of local plants were recorded. The plant folk names corresponded to 111 species, 1 subspecies, 7 varieties and 1 form, which belong to 42 families and 88 genera. The correspondence between plant folk names and scientific names may be classified as one-to-one, two- or three-to-one, and one-to-many. The structure of folk names was classified as primary names, secondary names and borrowed names. Twelve folk names contain animal names, and they correspond to 15 species.
Nine folk names contain usage information and correspond to 10 species, of which five species and one variety are still used by local people. A comparative analysis of the Mongol herdsmen in the Arhorchin National Nature Reserve and the Mongolians in the Ejina desert area shows some similarities as well as many differences, both in language and in name structure. The correspondence rate between plant folk names and scientific names was calculated as 82.19%, which can be considered a high level of consistency between scientific knowledge and traditional knowledge in botanical nomenclature. Primary names have the most cultural significance among the plant folk names. The special characteristics of plant folk names centered on the physical characteristics of animals, which were closely related to traditional animal husbandry and the environment. Plant folk names are not only a code for distinguishing between plant species but also a form of culture rich in deep knowledge of nature. The comparative analysis shows that Mongolian plant nomenclature varies between regions and between tribes.
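The abstract does not state how the 82.19% figure was computed. One reading that reproduces it, offered here only as an assumption, is the number of distinct taxa (species, subspecies, varieties and forms) divided by the number of recorded folk names. The variable names below are illustrative.

```python
# Counts taken from the abstract above.
folk_names = 146
taxa = {"species": 111, "subspecies": 1, "varieties": 7, "forms": 1}

# Assumed formula: correspondence rate = distinct taxa / folk names.
total_taxa = sum(taxa.values())
rate_percent = round(100 * total_taxa / folk_names, 2)

print(total_taxa, rate_percent)  # 120 82.19
```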
The red-sequence of 72 WINGS local galaxy clusters
NASA Astrophysics Data System (ADS)
Valentinuzzi, T.; Poggianti, B. M.; Fasano, G.; D'Onofrio, M.; Moretti, A.; Ramella, M.; Biviano, A.; Fritz, J.; Varela, J.; Bettoni, D.; Vulcani, B.; Moles, M.; Couch, W. J.; Dressler, A.; Kjærgaard, P.; Omizzolo, A.; Cava, A.
2011-12-01
We study the color-magnitude red sequence and blue fraction of 72 X-ray selected galaxy clusters at z = 0.04-0.07 from the WINGS survey, searching for correlations between the characteristics of the red sequence (RS) and the environment. We consider the slope and scatter of the red sequence, the number ratio of red luminous-to-faint galaxies, the blue fraction, and the fractions of ellipticals, S0s, and spirals that compose the RS. None of these quantities correlate with the cluster velocity dispersion, X-ray luminosity, number of cluster substructures, BCG prevalence over next brightest galaxies, and the spatial concentration of ellipticals. The properties of the RS, instead, depend strongly on local galaxy density. Higher density regions have a smaller RS scatter, a higher luminous-to-faint ratio, a lower blue fraction, and a lower spiral fraction on the RS. Our results clearly illustrate the prominent effect of the local density in setting the epoch when galaxies become passive and join the red sequence, as opposed to the mass of the galaxy host structure.
Hirota, Tadao; Hirohata, Tetsuo; Mashima, Hiroshi; Satoh, Toshiyuki; Obara, Yoshiaki
2004-11-01
The genetic structure of large Japanese field mouse populations in the suburban landscape of West Tokyo, Japan was determined using the mitochondrial DNA control region sequence. Samples were collected from six habitats linked by forests and a green tract along the Tama River, and from two forests segregated by urban areas from those continuous habitats. Thirty-five haplotypes were detected in 221 animals. Four to eight haplotypes were found within each local population belonging to the continuous landscape. Some haplotypes were shared by two or three adjacent local populations. On the other hand, the two isolated habitats were occupied by one or two indigenous haplotypes. Significant genetic differentiation between all pairs of local populations, except for one pair in the continuous habitats, was found by analysis of molecular variance (AMOVA). The geographical distance between habitats did not explain the large variance of pairwise F(ST)-values among local populations. F(ST)-values between local populations segregated by urban areas were higher than those between local populations in the continuous habitat, regardless of geographical distance. The results of this study demonstrated quantitatively that urban areas inhibit the migration of Apodemus speciosus, whereas a linear green tract along a river functions as a corridor. Moreover, it preserves the metapopulation structure of A. speciosus as well as the corridors in suburban landscape.
Hemalatha, G. R.; Rao, D. Satyanarayana; Guruprasad, L.
2007-01-01
We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region comprising less than 55-amino-acid residues that occur more than once in the protein sequence and sometimes present in tandem. A “domain” corresponds to a conserved region with greater than 55-amino-acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure. PMID:17538688
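The length rule stated in this abstract (a repeat has fewer than 55 residues, a domain more than 55) can be checked against the listed entries. The sketch below is only an illustration: the function `classify` is hypothetical, and because the abstract does not say how a region of exactly 55 residues is treated, that case is labeled "unspecified" here.

```python
from collections import Counter

THRESHOLD = 55  # residue count separating repeats from domains in the abstract


def classify(length: int) -> str:
    """Label a conserved region by length; exactly 55 is left unspecified."""
    if length < THRESHOLD:
        return "repeat"
    if length > THRESHOLD:
        return "domain"
    return "unspecified"


# Motif names and lengths as listed in the abstract.
regions = {
    "PxV": 57, "FxF": 122, "YEFF": 111, "IMxxH": 109, "VxxT": 103,
    "ExW": 84, "NTGFIG": 104, "NxGK": 36, "VYV": 95, "KEWE": 75,
    "AFL": 59, "RIDVK": 53, "AGQF": 41, "GSAL": 42,
}

counts = Counter(classify(n) for n in regions.values())
print(counts["domain"], counts["repeat"])  # 10 4
```

Applying the rule to the 14 listed regions gives ten domains and four repeats, in agreement with the totals stated at the start of the abstract.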
Effect of the SH3-SH2 domain linker sequence on the structure of Hck kinase.
Meiselbach, Heike; Sticht, Heinrich
2011-08-01
The coordination of activity in biological systems requires the existence of different signal transduction pathways that interact with one another and must be precisely regulated. The Src-family tyrosine kinases, which are found in many signaling pathways, differ in their physiological function despite their high overall structural similarity. In this context, the differences in the SH3-SH2 domain linkers might play a role for differential regulation, but the structural consequences of linker sequence remain poorly understood. We have therefore performed comparative molecular dynamics simulations of wildtype Hck and of a mutant Hck in which the SH3-SH2 domain linker is replaced by the corresponding sequence from the homologous kinase Lck. These simulations reveal that linker replacement not only affects the orientation of the SH3 domain itself, but also leads to an alternative conformation of the activation segment in the Hck kinase domain. The sequence of the SH3-SH2 domain linker thus exerts a remote effect on the active site geometry and might therefore play a role in modulating the structure of the inactive kinase or in fine-tuning the activation process itself.
Structural genomics reveals EVE as a new ASCH/PUA-related domain
Bertonati, Claudia; Punta, Marco; Fischer, Markus; Yachdav, Guy; Forouhar, Farhad; Zhou, Weihong; Kuzin, Alexander P.; Seetharaman, Jayaraman; Abashidze, Mariam; Ramelot, Theresa A.; Kennedy, Michael A.; Cort, John R.; Belachew, Adam; Hunt, John F.; Tong, Liang; Montelione, Gaetano T.; Rost, Burkhard
2014-01-01
Summary We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggests that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would have not been sufficient to reveal these evolutionary links. PMID:19191354
Structural Genomics Reveals EVE as a New ASCH/PUA-Related Domain
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bertonati, C.; Punta, M; Fischer, M
2008-01-01
We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggests that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would have not been sufficient to reveal these evolutionary links.
[Identification and polymorphism of pectinase genes PGU in the Saccharomyces bayanus complex].
Shalamitskiy, M Yu; Naumov, G I
2016-05-01
Pectinase (endo-polygalacturonase) is the key enzyme splitting plant pectin. The corresponding single gene PGU1 is documented for the yeast S. cerevisiae. On the basis of phylogenetic analysis of the PGU nucleotide sequence available in the GenBank, a family of divergent PGU genes is found in the species complex S. bayanus: S. bayanus var. uvarum, S. eubayanus, and hybrid taxon S. pastorianus. The PGU genes have different chromosome localization.
Lacap, Donnabella C; Smith, Gavin J D; Warren-Rhodes, Kimberley; Pointing, Stephen B
2005-07-01
Cyanobacterial mats were characterized from pools of 45-60 degrees C in near-neutral pH, low-sulphide geothermal springs in the Philippines. Mat structure did not vary with temperature. All mats possessed highly ordered layers of airspaces at both the macroscopic and microscopic level, and these appear to be an adaptation to a free-floating growth habit. Upper mat layers supported biomass with elevated carotenoid:chlorophyll a ratios and an as yet uncharacterized waxy layer on the dorsal surface. Microscopic examination revealed mats comprised a single Fischerella morphotype, with abundant heterocysts throughout mats at all temperatures. Molecular analysis of mat community structure only partly matched morphological identification. All samples supported greater 16S rDNA-defined diversity than morphology suggested, with a progressive loss in the number of genotypes with increasing temperature. Fischerella-like sequences were recovered from mats occurring at all temperatures, but some mats also yielded Oscillatoria-like sequences, although corresponding phenotypes were not observed. Phylogenetic analysis revealed that Fischerella-like sequences were most closely affiliated with Fischerella major and the Oscillatoria-like sequences with Oscillatoria amphigranulata.
A de novo redesign of the WW domain
Kraemer-Pecore, Christina M.; Lecomte, Juliette T.J.; Desjarlais, John R.
2003-01-01
We have used a sequence prediction algorithm and a novel sampling method to design protein sequences for the WW domain, a small β-sheet motif. The procedure, referred to as SPANS, designs sequences to be compatible with an ensemble of closely related polypeptide backbones, mimicking the inherent flexibility of proteins. Two designed sequences (termed SPANS-WW1 and SPANS-WW2), using only naturally occurring l-amino acids, were selected for study and the corresponding polypeptides were prepared in Escherichia coli. Circular dichroism data suggested that both purified polypeptides adopted secondary structure features related to those of the target without the aid of disulfide bridges or bound cofactors. The structure exhibited by SPANS-WW2 melted cooperatively by raising the temperature of the solution. Further analysis of this polypeptide by proton nuclear magnetic resonance spectroscopy demonstrated that at 5°C, it folds into a structure closely resembling a natural WW domain. This achievement constitutes one of a small number of successful de novo protein designs through fully automated computational methods and highlights the feasibility of including backbone flexibility in the design strategy. PMID:14500877
A de novo redesign of the WW domain.
Kraemer-Pecore, Christina M; Lecomte, Juliette T J; Desjarlais, John R
2003-10-01
We have used a sequence prediction algorithm and a novel sampling method to design protein sequences for the WW domain, a small beta-sheet motif. The procedure, referred to as SPANS, designs sequences to be compatible with an ensemble of closely related polypeptide backbones, mimicking the inherent flexibility of proteins. Two designed sequences (termed SPANS-WW1 and SPANS-WW2), using only naturally occurring L-amino acids, were selected for study and the corresponding polypeptides were prepared in Escherichia coli. Circular dichroism data suggested that both purified polypeptides adopted secondary structure features related to those of the target without the aid of disulfide bridges or bound cofactors. The structure exhibited by SPANS-WW2 melted cooperatively by raising the temperature of the solution. Further analysis of this polypeptide by proton nuclear magnetic resonance spectroscopy demonstrated that at 5 degrees C, it folds into a structure closely resembling a natural WW domain. This achievement constitutes one of a small number of successful de novo protein designs through fully automated computational methods and highlights the feasibility of including backbone flexibility in the design strategy.
NASA Astrophysics Data System (ADS)
Goltsch, Mandy
Design denotes the transformation of an identified need to its physical embodiment in a traditionally iterative approach of trial and error. Conceptual design plays a prominent role but an almost infinite number of possible solutions at the outset of design necessitates fast evaluations. The corresponding practice of empirical equations and low fidelity analyses becomes obsolete in the light of novel concepts. Ever increasing system complexity and resource scarcity mandate new approaches to adequately capture system characteristics. Contemporary concerns in atmospheric science and homeland security created an operational need for unconventional configurations. Unmanned long endurance flight at high altitudes offers a unique showcase for the exploration of new design spaces and the incidental deficit of conceptual modeling and simulation capabilities. Structural and aerodynamic performance requirements necessitate light weight materials and high aspect ratio wings resulting in distinct structural and aeroelastic response characteristics that stand in close correlation with natural vibration modes. The present research effort evolves around the development of an efficient and accurate optimization algorithm for high aspect ratio wings subject to natural frequency constraints. Foundational corner stones are beam dimensional reduction and modal perturbation redesign. Local and global analyses inherent to the former suggest corresponding levels of local and global optimization. The present approach departs from this suggestion. It introduces local level surrogate models to capacitate a methodology that consists of multi level analyses feeding into a single level optimization. The innovative heart of the new algorithm originates in small perturbation theory. A sequence of small perturbation solutions allows the optimizer to make incremental movements within the design space. It enables a directed search that is free of costly gradients. 
System matrices are decomposed based on a Timoshenko stiffness effect separation. The formulation of respective linear changes falls back on surrogate models that approximate cross sectional properties. Corresponding functional responses are readily available. Their direct use by the small perturbation based optimizer ensures constitutive laws and eliminates a previously necessary optimization at the local level. The scope of the present work is derived from an existing configuration such as a conceptual baseline or a prototype that experiences aeroelastic instabilities. Due to the lack of respective design studies in the traditional design process it is not uncommon for an initial wing design to have such stability problems. The developed optimization scheme allows the effective redesign of high aspect ratio wings subject to natural frequency objectives. Its successful application is demonstrated by three separate optimization studies. The implementation results of all three studies confirm that the gradient liberation of the new methodology brings about great computational savings. A generic wing study is used to indicate the connection between the proposed methodology and the aeroelastic stability problems outlined in the motivation. It is also used to illustrate an important practical aspect of structural redesign, i.e., a minimum departure from the existing baseline configuration. The proposed optimization scheme is naturally conducive to this practical aspect by using a minimum change optimization criterion. However, only an elemental formulation truly enables a minimum change solution. It accounts for the spanwise significance of a structural modification to the mode of interest. This idea of localized reinforcement greatly benefits the practical realization of structural redesign efforts. The implementation results also highlight the fundamental limitation of the proposed methodology. 
The exclusive consideration of mass and stiffness effects on modal response characteristics disregards other disciplinary problems such as allowable stresses or buckling loads. Both are of central importance to the structural integrity of an aircraft but are currently not accounted for in the proposed optimization scheme. The concluding discussion thus outlines the need for respective constraints and/or additional analyses to capture all requirements necessary for a comprehensive structural redesign study.
Singh, B N; Mudgil, Yashwanti; Sopory, S K; Reddy, M K
2003-07-01
We have successfully expressed enzymatically active plant topoisomerase II in Escherichia coli for the first time, which has enabled its biochemical characterization. Using a PCR-based strategy, we obtained a full-length cDNA and the corresponding genomic clone of tobacco topoisomerase II. The genomic clone has 18 exons interrupted by 17 introns. Most of the 5' and 3' splice junctions follow the typical canonical consensus dinucleotide sequence GU-AG present in other plant introns. The position of introns and phasing with respect to primary amino acid sequence in tobacco TopII and Arabidopsis TopII are highly conserved, suggesting that the two genes are evolved from the common ancestral type II topoisomerase gene. The cDNA encodes a polypeptide of 1482 amino acids. The primary amino acid sequence shows a striking sequence similarity, preserving all the structural domains that are conserved among eukaryotic type II topoisomerases in an identical spatial order. We have expressed the full-length polypeptide in E. coli and purified the recombinant protein to homogeneity. The full-length polypeptide relaxed supercoiled DNA and decatenated the catenated DNA in a Mg(2+)- and ATP-dependent manner, and this activity was inhibited by 4'-(9-acridinylamino)-3'-methoxymethanesulfonanilide (m-AMSA). The immunofluorescence and confocal microscopic studies, with antibodies developed against the N-terminal region of tobacco recombinant topoisomerase II, established the nuclear localization of topoisomerase II in tobacco BY2 cells. The regulated expression of tobacco topoisomerase II gene under the GAL1 promoter functionally complemented a temperature-sensitive TopII(ts) yeast mutant.
A protein block based fold recognition method for the annotation of twilight zone sequences.
Suresh, V; Ganesan, K; Parthasarathy, S
2013-03-01
The description of the protein backbone was recently improved by using a group of structural fragments called Structural Alphabets instead of the regular three-state (Helix, Sheet and Coil) secondary structure description. Protein Blocks is one of the Structural Alphabets used to describe every region of the protein backbone, including the coil. According to de Brevern (2000), Protein Blocks comprises 16 structural fragments, each 5 residues long. Protein Blocks fragments are highly informative among the available Structural Alphabets and have been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using local pair-wise alignment. Alignment results with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Our method is able to recognize possible folds for nearly 35.5% of the twilight zone sequences from their predicted Protein Block sequences obtained by pb_prediction, which is available at the Protein Block Export server.
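The acceptance criterion in this abstract (z-value ≥ 2.5 and P-value ≤ 0.08) amounts to a simple filter over alignment results. The sketch below shows only that final filtering step, not the alignment or the scoring; the `Hit` record, the `accept` function, and the example hits are hypothetical and do not come from the authors' software.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One query-versus-library alignment result (illustrative fields)."""
    fold: str
    z: float
    p: float


def accept(hits: List[Hit], z_min: float = 2.5, p_max: float = 0.08) -> List[Hit]:
    """Keep hits meeting both thresholds quoted in the abstract."""
    return [h for h in hits if h.z >= z_min and h.p <= p_max]


# Made-up alignment results for a single query sequence.
hits = [
    Hit("fold_A", z=3.1, p=0.02),   # passes both thresholds
    Hit("fold_B", z=2.5, p=0.08),   # passes: both bounds are inclusive
    Hit("fold_C", z=2.4, p=0.01),   # fails on z-value
    Hit("fold_D", z=4.0, p=0.20),   # fails on P-value
]

possible_folds = [h.fold for h in accept(hits)]
print(possible_folds)  # ['fold_A', 'fold_B']
```

Both thresholds are treated as inclusive, matching the "≥" and "≤" used in the abstract.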
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoefler, G.; Forstner, M.; Hulla, W.
1994-01-01
Enoyl-CoA hydratase:3-hydroxyacyl-CoA dehydrogenase bifunctional enzyme is one of the four enzymes of the peroxisomal β-oxidation pathway. Here, the authors report the full-length human cDNA sequence and the localization of the corresponding gene on chromosome 3q26.3-3q28. The cDNA sequence spans 3779 nucleotides with an open reading frame of 2169 nucleotides. The tripeptide SKL at the carboxy terminus, known to serve as a peroxisomal targeting signal, is present. DNA sequence comparison of the coding region showed an 80% homology between human and rat bifunctional enzyme cDNA. The 3′ noncoding sequence contains 117 nucleotides homologous to an Alu repeat. Based on sequence comparison, they propose that these nucleotides are a free left Alu arm with 86% homology to the Alu-J family. RNA analysis shows one band with highest intensity in liver and kidney. This cDNA will allow in-depth studies of molecular defects in patients with defective peroxisomal bifunctional enzyme. Moreover, it will also provide a means for studying the regulation of peroxisomal β-oxidation in humans. 33 refs., 5 figs.
NASA Astrophysics Data System (ADS)
Liu, Feng-xiang; Liu, Rang-su; Hou, Zhao-yang; Liu, Hai-Rong; Tian, Ze-an; Zhou, Li-li
2009-02-01
The rapid solidification processes of Al50Mg50 liquid alloy consisting of 50,000 atoms have been simulated using the molecular dynamics method based on the effective pair potential derived from the pseudopotential theory. The formation mechanisms of atomic clusters during the rapid solidification processes have been investigated by adopting a new cluster description method, the cluster-type index method (CTIM). The simulated partial structure factors are in good agreement with the experimental results. An Al-Mg amorphous structure characterized by Al-centered icosahedral topological short-range order (SRO) is found to form during the rapid solidification processes. The icosahedral cluster plays a key role in the microstructure transition. In addition, the size distribution of the various clusters in the system presents a magic number sequence of 13, 19, 23, 25, 29, 31, 33, 37, …. The magic clusters are more stable and mainly correspond to the incompact arrangements of linked icosahedra in the form of rings, chains or dendrites. Each magic number corresponds to one particular way of combining icosahedra. This magic number sequence is different from that generated in the solidification structure of liquid Al and those obtained by methods of gaseous deposition and ionic spray, etc.
Late Precambrian-Cambrian sediments of Huqf group, Sultanate of Oman
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorin, G.E.; Racz, L.G.; Walter, M.R.
1982-12-01
The Huqf Group is the oldest known sedimentary sequence overlying crystalline basement in the Sultanate of Oman. It crops out on a broad regional high, the Huqf Axis, which forms a dominating structural element on the southeastern edge of the Arabian peninsula. Subsurface and outcrop evidence within and outside of Oman suggests that the sediments of the Huqf Group lie within the age span of late Precambrian to Early-Middle Cambrian. The Huqf Group is subdivided into five formations corresponding to an alternation of clastics (Abu Mahara and Shuram Formations) and carbonates (Khufai and Buah Formations) deposited in essentially shallow marine to supratidal (or fluviatile) conditions and terminated by an evaporitic sequence (Ara Formation). Evaporites are absent on the Huqf Axis, but they are thickly developed to the west over a large part of southern and central Oman, where they acted as the major structure former of most of Oman's fields, and even locally pierced up to the surface. Regional correlations suggest that the predominantly carbonate-evaporitic facies of the Huqf Group was widely distributed in late Precambrian-Early Cambrian time: the Huqf basin is tentatively considered part of a belt of evaporitic basins and intervening carbonate platforms, which stretched across the Pangea landmass from the Indian subcontinent (Salt Range of Pakistan) through South Yemen, Oman, and Saudi Arabia into the gulf states and Iran (Hormuz Series and carbonate platform north of the Zagros).
Chong, Cheong-Meng; Leung, Siu Wai; Prieto-da-Silva, Álvaro R. B.; Havt, Alexandre; Quinet, Yves P.; Martins, Alice M. C.; Lee, Simon M. Y.; Rádis-Baptista, Gandhi
2014-01-01
Background Dinoponera quadriceps is a predatory giant ant that inhabits the Neotropical region and subdues its prey (insects) with stings that deliver a toxic cocktail of molecules. Human accidents occasionally occur and cause local pain and systemic symptoms. A comprehensive study of the D. quadriceps venom gland transcriptome is required to advance our knowledge about the toxin repertoire of the giant ant venom and to understand the physiopathological basis of Hymenoptera envenomation. Results We conducted a transcriptome analysis of a cDNA library from the D. quadriceps venom gland with Sanger sequencing in combination with whole-transcriptome shotgun deep sequencing. From the cDNA library, a total of 420 independent clones were analyzed. Although the proportion of dinoponeratoxin isoform precursors was high, the first giant ant venom inhibitor cysteine-knot (ICK) toxin was found. The deep next generation sequencing yielded a total of 2,514,767 raw reads that were assembled into 18,546 contigs. A BLAST search of the assembled contigs against non-redundant and Swiss-Prot databases showed that 6,463 contigs corresponded to BLASTx hits and indicated an interesting diversity of transcripts related to venom gene expression. The majority of these venom-related sequences code for a major polypeptide core, which comprises venom allergens, lethal-like proteins and esterases, and a minor peptide framework composed of inter-specific structurally conserved cysteine-rich toxins. Both the cDNA library and deep sequencing yielded large proportions of contigs that showed no similarities with known sequences. Conclusions To our knowledge, this is the first report of the venom gland transcriptome of the New World giant ant D. quadriceps. 
The glandular venom system was dissected, and the toxin arsenal was revealed; this process brought to light novel sequences that included ICK-folded toxins, allergen proteins, esterases (phospholipases and carboxylesterases), and lethal-like toxins. These findings contribute to the understanding of the ecology, behavior and venomics of hymenopterans. PMID:24498135
Predicting PDZ domain mediated protein interactions from structure
2013-01-01
Background PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. 
We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training–testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW. PMID:23336252
Jonniaux, J L; Coster, F; Purnelle, B; Goffeau, A
1994-12-01
We report the amino acid sequence of 13 open reading frames (ORF > 299 bp) located on a 21.7 kb DNA segment from the left arm of chromosome XIV of Saccharomyces cerevisiae. Five open reading frames had been entirely or partially sequenced previously: WHI3, GCR2, SPX19, SPX18 and a heat shock gene similar to SSB1. The products of the 8 other ORFs are new putative proteins, among which N1394 is probably a membrane protein. N1346 contains a leucine zipper pattern and the corresponding ORF presents an HAP (global regulator of respiratory genes) upstream activating sequence in the promoter region. N1386 shares homology with the DNA structure-specific recognition protein family SSRPs and the corresponding ORF is preceded by an MCB (MluI cell cycle box) upstream activating factor.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kaltak, Merzuk; Fernandez-Serra, Marivi; Hybertsen, Mark S.
2017-12-01
The phases of A2Mn8O16 hollandite group oxides emerge from the competition between ionic interactions, Jahn-Teller effects, charge ordering, and magnetic interactions. Their balanced treatment with feasible computational approaches can be challenging for commonly used approximations in density functional theory. Three examples (A = Ag, Li, and K) are studied with a sequence of different approximate exchange-correlation functionals. Starting from a generalized gradient approximation (GGA), an extension to include van der Waals interactions and a recently proposed meta-GGA are considered. Then local Coulomb interactions for the Mn 3d electrons are more explicitly considered with the DFT + U approach. Finally, selected results from a hybrid functional approach provide a reference. Results for the binding energy of the A species in the parent oxide highlight the role of van der Waals interactions. Relatively accurate results for insertion energies can be achieved with a low-U and a high-U approach. In the low-U case, the materials are described as band metals with a high-symmetry, tetragonal crystal structure. In the high-U case, the electrons donated by A result in formation of local Mn3+ centers and corresponding Jahn-Teller distortions characterized by a local order parameter. The resulting degree of monoclinic distortion depends on charge ordering and magnetic interactions in the phase formed. The reference hybrid functional results show charge localization and ordering. Comparison to low-temperature experiments of related compounds suggests that charge localization is the physically correct result for the hollandite group oxides studied here. Lastly, while competing effects in the local magnetic coupling are subtle, the fully anisotropic implementation of DFT + U gives the best overall agreement with results from the hybrid functional.
NASA Astrophysics Data System (ADS)
Kaltak, Merzuk; Fernández-Serra, Marivi; Hybertsen, Mark S.
2017-12-01
The phases of A2Mn8O16 hollandite group oxides emerge from the competition between ionic interactions, Jahn-Teller effects, charge ordering, and magnetic interactions. Their balanced treatment with feasible computational approaches can be challenging for commonly used approximations in density functional theory. Three examples (A = Ag, Li, and K) are studied with a sequence of different approximate exchange-correlation functionals. Starting from a generalized gradient approximation (GGA), an extension to include van der Waals interactions and a recently proposed meta-GGA are considered. Then local Coulomb interactions for the Mn 3d electrons are more explicitly considered with the DFT + U approach. Finally, selected results from a hybrid functional approach provide a reference. Results for the binding energy of the A species in the parent oxide highlight the role of van der Waals interactions. Relatively accurate results for insertion energies can be achieved with a low-U and a high-U approach. In the low-U case, the materials are described as band metals with a high-symmetry, tetragonal crystal structure. In the high-U case, the electrons donated by A result in formation of local Mn3+ centers and corresponding Jahn-Teller distortions characterized by a local order parameter. The resulting degree of monoclinic distortion depends on charge ordering and magnetic interactions in the phase formed. The reference hybrid functional results show charge localization and ordering. Comparison to low-temperature experiments of related compounds suggests that charge localization is the physically correct result for the hollandite group oxides studied here. Finally, while competing effects in the local magnetic coupling are subtle, the fully anisotropic implementation of DFT + U gives the best overall agreement with results from the hybrid functional.
Digital Sequences and a Time Reversal-Based Impact Region Imaging and Localization Method
Qiu, Lei; Yuan, Shenfang; Mei, Hanfei; Qian, Weifeng
2013-01-01
To reduce the time and cost of damage inspection, on-line impact monitoring of aircraft composite structures is needed. A digital monitor based on an array of piezoelectric transducers (PZTs) has been developed to record impact regions on-line. It is small, lightweight and has low power consumption, but there are two problems with the impact alarm region localization method of the digital monitor at the current stage. The first is that the accuracy rate of the impact alarm region localization is low, especially on complex composite structures. The second is that the area of the impact alarm region is large when a large-scale structure is monitored and the number of PZTs is limited, which increases the time and cost of damage inspections. To solve the two problems, an impact alarm region imaging and localization method based on digital sequences and time reversal is proposed. In this method, the frequency band of impact response signals is first estimated from the digital sequences. Then, characteristic signals of impact response signals are constructed from sinusoidal modulation signals. Finally, the phase synthesis time reversal impact imaging method is adopted to obtain the impact region image. From the image, an error ellipse is generated to give the final impact alarm region. A validation experiment is implemented on a complex composite wing box of a real aircraft. The validation results show that the accuracy rate of impact alarm region localization is approximately 100%. The area of the impact alarm region can be reduced, and the number of PZTs needed to cover the same impact monitoring region is reduced by more than half. PMID:24084123
Natural mummification of the human gut preserves bacteriophage DNA.
Santiago-Rodriguez, Tasha M; Fornaciari, Gino; Luciani, Stefania; Dowd, Scot E; Toranzos, Gary A; Marota, Isolina; Cano, Raul J
2016-01-01
The natural mummification process of the human gut represents a unique opportunity to study the resulting microbial community structure and composition. While results are providing insights into the preservation of bacteria, fungi, pathogenic eukaryotes and eukaryotic viruses, no studies have demonstrated that the process of natural mummification also results in the preservation of bacteriophage DNA. We characterized the gut microbiome of three pre-Columbian Andean mummies, namely FI3, FI9 and FI12, and found sequences homologous to viruses. From the sequences attributable to viruses, 50.4% (mummy FI3), 1.0% (mummy FI9) and 84.4% (mummy FI12) were homologous to bacteriophages. Sequences corresponding to the Siphoviridae, Myoviridae, Podoviridae and Microviridae families were identified. Predicted putative bacterial hosts corresponded mainly to the Firmicutes and Proteobacteria, and included Bacillus, Staphylococcus, Clostridium, Escherichia, Vibrio, Klebsiella, Pseudomonas and Yersinia. Predicted functional categories associated with bacteriophages showed a representation of structural, replication, integration and entry and lysis genes. The present study suggests that the natural mummification of the human gut results in the preservation of bacteriophage DNA, representing an opportunity to elucidate the ancient phageome and to hypothesize possible mechanisms of preservation. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Wolffe, E J; Gause, W C; Pelfrey, C M; Holland, S M; Steinberg, A D; August, J T
1990-01-05
We describe the isolation and sequencing of a cDNA encoding mouse Pgp-1. An oligonucleotide probe corresponding to the NH2-terminal sequence of the purified protein was synthesized by the polymerase chain reaction and used to screen a mouse macrophage lambda gt11 library. A cDNA clone with an insert of 1.2 kilobases was selected and sequenced. In Northern blot analysis, only cells expressing Pgp-1 contained mRNA species that hybridized with this Pgp-1 cDNA. The nucleotide sequence of the cDNA has a single open reading frame that yields a protein-coding sequence of 1076 base pairs followed by a 132-base pair 3'-untranslated sequence that includes a putative polyadenylation signal but no poly(A) tail. The translated sequence comprises a 13-amino acid signal peptide followed by a polypeptide core of 345 residues corresponding to an Mr of 37,800. Portions of the deduced amino acid sequence were identical to those obtained by amino acid sequence analysis from the purified glycoprotein, confirming that the cDNA encodes Pgp-1. The predicted structure of Pgp-1 includes an NH2-terminal extracellular domain (residues 14-265), a transmembrane domain (residues 266-286), and a cytoplasmic tail (residues 287-358). Portions of the mouse Pgp-1 sequence are highly similar to that of the human CD44 cell surface glycoprotein implicated in cell adhesion. The protein also shows sequence similarity to the proteoglycan tandem repeat sequences found in cartilage link protein and cartilage proteoglycan core protein which are thought to be involved in binding to hyaluronic acid.
Generating intrinsically disordered protein conformational ensembles from a Markov chain
NASA Astrophysics Data System (ADS)
Cukier, Robert I.
2018-03-01
Intrinsically disordered proteins (IDPs) sample a diverse conformational space. They are important to signaling and regulatory pathways in cells. An entropy penalty must be paid when an IDP becomes ordered upon interaction with another protein or a ligand. Thus, the degree of conformational disorder of an IDP is of interest. We create a dichotomic Markov model that can explore entropic features of an IDP. The Markov condition introduces local (neighbor residues in a protein sequence) rotamer dependences that arise from van der Waals and other chemical constraints. A protein sequence of length N is characterized by its (information) entropy and mutual information, MIMC, the latter providing a measure of the dependence among the random variables describing the rotamer probabilities of the residues that comprise the sequence. For a Markov chain, the MIMC is proportional to the pair mutual information MI, which depends on the singlet and pair probabilities of neighbor residue rotamer sampling. All 2^N sequence states are generated, along with their probabilities, and contrasted with the probabilities under the assumption of independent residues. An efficient method to generate realizations of the chain is also provided. The chain entropy, MIMC, and state probabilities provide the ingredients to distinguish different scenarios using the terminologies: MoRF (molecular recognition feature), not-MoRF, and not-IDP. A MoRF corresponds to large entropy and large MIMC (strong dependence among the residues' rotamer sampling), a not-MoRF corresponds to large entropy but small MIMC, and not-IDP corresponds to low entropy irrespective of the MIMC. We show that MoRFs are most appropriate as descriptors of IDPs. They provide a reasonable number of high-population states that reflect the dependences between neighbor residues, thus classifying them as IDPs, yet without very large entropy that might lead to a too high entropy penalty.
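The quantities named in this abstract can be illustrated with a minimal sketch. It assumes a stationary two-state (dichotomic) chain with a hypothetical transition matrix `T`, and it interprets MIMC as the total correlation of the chain (sum of single-residue entropies minus the chain entropy); both choices are assumptions made for illustration, not details taken from the paper. Under these assumptions the total correlation equals (N - 1) times the neighbor pair mutual information, consistent with the stated proportionality.

```python
import itertools
import math


def stationary(T):
    """Stationary distribution of a two-state chain with transition matrix T."""
    a, b = T[0][1], T[1][0]
    return (b / (a + b), a / (a + b))


def chain_states(T, n):
    """Enumerate all 2**n states of a stationary chain with their probabilities."""
    pi = stationary(T)
    probs = {}
    for state in itertools.product((0, 1), repeat=n):
        p = pi[state[0]]
        for x, y in zip(state, state[1:]):
            p *= T[x][y]
        probs[state] = p
    return probs


def entropy(ps):
    """Shannon entropy (in nats) of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in ps if p > 0)


def pair_mutual_information(T):
    """Mutual information between neighboring residues of the stationary chain."""
    pi = stationary(T)
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            pxy = pi[x] * T[x][y]
            if pxy > 0:
                mi += pxy * math.log(pxy / (pi[x] * pi[y]))
    return mi


def total_correlation(T, n):
    """Sum of single-residue entropies minus the entropy of the whole chain."""
    pi = stationary(T)
    return n * entropy(pi) - entropy(chain_states(T, n).values())


if __name__ == "__main__":
    T = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical rotamer transition matrix
    n = 6
    print("chain entropy:", entropy(chain_states(T, n).values()))
    print("total correlation:", total_correlation(T, n))
    print("(n - 1) * pair MI:", (n - 1) * pair_mutual_information(T))
```

Exhaustive enumeration is only practical for short chains, since the number of states doubles with each added residue.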
Feral pig populations are structured at fine spatial scales in tropical Queensland, Australia.
Lopez, Jobina; Hurwood, David; Dryden, Bart; Fuller, Susan
2014-01-01
Feral pigs occur throughout tropical far north Queensland, Australia; they are a significant threat to biodiversity, World Heritage values and agriculture, and are a vector of infectious diseases. One of the constraints on long-lasting, local eradication of feral pigs is the process of reinvasion into recently controlled areas. This study examined the population genetic structure of feral pigs in far north Queensland to identify the extent of movement and the scale at which demographically independent management units exist. Genetic analysis of 328 feral pigs from the Innisfail to Tully region of tropical Queensland was undertaken. Seven microsatellite loci were screened and Bayesian clustering methods used to infer population clusters. Sequence variation at the mitochondrial DNA control region was examined to identify pig breed. Significant population structure was identified in the study area at a scale of 25 to 35 km, corresponding to three demographically independent management units (MUs). Distinct natural or anthropogenic barriers were not found, but environmental features such as topography and land use appear to influence patterns of gene flow. Despite the strong, overall pattern of structure, some feral pigs clearly exhibited ancestry from an MU outside of that from which they were sampled, indicating isolated long-distance dispersal or translocation events. Furthermore, our results suggest that gene flow is restricted among pigs of domestic Asian and European origin and that non-random mating influences management unit boundaries. We conclude that the three MUs identified in this study should be considered as operational units for feral pig control in far north Queensland. Within an MU, coordinated and simultaneous control is required across farms, rainforest areas and National Park Estates to prevent recolonisation from adjacent localities.
Zhang, Yingzi; Hou, Yulong; Zhang, Yanjun; Hu, Yanjun; Zhang, Liang; Gao, Xiaolong; Zhang, Huixin; Liu, Wenyi
2018-04-16
A quasi-distributed liquid leakage (QDLL) sensor for local areas is proposed and experimentally demonstrated, providing a real-time, lower-cost alternative to existing local QDLL sensors. The sensor mainly consists of a flexible lamp belt (FLB) with light-emitting diodes (LEDs) and a polymer optical fiber (POF) processed with side-coupling structures. The side-coupling structures are illuminated by the LEDs one by one, forming a series of sensing probes. The light is side-coupled into the POF through the side-coupling structures, and pulse sequences are obtained from the power meters connected to both ends of the POF. Each pulse represents a sensing probe, and its intensity increases when the coupling medium changes from air to liquid. The location of a leakage incident can be determined from the position of each pulse in its output sequence. The influence of different side-coupling structures on the side-coupling ratio is investigated. The experimental results validate the detection and localization abilities of the QDLL sensor along a 1 m-long POF with a spatial resolution of 0.1 m, which can be improved by adjusting the side-coupling structure. Furthermore, the temperature dependence is studied and can be compensated.
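The localization step described here (one pulse per sensing probe, with pulse intensity rising when liquid replaces air) could be sketched as follows. The function name, the relative-increase threshold and the assumption that the first probe sits at position 0 are all hypothetical; only the 0.1 m probe spacing comes from the abstract.

```python
def locate_leaks(baseline, measured, spacing_m=0.1, rel_increase=0.2):
    """Return positions (in metres) of probes whose pulse intensity rose.

    baseline, measured: pulse intensities per probe, in probe order.
    spacing_m: distance between neighboring probes (0.1 m in the abstract).
    rel_increase: hypothetical threshold for flagging a probe as wet.
    """
    positions = []
    for index, (dry, now) in enumerate(zip(baseline, measured)):
        if now > dry * (1 + rel_increase):
            # The pulse index in the sequence maps directly to a probe location.
            positions.append(round(index * spacing_m, 3))
    return positions


if __name__ == "__main__":
    dry = [1.0] * 10
    wet = [1.0, 1.0, 1.05, 1.5, 1.0, 1.0, 1.0, 1.6, 1.0, 1.0]
    print(locate_leaks(dry, wet))
```

A real system would also need the temperature compensation mentioned in the abstract before thresholding.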
Automatic streak endpoint localization from the cornerness metric
NASA Astrophysics Data System (ADS)
Sease, Brad; Flewelling, Brien; Black, Jonathan
2017-05-01
Streaked point sources are a common occurrence when imaging unresolved space objects from both ground- and space-based platforms. Effective localization of streak endpoints is a key component of traditional techniques in space situational awareness related to orbit estimation and attitude determination. To further that goal, this paper derives a general detection and localization method for streak endpoints based on the cornerness metric. Corner detection involves searching an image for strong bi-directional gradients. These locations typically correspond to robust structural features in an image. In the case of unresolved imagery, regions with a high cornerness score correspond directly to the endpoints of streaks. This paper explores three approaches for global extraction of streak endpoints and applies them to an attitude and rate estimation routine.
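To illustrate why streak endpoints score highly on a cornerness metric, here is a minimal sketch using a Harris-style response (determinant of the local gradient structure tensor minus a multiple of its squared trace). The Harris form, the 3x3 window, the constant `k` and the greedy peak picker are assumptions for illustration; the abstract does not say which of its three extraction approaches corresponds to anything like this.

```python
def make_streak_image(rows=15, cols=31, row=7, start=8, end=22):
    """Synthetic image: a one-pixel-wide horizontal streak on a dark background."""
    img = [[0.0] * cols for _ in range(rows)]
    for c in range(start, end + 1):
        img[row][c] = 1.0
    return img


def harris_cornerness(img, k=0.04):
    """Harris-style cornerness from central-difference gradients, 3x3 window."""
    rows, cols = len(img), len(img[0])

    def px(r, c):
        return img[r][c] if 0 <= r < rows and 0 <= c < cols else 0.0

    ix = [[(px(r, c + 1) - px(r, c - 1)) / 2 for c in range(cols)] for r in range(rows)]
    iy = [[(px(r + 1, c) - px(r - 1, c)) / 2 for c in range(cols)] for r in range(rows)]
    score = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            sxx = syy = sxy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    gx, gy = ix[r + dr][c + dc], iy[r + dr][c + dc]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            # Large only where gradients in both directions are strong.
            score[r][c] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return score


def strongest_peaks(score, count=2, min_dist=3):
    """Greedily pick the highest-scoring pixels that are at least min_dist apart."""
    ranked = sorted(
        ((score[r][c], r, c) for r in range(len(score)) for c in range(len(score[0]))),
        reverse=True,
    )
    peaks = []
    for _, r, c in ranked:
        if all(max(abs(r - pr), abs(c - pc)) >= min_dist for pr, pc in peaks):
            peaks.append((r, c))
        if len(peaks) == count:
            break
    return peaks


if __name__ == "__main__":
    endpoints = strongest_peaks(harris_cornerness(make_streak_image()))
    print(sorted(endpoints))
```

Along the body of the streak the gradient points in only one direction, so the response there is negative; only the two ends combine gradients along and across the streak.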
Streamwise-Localized Solutions with natural 1-fold symmetry
NASA Astrophysics Data System (ADS)
Altmeyer, Sebastian; Willis, Ashley; Hof, Björn
2014-11-01
It has been proposed in recent years that turbulence is organized around unstable invariant solutions, which provide the building blocks of the chaotic dynamics. In direct numerical simulations of pipe flow we show that when imposing a minimal symmetry constraint (reflection in an axial plane only) the formation of turbulence can indeed be explained by dynamical systems concepts. The hypersurface separating laminar from turbulent motion, the edge of turbulence, is spanned by the stable manifolds of an exact invariant solution, a periodic orbit of a spatially localized structure. The turbulent states themselves (turbulent puffs in this case) are shown to arise in a bifurcation sequence from a related localized solution (the upper branch orbit). The rather complex bifurcation sequence involves secondary Hopf bifurcations, frequency locking and a period doubling cascade until eventually turbulent puffs arise. In addition we report preliminary results of the transition sequence for pipe flow without symmetry constraints.
2016-01-01
Abstract Molecular recognition by proteins mostly occurs in a local region on the protein surface. Thus, an efficient computational method for accurate characterization of protein local structural conservation is necessary to better understand biology and drug design. We present a novel local structure alignment tool, G‐LoSA. G‐LoSA aligns protein local structures in a sequence order independent way and provides a GA‐score, a chemical feature‐based and size‐independent structure similarity score. Our benchmark validation shows the robust performance of G‐LoSA on local structures of diverse sizes and characteristics, demonstrating its universal applicability to local structure‐centric comparative biology studies. In particular, G‐LoSA is highly effective in detecting conserved local regions on the entire surface of a given protein. In addition, the applications of G‐LoSA to identifying template ligands and predicting ligand and protein binding sites illustrate its strong potential for computer‐aided drug design. We hope that G‐LoSA can be a useful computational method for exploring interesting biological problems through large‐scale comparison of protein local structures and facilitating drug discovery research and development. G‐LoSA is freely available to academic users at http://im.compbio.ku.edu/GLoSA/. PMID:26813336
Ding, Jiarui; Condon, Anne; Shah, Sohrab P
2018-05-21
Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.
2010-01-01
Background Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Results Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. 
Neither method for training set construction requires any prior experimental characterization. Conclusions SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites. PMID:20102603
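The sliding-window idea in this abstract can be sketched in a simplified form. This sketch assumes pre-aligned, equal-length toy sequences and ranks training sequences by per-window identity, whereas the abstract describes sequence similarity searches; the function names and the toy scoring are hypothetical. What it keeps from the description is the core scoring step: for each window, scan every cutoff depth in the ranked list and keep the smallest binomial tail probability of seeing that many TRUE-set members by chance.

```python
import math


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_cutoff_score(ranked_labels, p_true):
    """Smallest binomial tail probability over all cutoff depths of a ranked list."""
    best, hits = 1.0, 0
    for depth, label in enumerate(ranked_labels, start=1):
        hits += int(label)
        best = min(best, binom_tail(hits, depth, p_true))
    return best


def window_scores(query, training, width):
    """Score each window of the query against labelled, pre-aligned sequences.

    training: list of (aligned_sequence, is_true) pairs.
    Returns a list of (window_start, score); lower scores mean stronger signal.
    """
    p_true = sum(1 for _, is_true in training if is_true) / len(training)
    scores = []
    for start in range(len(query) - width + 1):
        window = query[start:start + width]

        def identity(item):
            return sum(a == b for a, b in zip(window, item[0][start:start + width]))

        # Stable sort: ties keep the input order of the training list.
        ranked = sorted(training, key=identity, reverse=True)
        scores.append((start, best_cutoff_score([t for _, t in ranked], p_true)))
    return scores


if __name__ == "__main__":
    training = [
        ("ACDEGGGFGH", False), ("ACDEKKKFGH", True),
        ("ACDEGGGFGH", False), ("ACDEKKKFGH", True),
        ("ACDEGGGFGH", False), ("ACDEKKKFGH", True),
    ]
    for start, score in window_scores("ACDEKKKFGH", training, width=3):
        print(start, round(score, 4))
```

Because ties are broken by input order, the order of the training list influences windows that carry no signal; a more careful version would randomize or average over tie orderings.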
Teichmann, A Lina; Nieuwenstein, Mark R; Rich, Anina N
2015-01-01
Digit-color synesthetes report experiencing colors when perceiving letters and digits. The conscious experience is typically unidirectional (e.g., digits elicit colors but not vice versa) but recent evidence shows subtle bidirectional effects. We examined whether short-term memory for colors could be affected by the order of presentation reflecting more or less structure in the associated digits. We presented a stream of colored squares and asked participants to report the colors in order. The colors matched each synesthete's colors for digits 1-9 and the order of the colors corresponded either to a sequence of numbers (e.g., [red, green, blue] if 1 = red, 2 = green, 3 = blue) or no systematic sequence. The results showed that synesthetes recalled sequential color sequences more accurately than pseudo-randomized colors, whereas no such effect was found for the non-synesthetic controls. Synesthetes did not differ from non-synesthetic controls in recall of color sequences overall, providing no evidence of a general advantage in memory for serial recall of colors.
Neuwald, Andrew F
2009-08-01
The patterns of sequence similarity and divergence present within functionally diverse, evolutionarily related proteins contain implicit information about corresponding biochemical similarities and differences. A first step toward accessing such information is to statistically analyze these patterns, which, in turn, requires that one first identify and accurately align a very large set of protein sequences. Ideally, the set should include many distantly related, functionally divergent subgroups. Because it is extremely difficult, if not impossible for fully automated methods to align such sequences correctly, researchers often resort to manual curation based on detailed structural and biochemical information. However, multiply-aligning vast numbers of sequences in this way is clearly impractical. This problem is addressed using Multiply-Aligned Profiles for Global Alignment of Protein Sequences (MAPGAPS). The MAPGAPS program uses a set of multiply-aligned profiles both as a query to detect and classify related sequences and as a template to multiply-align the sequences. It relies on Karlin-Altschul statistics for sensitivity and on PSI-BLAST (and other) heuristics for speed. Using as input a carefully curated multiple-profile alignment for P-loop GTPases, MAPGAPS correctly aligned weakly conserved sequence motifs within 33 distantly related GTPases of known structure. By comparison, the sequence- and structurally based alignment methods hmmalign and PROMALS3D misaligned at least 11 and 23 of these regions, respectively. When applied to a dataset of 65 million protein sequences, MAPGAPS identified, classified and aligned (with comparable accuracy) nearly half a million putative P-loop GTPase sequences. A C++ implementation of MAPGAPS is available at http://mapgaps.igs.umaryland.edu. Supplementary data are available at Bioinformatics online.
A palindrome-mediated mechanism distinguishes translocations involving LCR-B of chromosome 22q11.2.
Gotter, Anthony L; Shaikh, Tamim H; Budarf, Marcia L; Rhodes, C Harker; Emanuel, Beverly S
2004-01-01
Two known recurrent constitutional translocations, t(11;22) and t(17;22), as well as a non-recurrent t(4;22), display derivative chromosomes that have joined to a common site within the low copy repeat B (LCR-B) region of 22q11.2. This breakpoint is located between two AT-rich inverted repeats that form a nearly perfect palindrome. Breakpoints within the 11q23, 17q11 and 4q35 partner chromosomes also fall near the center of palindromic sequences. In the present work the breakpoints of a fourth translocation involving LCR-B, a balanced ependymoma-associated t(1;22), were characterized not only to localize this junction relative to known genes, but also to further understand the mechanism underlying these rearrangements. FISH mapping was used to localize the 22q11.2 breakpoint to LCR-B and the 1p21 breakpoint to single BAC clones. STS mapping narrowed the 1p21.2 breakpoint to a 1990 bp AT-rich region, and junction fragments were amplified by nested PCR. Junction fragment-derived sequence indicates that the 1p21.2 breakpoint splits a 278 nt palindrome capable of forming stem-loop secondary structure. In contrast, the 1p21.2 reference genomic sequence from clones in the database does not exhibit this configuration, suggesting a predisposition for regional genomic instability perhaps etiologic for this rearrangement. Given its similarity to known chromosomal fragile site (FRA) sequences, this polymorphic 1p21.2 sequence may represent one of the FRA1 loci. Comparative analysis of the secondary structure of sequences surrounding translocation breakpoints that involve LCR-B with those not involving this region indicates a unique ability of the former to form stem-loop structures. The relative likelihood of forming these configurations appears to be related to the rate of translocation occurrence. Further analysis suggests that constitutional translocations in general occur between sequences of similar melting temperature and propensity for secondary structure.
A palindrome-mediated mechanism distinguishes translocations involving LCR-B of chromosome 22q11.2
Gotter, Anthony L.; Shaikh, Tamim H.; Budarf, Marcia L.; Rhodes, C. Harker; Emanuel, Beverly S.
2010-01-01
Two known recurrent constitutional translocations, t(11;22) and t(17;22), as well as a non-recurrent t(4;22), display derivative chromosomes that have joined to a common site within the low copy repeat B (LCR-B) region of 22q11.2. This breakpoint is located between two AT-rich inverted repeats that form a nearly perfect palindrome. Breakpoints within the 11q23, 17q11 and 4q35 partner chromosomes also fall near the center of palindromic sequences. In the present work the breakpoints of a fourth translocation involving LCR-B, a balanced ependymoma-associated t(1;22), were characterized not only to localize this junction relative to known genes, but also to further understand the mechanism underlying these rearrangements. FISH mapping was used to localize the 22q11.2 breakpoint to LCR-B and the 1p21 breakpoint to single BAC clones. STS mapping narrowed the 1p21.2 breakpoint to a 1990 bp AT-rich region, and junction fragments were amplified by nested PCR. Junction fragment-derived sequence indicates that the 1p21.2 breakpoint splits a 278 nt palindrome capable of forming stem–loop secondary structure. In contrast, the 1p21.2 reference genomic sequence from clones in the database does not exhibit this configuration, suggesting a predisposition for regional genomic instability perhaps etiologic for this rearrangement. Given its similarity to known chromosomal fragile site (FRA) sequences, this polymorphic 1p21.2 sequence may represent one of the FRA1 loci. Comparative analysis of the secondary structure of sequences surrounding translocation breakpoints that involve LCR-B with those not involving this region indicate a unique ability of the former to form stem–loop structures. The relative likelihood of forming these configurations appears to be related to the rate of translocation occurrence. Further analysis suggests that constitutional translocations in general occur between sequences of similar melting temperature and propensity for secondary structure. PMID:14613967
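To make the term concrete: a DNA palindrome is a sequence that equals its own reverse complement, which is what allows a single strand to fold back on itself into a stem-loop. The following is a minimal sketch of how one might score how close a sequence is to a perfect palindrome. The function names and any test sequences are illustrative assumptions, not taken from the paper, and the score ignores the gaps and spacers that a real stem-loop analysis would have to handle.

```python
# Illustrative sketch only: helper names are assumptions, not from the paper.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def palindrome_identity(seq: str) -> float:
    """Fraction of positions at which seq matches its own reverse complement.

    1.0 means a perfect palindrome; a value just below 1.0 corresponds to a
    'nearly perfect' palindrome. Gaps and loop spacers are not modelled.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    rc = reverse_complement(seq)
    matches = sum(a == b for a, b in zip(seq, rc))
    return matches / len(seq)
```

For example, the EcoRI site GAATTC scores 1.0 because it reads the same on both strands, while a run such as AAAA scores 0.0.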
Galbany-Casals, M; Blanco-Moreno, J M; Garcia-Jacas, N; Breitwieser, I; Smissen, R D
2011-07-01
The yellow-flowered everlasting daisy Helichrysum italicum (Asteraceae, Gnaphalieae) is widely distributed in the Mediterranean basin, where it grows in continuous and widespread populations in diverse open habitats. Helichrysum italicum subsp. microphyllum has a disjunct distribution in the Balearic Islands (Majorca and Dragonera), Corsica, Sardinia, Crete and Cyprus. Numerous morphological intermediates between subsp. italicum and subsp. microphyllum are known from Corsica, where the two subspecies co-occur. The aims of the study were to investigate if subsp. microphyllum has a common origin, constituting an independent gene pool from subsp. italicum, or if the morphological differences between subsp. microphyllum and subsp. italicum have arisen independently in different locations from a common wider gene pool. Our analyses of AFLP, cpDNA sequences and morphological characters show that there is geographic structure to the genetic variation within H. italicum, with eastern and western Mediterranean groups, which do not correspond with the division into subsp. microphyllum and subsp. italicum as currently circumscribed. Local selection on quantitative trait loci provides sufficient explanation for the morphological divergence observed and is consistent with genetic data. Within the western Mediterranean group of the species we found considerable polymorphism in chloroplast DNA sequences among and within some populations. Comparison with chloroplast DNA sequences from other Helichrysum species showed that some chloroplast haplotypes are shared across species. © 2010 German Botanical Society and The Royal Botanical Society of the Netherlands.
R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server
Cannone, Jamie J.; Sweeney, Blake A.; Petrov, Anton I.; Gutell, Robin R.; Zirbel, Craig L.; Leontis, Neocles
2015-01-01
The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. PMID:26048960
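The core step described above, mapping nucleotide ranges onto alignment columns, can be sketched as follows. This is not the R3D-2-MSA API: the function names and the toy alignment are assumptions, and the sketch simplifies structure-derived nucleotide numbers to 1-based positions along the ungapped reference row. Only the two limits (five ranges, 50 positions per range) come from the abstract.

```python
# Illustrative sketch only: this is not the R3D-2-MSA API. The two limits come
# from the abstract; function names and the numbering scheme are assumptions.
MAX_RANGES = 5
MAX_POSITIONS_PER_RANGE = 50


def position_to_column(aligned_ref: str, gap: str = "-") -> dict[int, int]:
    """Map 1-based ungapped positions of the reference row to 0-based MSA columns."""
    mapping = {}
    pos = 0
    for col, char in enumerate(aligned_ref):
        if char != gap:
            pos += 1
            mapping[pos] = col
    return mapping


def extract_columns(msa: list[str], ref_row: int,
                    ranges: list[tuple[int, int]]) -> list[str]:
    """Return, for every aligned sequence, the characters in the requested columns."""
    if len(ranges) > MAX_RANGES:
        raise ValueError("at most five ranges are allowed")
    mapping = position_to_column(msa[ref_row])
    columns = []
    for start, end in ranges:
        if end - start + 1 > MAX_POSITIONS_PER_RANGE:
            raise ValueError("at most 50 positions per range are allowed")
        columns.extend(mapping[p] for p in range(start, end + 1))
    return ["".join(row[c] for c in columns) for row in msa]
```

With the toy alignment `["AC-GU", "ACAGU", "A--GC"]` and the first row as reference, the range (2, 3) selects columns 1 and 3, skipping the gap column, and yields one sequence variant per row.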
Controlled simulation of optical turbulence in a temperature gradient air chamber
NASA Astrophysics Data System (ADS)
Toselli, Italo; Wang, Fei; Korotkova, Olga
2016-05-01
An atmospheric turbulence simulator is built and characterized for in-lab optical wave propagation with controlled strength of the refractive-index fluctuations. The temperature gradients are generated by a sequence of heat guns with controlled individual strengths. The temperature structure functions are measured in two directions transverse to the propagation path with the help of a thermocouple array and used for evaluation of the corresponding refractive-index structure functions of optical turbulence.
JPRS Report, Science and Technology USSR: Life Sciences.
1990-07-16
4 1 VETERINARY MEDICINE Primary Structure of RNA Polymerase Gene of Foot-and-Mouth Disease Virus (FMDV ...neering were used to obtain cDNA corresponding to the Primary Structure of RNA Polymerase Gene of RNA polymerase gene to FMDV A22, with a map of the...Foot-and-Mouth Disease Virus (FMDV) A22 primary nucleotide sequence of the cDNA provided. 18400538F Moscow BIOORGANICHESKAYA Analysis of the data
Bedrock geologic and structural map through the western Candor Colles region of Mars
Okubo, Chris H.
2014-01-01
The structure and geology of the layered deposits in the Candor Colles region corresponding to units Avfs, Avme, and Hvl of Witbeck and others (1991) are reevaluated in this 1:18,000-scale map. The objectives herein are to gather high-resolution structural measurements to (1) refine the previous unit boundaries in this area established by Witbeck and others (1991), (2) revise the local stratigraphy where necessary, (3) characterize bed forms to help constrain depositional processes, and (4) determine the styles and extent of deformation to better inform reconstructions of the local post-depositional geologic history.
Reconciling Apparent Conflicts between Mitochondrial and Nuclear Phylogenies in African Elephants
Georgiadis, Nicholas J.; David, Victor A.; Zhao, Kai; Stephens, Robert M.; Kolokotronis, Sergios-Orestis; Roca, Alfred L.
2011-01-01
Conservation strategies for African elephants would be advanced by resolution of conflicting claims that they comprise one, two, three or four taxonomic groups, and by development of genetic markers that establish more incisively the provenance of confiscated ivory. We addressed these related issues by genotyping 555 elephants from across Africa with microsatellite markers, developing a method to identify those loci most effective at geographic assignment of elephants (or their ivory), and conducting novel analyses of continent-wide datasets of mitochondrial DNA. Results showed that nuclear genetic diversity was partitioned into two clusters, corresponding to African forest elephants (99.5% Cluster-1) and African savanna elephants (99.4% Cluster-2). Hybrid individuals were rare. In a comparison of basal forest “F” and savanna “S” mtDNA clade distributions to nuclear DNA partitions, forest elephant nuclear genotypes occurred only in populations in which S clade mtDNA was absent, suggesting that nuclear partitioning corresponds to the presence or absence of S clade mtDNA. We reanalyzed African elephant mtDNA sequences from 81 locales spanning the continent and discovered that S clade mtDNA was completely absent among elephants at all 30 sampled tropical forest locales. The distribution of savanna nuclear DNA and S clade mtDNA corresponded closely to range boundaries traditionally ascribed to the savanna elephant species based on habitat and morphology. Further, a reanalysis of nuclear genetic assignment results suggested that West African elephants do not comprise a distinct third species. Finally, we show that some DNA markers will be more useful than others for determining the geographic origins of illegal ivory. 
These findings resolve the apparent incongruence between mtDNA and nuclear genetic patterns that has confounded the taxonomy of African elephants, affirm the limitations of using mtDNA patterns to infer elephant systematics or population structure, and strongly support the existence of two elephant species in Africa. PMID:21701575
Genes expressed during the development and ripening of watermelon fruit.
Levi, A; Davis, A; Hernandez, A; Wechter, P; Thimmapuram, J; Trebitsh, T; Tadmor, Y; Katzir, N; Portnoy, V; King, S
2006-11-01
A normalized cDNA library was constructed using watermelon flesh mRNA from three distinct developmental time-points and was subtracted by hybridization with leaf cDNA. Random cDNA clones of the watermelon flesh subtraction library were sequenced from the 5' end in order to identify potentially informative genes associated with fruit setting, development, and ripening. One thousand and forty-six 5'-end sequences (expressed sequence tags; ESTs) were assembled into 832 non-redundant sequences, designated as "EST-unigenes". Of these 832 "EST-unigenes", 254 (approximately 30%) have no significant homology to sequences published so far for other plant species. Additionally, 168 "EST-unigenes" (approximately 20%) correspond to genes with unknown function, whereas 410 "EST-unigenes" (approximately 50%) correspond to genes with known function in other plant species. These "EST-unigenes" are mainly associated with metabolism, membrane transport, cytoskeleton synthesis and structure, cell wall formation and cell division, signal transduction, nucleic acid binding and transcription factors, defense and stress response, and secondary metabolism. This study provides the scientific community with novel genetic information for watermelon as well as an expanded pool of genes associated with fruit development in watermelon. These genes will be useful targets in future genetic and functional genomic studies of watermelon and its development.
Chi, Sylvia Ighem; Urbarova, Ilona; Johansen, Steinar D
2018-04-30
The mitochondrial genomes of sea anemones are dynamic in structure. Invasion by genetic elements, such as self-catalytic group I introns or insertion-like sequences, contribute to sea anemone mitochondrial genome expansion and complexity. By using next generation sequencing we investigated the complete mtDNAs and corresponding transcriptomes of the temperate sea anemone Anemonia viridis and its closer tropical relative Anemonia majano. Two versions of fused homing endonuclease gene (HEG) organization were observed among the Actiniidae sea anemones; in-frame gene fusion and pseudo-gene fusion. We provided support for the pseudo-gene fusion organization in Anemonia species, resulting in a repressed HEG from the COI-884 group I intron. orfA, a putative protein-coding gene with insertion-like features, was present in both Anemonia species. Interestingly, orfA and COI expression were significantly up-regulated upon long-term environmental stress corresponding to low seawater pH conditions. This study provides new insights to the dynamics of sea anemone mitochondrial genome structure and function. Copyright © 2018 Elsevier B.V. All rights reserved.
Recognition of coarse-grained protein tertiary structure.
Lezon, Timothy; Banavar, Jayanth R; Maritan, Amos
2004-05-15
A model of the protein backbone is considered in which each residue is characterized by the location of its Cα atom and one of a discrete set of conformal (phi, psi) states. We investigate the key differences between a description that offers a locally precise fit to known backbone structures and one that provides a globally accurate fit to protein structures. Using a statistical scoring scheme and threading, a protein's local best-fit conformation is highly recognizable, but its global structure cannot be directly determined from an amino acid sequence. The incorporation of information about the conformal states of neighboring residues along the chain allows one to accurately translate the local structure into a global structure. We present a two-step algorithm, which recognizes up to 95% of the tested protein native-state structures to within a 2.5 Å root mean square deviation. Copyright 2004 Wiley-Liss, Inc.
Regional surnames and genetic structure in Great Britain.
Kandt, Jens; Cheshire, James A; Longley, Paul A
2016-10-01
Following the increasing availability of DNA-sequenced data, the genetic structure of populations can now be inferred and studied in unprecedented detail. Across social science, this innovation is shaping new bio-social research agendas, attracting substantial investment in the collection of genetic, biological and social data for large population samples. Yet genetic samples are special because the precise populations that they represent are uncertain and ill-defined. Unlike most social surveys, a genetic sample's representativeness of the population cannot be established by conventional procedures of statistical inference, and the implications for population-wide generalisations about bio-social phenomena are little understood. In this paper, we seek to address these problems by linking surname data to a censored and geographically uneven sample of DNA scans, collected for the People of the British Isles study. Based on a combination of global and local spatial correspondence measures, we identify eight regions in Great Britain that are most likely to represent the geography of genetic structure of Great Britain's long-settled population. We discuss the implications of this regionalisation for bio-social investigations. We conclude that, as the often highly selective collection of DNA and biomarkers becomes a more common practice, geography is crucial to understanding variation in genetic information within diverse populations.
Razo-Mendivil, Ulises; Vázquez-Domínguez, Ella; de León, Gerardo Pérez-Ponce
2013-12-01
Genetic analyses of hosts and their parasites are key to understand the evolutionary patterns and processes that have shaped host-parasite associations. We evaluated the genetic structure of the digenean Crassicutis cichlasomae and its most common host, the Mayan cichlid "Cichlasoma" urophthalmus, encompassing most of their geographical range in Middle-America (river basins in southeastern Mexico, Belize, and Guatemala together with the Yucatan Peninsula). Genetic diversity and structure analyses were done based on 167 cytochrome c oxidase subunit 1 sequences (330 bp) for C. cichlasomae from 21 populations and 161 cytochrome b sequences (599 bp) for "C." urophthalmus from 26 populations. Analyses performed included phylogenetic tree estimation under Bayesian inference and maximum likelihood analysis, genetic diversity, distance and structure estimates, haplotype networks, and demographic evaluations. Crassicutis cichlasomae showed high genetic diversity values and genetic structuring, corresponding with 4 groups clearly differentiated and highly divergent. Conversely, "C." urophthalmus showed low levels of genetic diversity and genetic differentiation, defined as 2 groups with low divergence and with no correspondence with geographical distribution. Our results show that species of cichlids parasitized by C. cichlasomae other than "C." urophthalmus, along with multiple colonization events and subsequent isolation in different basins, are likely factors that shaped the genetic structure of the parasite. Meanwhile, historical long-distance dispersal and drought periods during the Holocene, with significant population size reductions and fragmentations, are factors that could have shaped the genetic structure of the Mayan cichlid.
Rapid divergence and expansion of the X chromosome in papaya
Gschwend, Andrea R.; Yu, Qingyi; Tong, Eric J.; Zeng, Fanchang; Han, Jennifer; VanBuren, Robert; Aryal, Rishi; Charlesworth, Deborah; Moore, Paul H.; Paterson, Andrew H.; Ming, Ray
2012-01-01
X chromosomes have long been thought to conserve the structure and gene content of the ancestral autosome from which the sex chromosomes evolved. We compared the recently evolved papaya sex chromosomes with a homologous autosome of a close relative, the monoecious Vasconcellea monoica, to infer changes since recombination stopped between the papaya sex chromosomes. We sequenced 12 V. monoica bacterial artificial chromosomes, 11 corresponding to the papaya X-specific region, and 1 to a papaya autosomal region. The combined V. monoica X-orthologous sequences are much shorter (1.10 Mb) than the corresponding papaya region (2.56 Mb). Given that the V. monoica genome is 41% larger than that of papaya, this finding suggests considerable expansion of the papaya X; expansion is supported by a higher repetitive sequence content of the X compared with the papaya autosomal sequence. The alignable regions include 27 transcript-encoding sequences, only 6 of which are functional X/V. monoica gene pairs. Sequence divergence from the V. monoica orthologs is almost identical for papaya X and Y alleles; the Carica-Vasconcellea split therefore occurred before the papaya sex chromosomes stopped recombining, making V. monoica a suitable outgroup for inferring changes in papaya sex chromosomes. The papaya X and the hermaphrodite-specific region of the Yh chromosome and V. monoica have all gained and lost genes, including a surprising amount of changes in the X. PMID:22869742
DOE Office of Scientific and Technical Information (OSTI.GOV)
Petrillo-Peixoto, M.L.; Beverley, S.M.
1988-12-01
We describe the structure of amplified DNA that was discovered in two laboratory stocks of the protozoan parasite Leishmania tarentolae. Restriction mapping and molecular cloning revealed that a region of 42 kilobases was amplified 8- to 30-fold in these lines. Southern blot analyses of digested DNAs or chromosomes separated by pulsed-field electrophoresis showed that the amplified DNA corresponded to the H region, a locus defined originally by its amplification in methotrexate-resistant Leishmania major. Similarities between the amplified DNA of the two species included (i) extensive cross-hybridization; (ii) approximate conservation of sequence order; (iii) extrachromosomal localization; (iv) an overall inverted, head-to-head configuration as a circular 140-kilobase tetrameric molecule; (v) two regions of DNA sequence rearrangement, each of which was closely associated with the two centers of the inverted repeats; (vi) association with methotrexate resistance; and (vii) phenotypically conservative amplification, in which the wild-type chromosomal arrangement was retained without apparent modification. Our data showed that amplified DNA mediating drug resistance arose in unselected L. tarentolae, although the pressures leading to apparently spontaneous amplification and maintenance of the H region are not known. The simple structure and limited extent of DNA amplified in these and other Leishmania lines suggests that the study of gene amplification in Leishmania spp. offers an attractive model system for the study of amplification in cultured mammalian cells and tumors. We also introduced a method for measuring the size of large circular DNAs, using gamma-irradiation to introduce limited double-strand breaks followed by sizing of the linear DNAs by pulsed-field electrophoresis.
Bao, Yunhe; White, Cindy L; Luger, Karolin
2006-08-25
Poly(dA.dT) DNA sequence elements are thought to promote transcription by either excluding nucleosomes or by altering their structural or dynamic properties. Here, the stability and structure of a defined nucleosome core particle containing a 16 base-pair poly(dA.dT) element (A16 NCP) was investigated. The A16 NCP requires a significantly higher temperature for histone octamer sliding in vitro than comparable nucleosomes that do not contain a poly(dA.dT) element. Fluorescence resonance energy transfer showed that the interactions between the nucleosomal DNA ends and the histone octamer were destabilized in A16 NCP. The crystal structure of A16 NCP was determined to a resolution of 3.2 Å. The overall structure was maintained except for local deviations in DNA conformation. These results are consistent with previous in vivo and in vitro observations that poly(dA.dT) elements cause only modest changes in DNA accessibility and modest increases in steady-state transcription levels.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dussossoy, D.; Carayon, P.; Feraut, D.
1996-05-01
Based on the amino acid sequence deduced from the cloned human peripheral benzodiazepine receptor (PBR) gene, a monoclonal antibody (Mab 8D7) was produced against the C-terminal fragment of the receptor. Immunoblot experiments, performed against purified PBR, indicated that the antipeptide antibody recognized, under denaturing conditions, the corresponding amino acid sequence of the PBR. When mitochondrial membranes from PBR-transfected yeast or from THP1 and U937 cells were used in immunoblot analysis, a high level of immunoreactivity was observed at 18 kDa, the PBR molecular mass deduced from cDNA, establishing the specificity of the antibody for the receptor. Moreover, binding experiments realized with intact mitochondria demonstrated that the immunogenic sequence was accessible to the antibody, indicating that the C-terminal fragment of the PBR faces the cytosol. Using this Mab we developed a technique which allowed precise quantification of PBR density per cell. Furthermore, cellular localization studies by flow cytometric analysis and confocal microscopy on cell lines displaying different levels of PBR showed that Mab 8D7 was entirely colocalized with an antimitochondria Mab. 34 refs., 7 figs.
NoFold: RNA structure clustering without folding or alignment.
Middleton, Sarah A; Kim, Junhyong
2014-11-01
Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function, for example recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcripts may therefore yield significant insight into their binding partners and mechanism of regulation. However, most methods for clustering structures are based on folding individual sequences or doing many pairwise alignments, which results in a tradeoff between speed and accuracy that can be problematic for large-scale data sets. Here we describe a novel method for comparing and characterizing RNA secondary structures that does not require folding or pairwise alignment of the input sequences. Our method uses the idea of constructing a distance function between two objects by their respective distances to a collection of empirical examples or models, which in our case consists of 1973 Rfam family covariance models. Using this as a basis for measuring structural similarity, we developed a clustering pipeline called NoFold to automatically identify and annotate structure motifs within large sequence data sets. We demonstrate that NoFold can simultaneously identify multiple structure motifs with an average sensitivity of 0.80 and precision of 0.98 and generally exceeds the performance of existing methods. We also perform a cross-validation analysis of the entire set of Rfam families, achieving an average sensitivity of 0.57. We apply NoFold to identify motifs enriched in dendritically localized transcripts and report 213 enriched motifs, including both known and novel structures. © 2014 Middleton and Kim; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
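The central idea, comparing two sequences through their scores against a shared collection of models rather than against each other, can be sketched as follows. This is a minimal illustration and not NoFold's actual pipeline: each score vector is a stand-in for one sequence's scores against the reference models, the Euclidean metric and the single-linkage grouping are assumptions chosen for brevity, and all names are hypothetical.

```python
import math

# Illustrative sketch only: score vectors stand in for per-model scores, and
# the metric and grouping rule are assumptions, not NoFold's actual pipeline.


def feature_distance(a, b):
    """Euclidean distance between two equal-length score vectors."""
    if len(a) != len(b):
        raise ValueError("score vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def threshold_clusters(vectors, max_dist):
    """Single-linkage grouping: items within max_dist of each other are merged."""
    names = list(vectors)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative, compressing paths.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if feature_distance(vectors[a], vectors[b]) <= max_dist:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())
```

Because every sequence is scored once against the fixed model set, the pairwise comparison reduces to cheap vector arithmetic, which is the speed advantage the abstract points to.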
Innate Immune Complexity in the Purple Sea Urchin: Diversity of the Sp185/333 System
Smith, L. Courtney
2012-01-01
The California purple sea urchin, Strongylocentrotus purpuratus, is a long-lived echinoderm with a complex and sophisticated innate immune system. There are several large gene families that function in immunity in this species including the Sp185/333 gene family that has ∼50 (±10) members. The family shows intriguing sequence diversity and encodes a broad array of diverse yet similar proteins. The genes have two exons of which the second encodes the mature protein and has repeats and blocks of sequence called elements. Mosaics of element patterns plus single nucleotide polymorphisms-based variants of the elements result in significant sequence diversity among the genes yet maintains similar structure among the members of the family. Sequence of a bacterial artificial chromosome insert shows a cluster of six, tightly linked Sp185/333 genes that are flanked by GA microsatellites. The sequences between the GA microsatellites in which the Sp185/333 genes and flanking regions are located, are much more similar to each other than are the sequences outside the microsatellites suggesting processes such as gene conversion, recombination, or duplication. However, close linkage does not correspond with greater sequence similarity compared to randomly cloned and sequenced genes that are unlikely to be linked. There are three segmental duplications that are bounded by GAT microsatellites and include three almost identical genes plus flanking regions. RNA editing is detectible throughout the mRNAs based on comparisons to the genes, which, in combination with putative post-translational modifications to the proteins, results in broad arrays of Sp185/333 proteins that differ among individuals. The mature proteins have an N-terminal glycine-rich region, a central RGD motif, and a C-terminal histidine-rich region. The Sp185/333 proteins are localized to the cell surface and are found within vesicles in subsets of polygonal and small phagocytes. 
The coelomocyte proteome shows full-length and truncated proteins, including some with missense sequence. Current results suggest that both native Sp185/333 proteins and a recombinant protein bind bacteria and are likely important in sea urchin innate immunity. PMID:22566951
Evidence for a Complex Class of Nonadenylated mRNA in Drosophila
Zimmerman, J. Lynn; Fouts, David L.; Manning, Jerry E.
1980-01-01
The amount, by mass, of poly(A+) mRNA present in the polyribosomes of third-instar larvae of Drosophila melanogaster, and the relative contribution of the poly(A+) mRNA to the sequence complexity of total polysomal RNA, has been determined. Selective removal of poly(A+) mRNA from total polysomal RNA by use of either oligo-dT-cellulose, or poly(U)-sepharose affinity chromatography, revealed that only 0.15% of the mass of the polysomal RNA was present as poly(A+) mRNA. The present study shows that this RNA hybridized at saturation with 3.3% of the single-copy DNA in the Drosophila genome. After correction for asymmetric transcription and reactability of the DNA, 7.4% of the single-copy DNA in the Drosophila genome is represented in larval poly(A+) mRNA. This corresponds to 6.73 × 10^6 nucleotides of mRNA coding sequences, or approximately 5,384 diverse RNA sequences of average size 1,250 nucleotides. However, total polysomal RNA hybridizes at saturation to 10.9% of the single-copy DNA sequences. After correcting this value for asymmetric transcription and tracer DNA reactability, 24% of the single-copy DNA in Drosophila is represented in total polysomal RNA. This corresponds to 2.18 × 10^7 nucleotides of RNA coding sequences or 17,440 diverse RNA molecules of size 1,250 nucleotides. This value is 3.2 times greater than that observed for poly(A+) mRNA, and indicates that ≃69% of the polysomal RNA sequence complexity is contributed by nonadenylated RNA. Furthermore, if the number of different structural genes represented in total polysomal RNA is ≃1.7 × 10^4, then the number of genes expressed in third-instar larvae exceeds the number of chromomeres in Drosophila by about a factor of three. This numerology indicates that the number of chromomeres observed in polytene chromosomes does not reflect the number of structural gene sequences in the Drosophila genome. PMID:6777246
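The figures in this abstract follow from simple division, reading the two complexities as 6.73 × 10^6 and 2.18 × 10^7 nucleotides. The short check below reproduces them; every input is a number quoted in the abstract, and only the variable names are introduced here for illustration.

```python
# Worked check of the abstract's arithmetic. All inputs are figures quoted in
# the abstract; the variable names are introduced here for illustration.
AVERAGE_MRNA_LENGTH = 1250   # nucleotides per RNA species

polya_complexity = 6.73e6    # nucleotides represented in poly(A+) mRNA
total_complexity = 2.18e7    # nucleotides represented in total polysomal RNA

# Number of distinct RNA species of average length in each population.
polya_species = polya_complexity / AVERAGE_MRNA_LENGTH
total_species = total_complexity / AVERAGE_MRNA_LENGTH

# Ratio of total to poly(A+) complexity, and the share of complexity that is
# not accounted for by poly(A+) mRNA.
fold_difference = total_species / polya_species
nonadenylated_fraction = 1 - polya_complexity / total_complexity

print(f"poly(A+) species:       {polya_species:,.0f}")
print(f"total polysomal species: {total_species:,.0f}")
print(f"fold difference:        {fold_difference:.1f}")
print(f"nonadenylated fraction: {nonadenylated_fraction:.0%}")
```

The two species counts come out at 5,384 and 17,440, the ratio rounds to 3.2, and the nonadenylated share rounds to 69%, matching the values stated in the abstract.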
INFO-RNA--a fast approach to inverse RNA folding.
Busch, Anke; Backofen, Rolf
2006-08-01
The structure of RNA molecules is often crucial for their function. Therefore, secondary structure prediction has gained much interest. Here, we consider the inverse RNA folding problem, which means designing RNA sequences that fold into a given structure. We introduce a new algorithm for the inverse folding problem (INFO-RNA) that consists of two parts: a dynamic programming method for good initial sequences and a following improved stochastic local search that uses an effective neighbor selection method. During the initialization, we design a sequence that among all sequences adopts the given structure with the lowest possible energy. For the selection of neighbors during the search, we use a kind of look-ahead of one selection step applying an additional energy-based criterion. Afterwards, the pre-ordered neighbors are tested using the actual optimization criterion of minimizing the structure distance between the target structure and the mfe structure of the considered neighbor. We compared our algorithm to RNAinverse and RNA-SSD for artificial and biological test sets. Using INFO-RNA, we performed better than RNAinverse and in most cases, we gained better results than RNA-SSD, probably the best inverse RNA folding tool on the market. www.bioinf.uni-freiburg.de/Subpages/software.html.
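The optimization criterion above depends on a distance between two secondary structures. The abstract does not say which distance is used, so the sketch below shows one common choice, the base-pair distance between two dot-bracket strings, purely as an assumption. The function names are hypothetical, and pseudoknot brackets are not handled.

```python
# Illustrative sketch only: base-pair distance is one common structure
# distance; whether INFO-RNA uses exactly this measure is an assumption.


def base_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Parse a dot-bracket string into a set of (i, j) index pairs with i < j."""
    stack, pairs = [], set()
    for j, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(j)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), j))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def base_pair_distance(a: str, b: str) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(a) ^ base_pairs(b))
```

In a local search of the kind described, a candidate sequence would be folded, its predicted structure compared with the target using such a distance, and the search would stop once the distance reaches zero.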
Rtools: a web server for various secondary structural analyses on single RNA sequences.
Hamada, Michiaki; Ono, Yukiteru; Kiryu, Hisanori; Sato, Kengo; Kato, Yuki; Fukunaga, Tsukasa; Mori, Ryota; Asai, Kiyoshi
2016-07-08
The secondary structures, as well as the nucleotide sequences, are important features of RNA molecules for characterizing their functions. According to the thermodynamic model, however, the probability of any secondary structure is very small. As a consequence, any tool to predict the secondary structures of RNAs has limited accuracy. On the other hand, there are a few tools to compensate for the imperfect predictions by calculating and visualizing the secondary structural information from RNA sequences. It is desirable to obtain the rich information from those tools through a friendly interface. We implemented a web server of the tools to predict secondary structures and to calculate various structural features based on the energy models of secondary structures. By just giving an RNA sequence to the web server, the user can get the different types of solutions of the secondary structures, the marginal probabilities such as base-pairing probabilities, loop probabilities and accessibilities of the local bases, the energy changes by arbitrary base mutations as well as the measures for validations of the predicted secondary structures. The web server is available at http://rtools.cbrc.jp, which integrates software tools, CentroidFold, CentroidHomfold, IPKnot, CapR, Raccess, Rchange and RintD. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Swaney, Danielle L.; Wenger, Craig D.; Thomson, James A.; Coon, Joshua J.
2009-01-01
Protein phosphorylation is central to the understanding of cellular signaling, and cellular signaling is suggested to play a major role in the regulation of human embryonic stem (ES) cell pluripotency. Here, we describe the use of conventional tandem mass spectrometry-based sequencing technology—collision-activated dissociation (CAD)—and the more recently developed method electron transfer dissociation (ETD) to characterize the human ES cell phosphoproteome. In total, these experiments resulted in the identification of 11,995 unique phosphopeptides, corresponding to 10,844 nonredundant phosphorylation sites, at a 1% false discovery rate (FDR). Among these phosphorylation sites are 5 localized to 2 pluripotency critical transcription factors—OCT4 and SOX2. From these experiments, we conclude that ETD identifies a larger number of unique phosphopeptides than CAD (8,087 to 3,868), more frequently localizes the phosphorylation site to a specific residue (49.8% compared with 29.6%), and sequences whole classes of phosphopeptides previously unobserved. PMID:19144917
Spielmann, A; Stutz, E
1983-10-25
The soybean chloroplast psb A gene (photosystem II thylakoid membrane protein of Mr 32 000, lysine-free) and the trn H gene (tRNAHisGUG), which both map in the large single copy region adjacent to one of the inverted repeat structures (IR1), have been sequenced including flanking regions. The psb A gene shows in its structural part 92% sequence homology with the corresponding genes of spinach and N. debneyi and also contains an open reading frame for 353 amino acids. The amino acid sequence of a potential primary translation product (calculated Mr, 38 904, no lysine) diverges from that of spinach and N. debneyi in only two positions in the C-terminal part. The trn H gene has the same polarity as the psb A gene and the coding region is located at the very end of the large single copy region. The deduced sequence of the soybean chloroplast tRNAHisGUG is identical with that of Zea mays chloroplasts. Both ends of the large single copy region were sequenced including a small segment of the adjacent IR1 and IR2.
Nakazawa, Yasumoto; Asakura, Tetsuo
2003-06-18
Fibrous proteins, unlike globular proteins, contain repetitive amino acid sequences, giving rise to very regular secondary protein structures. Silk fibroin from a wild silkworm, Samia cynthia ricini, consists of about 100 repeats of alternating polyalanine (poly-Ala) regions of 12-13 residues in length and Gly-rich regions. In this paper, the precise structure of the model peptide, GGAGGGYGGDGG(A)(12)GGAGDGYGAG, which is a typical repeated sequence of the silk fibroin, was determined using a combination of three kinds of solid-state NMR studies: a quantitative use of (13)C CP/MAS NMR chemical shift with conformation-dependent (13)C chemical shift contour plots, 2D spin diffusion (13)C solid-state NMR under off magic angle spinning and rotational echo double resonance. The structure of the model peptide corresponding to the silk fibroin structure before spinning was determined. The torsion angles of the central Ala residue, Ala(19), in the poly-Ala region were determined to be (phi, psi) = (-59 degrees, -48 degrees), which are values typically associated with alpha-helical structures. However, the torsion angles of the Gly(25) residue adjacent to the C-terminal side of the poly-Ala chain were determined to be (phi, psi) = (-66 degrees, -22 degrees) and those of Gly(12) and Ala(13) residues at the N-terminal of the poly-Ala chain to be (phi, psi) = (-70 degrees, -30 degrees). In addition, REDOR experiments indicate that the torsion angles of the two C-terminal Ala residues, Ala(23) and Ala(24), are (phi, psi) = (-66 degrees, -22 degrees) and those of the two N-terminal Ala residues, Ala(13) and Ala(14), are (phi, psi) = (-70 degrees, -30 degrees). Thus, the local structure of N-terminal and C-terminal residues, and also the neighboring residues of alpha-helical poly-Ala chain in the model peptide is a more strongly wound structure than found in typical alpha-helix structures.
Geologic Map of the Gold Creek Gold District, Elko County, Nevada
Ketner, Keith B.
2007-01-01
The Gold Creek, Nev. area displays important stratigraphic and structural relationships between Paleozoic and early Tertiary sedimentary strata in an area dominated by large intrusive bodies of Mesozoic age and extensive volcanic fields of middle to late Tertiary age. An autochthonous sequence includes the Cambrian and Proterozoic(?) Prospect Mountain Quartzite and the overlying Cambrian and Ordovician Tennessee Mountain Formation. This autochthon is overlain by three allochthonous plates each composed of a distinctive sequence of strata and having a distinctive internal structure. The structurally lowest plate is composed of the Havallah sequence, locally of Mississippian and Pennsylvanian age, which is folded on north-south trending axes. The next higher plate is composed of somewhat younger Pennsylvanian and Permian strata cut by east-west trending low-angle faults. The highest plate is composed of early Tertiary non-marine sedimentary and igneous rocks folded on varied but mainly north-south trending axes. The question of whether the allochthonous plates were emplaced by contractional or extensional forces is indeterminate from the local evidence. Mineral deposits include gold placers of moderate size and small pockets of base metals, none of which is currently being exploited.
Localized structural frustration for evaluating the impact of sequence variants
Kumar, Sushant; Clarke, Declan; Gerstein, Mark
2016-01-01
Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype–genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impact protein functionality without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, quantifying unfavorable local interactions, is employed as a metric to investigate such effects. Using this workflow on the Protein Databank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, those associated with TSGs change the frustration more in the core than the surface (by introducing loss-of-function events), whereas those associated with oncogenes manifest the opposite pattern, creating gain-of-function events. PMID:27915290
Structural rejuvenation in bulk metallic glasses
Tong, Yang; Iwashita, T.; Dmowski, Wojciech; ...
2015-01-05
Using high-energy X-ray diffraction we study structural changes in bulk metallic glasses after uniaxial compressive homogeneous deformation at temperatures slightly below the glass transition. We observe that deformation results in structural disordering corresponding to an increase in the fictive, or effective, temperature. However, the structural disordering saturates after yielding. Lastly, examination of the experimental structure and molecular dynamics simulation suggests that local changes in the atomic connectivity network are the main driving force of the structural rejuvenation.
Kuwahara, Tomomi; Yamashita, Atsushi; Hirakawa, Hideki; Nakayama, Haruyuki; Toh, Hidehiro; Okada, Natsumi; Kuhara, Satoru; Hattori, Masahira; Hayashi, Tetsuya; Ohnishi, Yoshinari
2004-01-01
Bacteroides are predominant human colonic commensals, but the principal pathogenic species, Bacteroides fragilis (BF), lives closely associated with the mucosal surface, whereas a second major species, Bacteroides thetaiotaomicron (BT), concentrates within the colon. We find corresponding differences in their genomes, based on determination of the genome sequence of BF and comparative analysis with BT. Both species have acquired two mechanisms that contribute to their dominance among the colonic microbiota: an exceptional capability to use a wide range of dietary polysaccharides by gene amplification and the capacity to create variable surface antigenicities by multiple DNA inversion systems. However, the gene amplification for polysaccharide assimilation is more developed in BT, in keeping with its internal localization. In contrast, external antigenic structures can be changed more systematically in BF. Thereby, at the mucosal surface, where microbes encounter continuous attack by host defenses, BF evasion of the immune system is favored, and its colonization and infectious potential are increased. PMID:15466707
Nagesh, Narayana; Krishnaiah, Abburi
2003-07-31
DNA from the telomeres contains a stretch of simple tandemly repeated sequences in which clusters of G residues alternate with clusters of T/A sequences along one DNA strand. Model telomeric G-clusters form four-stranded structures in the presence of Na(I), K(I) and NH(4)(I) ions. Electrophoretic and spectroscopic studies were made with the telomere-related sequences d(T6G16) or d(G4T2G4T2G4T2G4). It was noted earlier that a G-quadruplex may be inter-molecular, intra-molecular, or a mixture of both. The CD spectral characteristics of various G-quadruplex DNAs suggest that the CD maximum at 293 nm corresponds to that of an intra-molecular G-quadruplex structure or hairpin dimers. Fluorescence titration studies also show that acridine and bis-acridine interact with G-quadruplex DNA and destabilize the K(I)-quadruplex structure more efficiently than the quadruplex formed by the NH(4)(I) ion. Of the two drugs studied, acridine is more capable of breaking the G-quadruplex structure than bis-acridine. This result is further confirmed by the CD experiments.
Mathews, D H; Banerjee, A R; Luan, D D; Eickbush, T H; Turner, D H
1997-01-01
RNA transcripts corresponding to the 250-nt 3' untranslated region of the R2 non-LTR retrotransposable element are recognized by the R2 reverse transcriptase and are sufficient to serve as templates in the target DNA-primed reverse transcription (TPRT) reaction. The R2 protein encoded by the Bombyx mori R2 can recognize this region from both the B. mori and Drosophila melanogaster R2 elements even though these regions show little nucleotide sequence identity. A model for the RNA secondary structure of the 3' untranslated region of the D. melanogaster R2 retrotransposon was developed by sequence comparison of 10 species aided by free energy minimization. Chemical modification experiments are consistent with this prediction. A secondary structure model for the 3' untranslated region of R2 RNA from the R2 element from B. mori was obtained by a combination of chemical modification data and free energy minimization. These two secondary structure models, found independently, share several common sites. This study shows the utility of combining free energy minimization, sequence comparison, and chemical modification to model an RNA secondary structure. PMID:8990394
Tracking polypeptide folds on the free energy surface: effects of the chain length and sequence.
Brukhno, Andrey V; Ricchiuto, Piero; Auer, Stefan
2012-07-26
Characterization of the folding transition in polypeptides and assessing the thermodynamic stability of their structured folds are of primary importance for approaching the problem of protein folding. We use molecular dynamics simulations for a coarse grained polypeptide model in order to (1) obtain the equilibrium conformation diagram of homopolypeptides in a broad range of the chain lengths, N = 10, ..., 100, and temperatures, T (in a multicanonical ensemble), and (2) determine free energy profiles (FEPs) projected onto an optimal, so-called "natural", reaction coordinate that preserves the height of barriers and the diffusion coefficients on the underlying free energy hyper-surface. We then address the following fundamental questions. (i) How well does a kinetically determined free energy landscape of a single chain represent the polypeptide equilibrium (ensemble) behavior? In particular, under which conditions might the correspondence be lost, and what are the possible implications for the folding processes? (ii) How does the free energy landscape depend on the chain length (homopolypeptides) and the monomer interaction sequence (heteropolypeptides)? Our data reveal that at low T values equilibrium structures adopted by relatively short homopolypeptides (N < 60) are dominated by α-helical folds which correspond to the primary and secondary minima of the FEP. In contrast, longer homopolypeptides (N > 70), upon quasi-equilibrium cooling, fold preferentially in β-bundles with small helical portions, while the FEPs exhibit no distinct global minima. Moreover, subject to the choice of the initial configuration, at sufficiently low T, essentially metastable structures can be found and prevail far from the true thermodynamic equilibrium. 
We also show that, by sequence-enabling the polypeptide model, it is possible to restrict the chain to a very specific part of the configuration space, which results in substantial simplification and smoothing of the free energy landscape as compared to the case of the corresponding homopolypeptide.
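As a rough illustration of what a free energy profile is, the sketch below Boltzmann-inverts a histogram of sampled coordinate values, F(x) = -kT ln P(x). This is a simpler, generic technique and not the "natural" reaction coordinate construction the abstract describes; the function name, the binning scheme and the default kT = 1 (reduced units) are all assumptions.

```python
import math
from collections import Counter


def free_energy_profile(samples, bin_width, kT=1.0):
    """Estimate F(x) = -kT * ln P(x) from sampled values of a coordinate x.

    Returns a dict mapping each occupied bin's left edge to its free energy,
    shifted so that the most populated bin sits at F = 0.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Histogram the samples into bins of equal width.
    counts = Counter(math.floor(x / bin_width) for x in samples)
    total = sum(counts.values())
    # Boltzmann inversion of the estimated probability of each bin.
    profile = {b: -kT * math.log(n / total) for b, n in counts.items()}
    f_min = min(profile.values())
    return {b * bin_width: f - f_min for b, f in sorted(profile.items())}
```

Bins that were never visited are simply absent from the result, so barrier heights read from such a profile are only as reliable as the sampling behind it.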
Automatic Matching of Large Scale Images and Terrestrial LIDAR Based on App Synergy of Mobile Phone
NASA Astrophysics Data System (ADS)
Xia, G.; Hu, C.
2018-04-01
The digitization of cultural heritage based on ground-based laser scanning technology has been widely applied. High-precision scanning and high-resolution photography of cultural relics are the main methods of data acquisition. Reconstruction from the complete point cloud and high-resolution images requires matching the images to the point cloud, acquiring homonymous feature points, registering the data, and so on. However, establishing the one-to-one correspondence between an image and its corresponding point cloud currently depends on inefficient manual search. The effective classification and management of a large number of images, and the matching of large images to their corresponding point clouds, are the focus of this research. In this paper, we propose automatic matching of large-scale images and terrestrial LiDAR based on the synergy of a mobile phone app. First, we develop an Android app that takes pictures and records the related classification information. Second, all the images are automatically grouped using the recorded information. Third, a matching algorithm is used to match the global and local images. Based on the one-to-one correspondence between the global image and the point cloud reflection intensity image, the automatic matching of each image and its corresponding LiDAR point cloud is realized. Finally, the mapping relationship between the global image, local image and intensity image is established from homonymous feature points. We can thus establish a data structure linking the global image, the local images within the global image and the point cloud corresponding to each local image, and carry out visual management and querying of the images.
Centromeric chromatin and its dynamics in plants.
Lermontova, Inna; Sandmann, Michael; Mascher, Martin; Schmit, Anne-Catherine; Chabouté, Marie-Edith
2015-07-01
Centromeres are chromatin structures that are required for proper separation of chromosomes during mitosis and meiosis. The centromere is composed of centromeric DNA, often enriched in satellite repeats, and kinetochore complex proteins. To date, over 100 kinetochore components have been identified in various eukaryotes. Kinetochore assembly begins with incorporation of centromeric histone H3 variant CENH3 into centromeric nucleosomes. Protein components of the kinetochore are either present at centromeres throughout the cell cycle or localize to centromeres transiently, prior to attachment of microtubules to each kinetochore in prometaphase of mitotic cells. This is the case for the spindle assembly checkpoint (SAC) proteins in animal cells. The SAC complex ensures equal separation of chromosomes between daughter nuclei by preventing anaphase onset before metaphase is complete, i.e. the sister kinetochores of all chromosomes are attached to spindle fibers from opposite poles. In this review, we focus on the organization of centromeric DNA and the kinetochore assembly in plants. We summarize recent advances regarding loading of CENH3 into the centromere, and the subcellular localization and protein-protein interactions of Arabidopsis thaliana proteins involved in kinetochore assembly and function. We describe the transcriptional activity of corresponding genes based on in silico analysis of their promoters and cell cycle-dependent expression. Additionally, barley homologs of all selected A. thaliana proteins have been identified in silico, and their sequences and domain structures are presented. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.
de Borba, Luana; Villordo, Sergio M; Iglesias, Nestor G; Filomatori, Claudia V; Gebhard, Leopoldo G; Gamarnik, Andrea V
2015-03-01
The dengue virus genome is a dynamic molecule that adopts different conformations in the infected cell. Here, using RNA folding predictions, chemical probing analysis, RNA binding assays, and functional studies, we identified new cis-acting elements present in the capsid coding sequence that facilitate cyclization of the viral RNA by hybridization with a sequence involved in a local dumbbell structure at the viral 3' untranslated region (UTR). The identified interaction differentially enhances viral replication in mosquito and mammalian cells. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Combining stress transfer and source directivity: the case of the 2012 Emilia seismic sequence
Convertito, Vincenzo; Catalli, Flaminia; Emolo, Antonio
2013-01-01
The Emilia seismic sequence (Northern Italy) started in May 2012 and caused 17 casualties, severe damage to dwellings and forced the closure of several factories. The total number of events recorded in one month was about 2100, with local magnitude ranging between 1.0 and 5.9. We investigate potential mechanisms (static and dynamic triggering) that may describe the evolution of the sequence. We consider rupture directivity in the dynamic strain field and observe that, for each main earthquake, its aftershocks and the subsequent large event occurred in an area characterized by higher dynamic strains and corresponding to the dominant rupture direction. We find that static stress redistribution alone is not capable of explaining the locations of subsequent events. We conclude that dynamic triggering played a significant role in driving the sequence. This triggering was also associated with a variation in permeability and a pore pressure increase in an area characterized by a massive presence of fluids. PMID:24177982
Proshek, Benjamin; Dupuis, Julian R; Engberg, Anna; Davenport, Ken; Opler, Paul A; Powell, Jerry A; Sperling, Felix A H
2015-04-25
The Mormon Metalmark (Apodemia mormo) species complex occurs as isolated and phenotypically variable colonies in dryland areas across western North America. Lange's Metalmark, A. m. langei, one of the 17 subspecies taxonomically recognized in the complex, is federally listed under the U.S. Endangered Species Act of 1973. Metalmark taxa have traditionally been described based on phenotypic and ecological characteristics, and it is unknown how well this nomenclature reflects their genetic and evolutionary distinctiveness. Genetic variation in six microsatellite loci and mitochondrial cytochrome oxidase subunit I sequence was used to assess the population structure of the A. mormo species complex across 69 localities, and to evaluate A. m. langei's qualifications as an Evolutionarily Significant Unit. We discovered substantial genetic divergence within the species complex, especially across the Continental Divide, with population genetic structure corresponding more closely with geographic proximity and local isolation than with taxonomic divisions originally based on wing color and pattern characters. Lange's Metalmark was as genetically divergent as several other locally isolated populations in California, and even the unique phenotype that warranted subspecific and conservation status is reminiscent of the morphological variation found in some other populations. This study is the first genetic treatment of the A. mormo complex across western North America and potentially provides a foundation for reassessing the taxonomy of the group. Furthermore, these results illustrate the utility of molecular markers to aid in demarcation of biological units below the species level. From a conservation point of view, Apodemia mormo langei's diagnostic taxonomic characteristics may, by themselves, not support its evolutionary significance, which has implications for its formal listing as an Endangered Species.
Hager, Kevin W.; Fullerton, Heather; Butterfield, David A.; Moyer, Craig L.
2017-01-01
The Mariana region exhibits a rich array of hydrothermal venting conditions in a complex geological setting, which provides a natural laboratory to study the influence of local environmental conditions on microbial community structure as well as large-scale patterns in microbial biogeography. We used high-throughput amplicon sequencing of the bacterial small subunit (SSU) rRNA gene from 22 microbial mats collected from four hydrothermally active locations along the Mariana Arc and back-arc to explore the structure of lithotrophically-based microbial mat communities. The vent effluent was classified as iron- or sulfur-rich corresponding with two distinct community types, dominated by either Zetaproteobacteria or Epsilonproteobacteria, respectively. The Zetaproteobacterial-based communities had the highest richness and diversity, which supports the hypothesis that Zetaproteobacteria function as ecosystem engineers creating a physical habitat within a chemical environment promoting enhanced microbial diversity. Gammaproteobacteria were also high in abundance within the iron-dominated mats and some likely contribute to primary production. In addition, we compare sampling scale, showing that bulk sampling of microbial mats yields higher diversity than micro-scale sampling. We present a comprehensive analysis and offer new insights into the community structure and diversity of lithotrophically-driven microbial mats from a hydrothermal region associated with high microbial biodiversity. Our study indicates an important functional role for the Zetaproteobacteria in altering the mat habitat and enhancing community interactions and complexity. PMID:28970817
Functional Activity of the Fanconi Anemia Protein FAA Requires FAC Binding and Nuclear Localization
Näf, Dieter; Kupfer, Gary M.; Suliman, Ahmed; Lambert, Kathleen; D’Andrea, Alan D.
1998-01-01
Fanconi anemia (FA) is an autosomal recessive disease characterized by genomic instability, cancer susceptibility, and cellular hypersensitivity to DNA-cross-linking agents. Eight complementation groups of FA (FA-A through FA-H) have been identified. Two FA genes, corresponding to complementation groups FA-A and FA-C, have been cloned, but the functions of the encoded FAA and FAC proteins remain unknown. We have recently demonstrated that FAA and FAC interact to form a nuclear complex. In this study, we have analyzed a series of mutant forms of the FAA protein with respect to functional activity, FAC binding, and nuclear localization. Mutation or deletion of the amino-terminal nuclear localization signal (NLS) of FAA results in loss of functional activity, loss of FAC binding, and cytoplasmic retention of FAA. Replacement of the NLS sequence with a heterologous NLS sequence, derived from the simian virus 40 T antigen, results in nuclear localization but does not rescue functional activity or FAC binding. Nuclear localization of the FAA protein is therefore necessary but not sufficient for FAA function. Mutant forms of FAA which fail to bind to FAC also fail to promote the nuclear accumulation of FAC. In addition, wild-type FAC promotes the accumulation of wild-type FAA in the nucleus. Our results suggest that FAA and FAC perform a concerted function in the cell nucleus, required for the maintenance of chromosomal stability. PMID:9742112
Variational tricomplex of a local gauge system, Lagrange structure and weak Poisson bracket
NASA Astrophysics Data System (ADS)
Sharapov, A. A.
2015-09-01
We introduce the concept of a variational tricomplex, which is applicable both to variational and nonvariational gauge systems. Assigning this tricomplex with an appropriate symplectic structure and a Cauchy foliation, we establish a general correspondence between the Lagrangian and Hamiltonian pictures of one and the same (not necessarily variational) dynamics. In practical terms, this correspondence allows one to construct the generating functional of a weak Poisson structure starting from that of a Lagrange structure. As a byproduct, a covariant procedure is proposed for deriving the classical BRST charge of the BFV formalism by a given BV master action. The general approach is illustrated by the examples of Maxwell’s electrodynamics and chiral bosons in two dimensions.
Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns
2013-01-01
Background: It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. Results: We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to a factor of 45. Our best new index-based algorithm achieves a speedup of a factor of 560. Conclusions: The presented methods achieve considerable speedups compared to the best previous method. 
This, together with the expected sublinear running time of the presented index-based algorithms, allows for the first time approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide with RaligNAtor a robust and well documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator. PMID:23865810
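The idea of semi-global approximate matching under an edit distance threshold can be sketched in a much simplified, sequence-only form. The code below is a textbook online dynamic programming scan with unit costs, not the RaligNAtor algorithm: it ignores base-pair edit operations, uses no suffix array, and its function name and parameters are hypothetical.

```python
def approximate_match_ends(pattern, text, max_cost):
    """Find where `pattern` matches a substring of `text` within max_cost edits.

    Returns a list of (end, cost) tuples, where `end` is the 1-based position
    in `text` at which a matching substring ends and `cost` is the minimum
    unit-cost edit distance over all substrings ending there.
    """
    m = len(pattern)
    # prev[i]: cheapest alignment of pattern[:i] to a substring ending at the
    # previous text position. Before any text is read, that is i deletions.
    prev = list(range(m + 1))
    hits = []
    for end, base in enumerate(text, start=1):
        curr = [0] * (m + 1)  # curr[0] = 0: a match may start anywhere in text
        for i in range(1, m + 1):
            replace = prev[i - 1] + (pattern[i - 1] != base)  # match or mismatch
            extra_text_base = prev[i] + 1                     # base only in text
            missing_pattern_base = curr[i - 1] + 1            # base only in pattern
            curr[i] = min(replace, extra_text_base, missing_pattern_base)
        if curr[m] <= max_cost:
            hits.append((end, curr[m]))
        prev = curr
    return hits
```

This scan costs time proportional to the pattern length times the text length. The abstract's index-based methods are reported to avoid exactly that linear pass over the searched sequences.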
Genetic population structure in the yellow mongoose, Cynictis penicillata.
Van Vuuren, B J; Robinson, T J
1997-12-01
Phylogeographic structure was determined for the yellow mongoose, Cynictis penicillata, using mtDNA RFLPs and control region sequences. The RFLP analysis revealed 13 haplotypes which showed weak geographical patterning consistent with a recent range expansion from a refugial population(s). An analysis of molecular variance (AMOVA) revealed no correspondence between mtDNA phylogeography and subspecies delimitation, nor between matrilines and areas characterized by a high incidence of the viverrid-type rabies, of which the yellow mongoose is the principal vector. The lack of structure was also shown by control region sequences although four of the maternal lineages shared a near-perfect 81 bp repeat. We speculate that regional hot spots of the viverrid rabies biotype reflect population density differences in the yellow mongoose that are not underscored by genetic partitioning, at least at the level of resolution provided by our analyses.
Evolutionary conservation of sequence and secondary structures in CRISPR repeats
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kunin, Victor; Sorek, Rotem; Hugenholtz, Philip
Clustered Regularly Interspaced Palindromic Repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in ~40% of bacterial and all archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CAS), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been proposed that the CRISPR/CAS system samples, maintains a record of, and inactivates invasive DNA that the cell has encountered, and therefore constitutes a prokaryotic analog of an immune system. Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. All individual repeats in any given cluster were inferred to form characteristic RNA secondary structure, ranging from non-existent to pronounced. Stable secondary structures included G:U base pairs and exhibited multiple compensatory base changes in the stem region, indicating evolutionary conservation and functional importance. We also show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification including specific relationships between CRISPR and CAS subtypes.
MAISTAS: a tool for automatic structural evaluation of alternative splicing products.
Floris, Matteo; Raimondo, Domenico; Leoni, Guido; Orsini, Massimiliano; Marcatili, Paolo; Tramontano, Anna
2011-06-15
Analysis of the human genome revealed that the amount of transcribed sequence is an order of magnitude greater than the number of predicted and well-characterized genes. A sizeable fraction of these transcripts is related to alternatively spliced forms of known protein coding genes. Inspection of the alternatively spliced transcripts identified in the pilot phase of the ENCODE project has clearly shown that often their structure might substantially differ from that of other isoforms of the same gene, and therefore that they might perform unrelated functions, or that they might even not correspond to a functional protein. Identifying these cases is obviously relevant for the functional assignment of gene products and for the interpretation of the effect of variations in the corresponding proteins. Here we describe a publicly available tool that, given a gene or a protein, retrieves and analyses all its annotated isoforms, provides users with three-dimensional models of the isoform(s) of his/her interest whenever possible and automatically assesses whether homology derived structural models correspond to plausible structures. This information is clearly relevant. When the homology model of some isoforms of a gene does not seem structurally plausible, the implications are that either they assume a structure unrelated to that of the other isoforms of the same gene with presumably significant functional differences, or do not correspond to functional products. We provide indications that the second hypothesis is likely to be true for a substantial fraction of the cases. http://maistas.bioinformatica.crs4.it/.
Slatyer, Rachel A; Nash, Michael A; Miller, Adam D; Endo, Yoshinori; Umbers, Kate D L; Hoffmann, Ary A
2014-10-02
Mountain landscapes are topographically complex, creating discontinuous 'islands' of alpine and sub-alpine habitat with a dynamic history. Changing climatic conditions drive their expansion and contraction, leaving signatures on the genetic structure of their flora and fauna. Australia's high country covers a small, highly fragmented area. Although the area is thought to have experienced periods of relative continuity during Pleistocene glacial periods, small-scale studies suggest deep lineage divergence across low-elevation gaps. Using both DNA sequence data and microsatellite markers, we tested the hypothesis that genetic partitioning reflects observable geographic structuring across Australia's mainland high country, in the widespread alpine grasshopper Kosciuscola tristis (Sjösted). We found broadly congruent patterns of regional structure between the DNA sequence and microsatellite datasets, corresponding to strong divergence among isolated mountain regions. Small and isolated mountains in the south of the range were particularly distinct, with well-supported divergence corresponding to climate cycles during the late Pliocene and Pleistocene. We found mixed support, however, for divergence among other mountain regions. Interestingly, within areas of largely contiguous alpine and sub-alpine habitat around Mt Kosciuszko, microsatellite data suggested significant population structure, accompanied by a strong signature of isolation-by-distance. Consistent patterns of strong lineage divergence among different molecular datasets indicate genetic breaks between populations inhabiting geographically distinct mountain regions. Three primary phylogeographic groups were evident in the highly fragmented Victorian high country, while within-region structure detected with microsatellites may reflect more recent population isolation. 
Despite the small area of Australia's alpine and sub-alpine habitats, their low topographic relief and lack of extensive glaciation, divergence among populations was on the same scale as that detected in much more extensive Northern hemisphere mountain systems. The processes driving divergence in the Australian mountains might therefore differ from their Northern hemisphere counterparts.
Martin, Juliette; Regad, Leslie; Etchebest, Catherine; Camproux, Anne-Claude
2008-11-15
Interresidue contacts within protein structures and at protein-protein interfaces are classically described by the amino acid types of the interacting residues, and the local structural context of the contact, if described at all, is given in terms of secondary structures. In this study, we present an alternative analysis of interresidue contacts using local structures defined by the structural alphabet introduced by Camproux et al. This structural alphabet allows a 3D structure to be described as a sequence of prototype fragments, called structural letters, of 27 different types. Each residue can then be assigned to a particular local structure, even in loop regions. The analysis of interresidue contacts within protein structures defined using Voronoï tessellations reveals that pairwise contact specificity is greater in terms of structural letters than amino acids. Using a simple heuristic based on specificity score comparison, we find that 74% of the long-range contacts within protein structures are better described using structural letters than amino acid types. The investigation is extended to a set of protein-protein complexes, showing that similar global rules apply as for intraprotein contacts, with 64% of the interprotein contacts best described by local structures. We then present an evaluation of pairing functions that integrate structural letters into decoy scoring and show that some complexes could benefit from the use of structural letter-based pairing functions.
Porosity and grain size controls on compaction band formation in Jurassic Navajo Sandstone
Schultz, Richard A.; Okubo, Chris H.; Fossen, Haakon
2010-01-01
Determining the rock properties that permit or impede the growth of compaction bands in sedimentary sequences is a critical problem of importance to studies of strain localization and characterization of subsurface geologic reservoirs. We determine the porosity and average grain size of a sequence of stratigraphic layers of Navajo Sandstone that are then used in a critical state model to infer plastic yield envelopes for the layers. Pure compaction bands are formed in layers having the largest average grain sizes (0.42–0.45 mm) and porosities (28%), and correspondingly the smallest values of critical pressure (~22 MPa) in the sequence. The results suggest that compaction bands formed in these layers after burial to ~1.5 km depth in association with thrust faulting beneath the nearby East Kaibab monocline, and that hardening of the yield caps accompanied compactional deformation of the layers.
Local orientational mobility in regular hyperbranched polymers.
Dolgushev, Maxim; Markelov, Denis A; Fürstenberg, Florian; Guérin, Thomas
2016-07-01
We study the dynamics of local bond orientation in regular hyperbranched polymers modeled by Vicsek fractals. The local dynamics is investigated through the temporal autocorrelation functions of single bonds and the corresponding relaxation forms of the complex dielectric susceptibility. We show that the dynamic behavior of single segments depends on their remoteness from the periphery rather than on the size of the whole macromolecule. Remarkably, the dynamics of the core segments (which are most remote from the periphery) shows a scaling behavior that differs from the dynamics obtained after structural average. We analyze the most relevant processes of single segment motion and provide an analytic approximation for the corresponding relaxation times. Furthermore, we describe an iterative method to calculate the orientational dynamics in the case of very large macromolecular sizes.
The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition
NASA Astrophysics Data System (ADS)
Štambuk, Nikola
The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.
Dover, James H.; Tailleur, Irvin L.; Dumoulin, Julie A.
2004-01-01
The map depicts the field distribution and contact relations between stratigraphic units, the tectonic relations between major stratigraphic sequences, and the detailed internal structure of these sequences. The stratigraphic sequences formed in a variety of continental margin depositional environments, and subsequently underwent a complex deformational history of imbricate thrust faulting and folding. A compilation of micro and macro fossil identifications is included in this data set.
Two distinct DNA sequences recognized by transcription factors represent enthalpy and entropy optima
Yin, Yimeng; Das, Pratyush K; Jolma, Arttu; Zhu, Fangjie; Popov, Alexander; Xu, You; Nilsson, Lennart
2018-01-01
Most transcription factors (TFs) can bind to a population of sequences closely related to a single optimal site. However, some TFs can bind to two distinct sequences that represent two local optima in the Gibbs free energy of binding (ΔG). To determine the molecular mechanism behind this effect, we solved the structures of human HOXB13 and CDX2 bound to their two optimal DNA sequences, CAATAAA and TCGTAAA. Thermodynamic analyses by isothermal titration calorimetry revealed that both sites were bound with similar ΔG. However, the interaction with the CAA sequence was driven by change in enthalpy (ΔH), whereas the TCG site was bound with similar affinity due to smaller loss of entropy (ΔS). This thermodynamic mechanism that leads to at least two local optima likely affects many macromolecular interactions, as ΔG depends on two partially independent variables ΔH and ΔS according to the central equation of thermodynamics, ΔG = ΔH - TΔS. PMID:29638214
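The relation ΔG = ΔH - TΔS quoted in the abstract above can be illustrated with a short sketch. The numerical values below are hypothetical and chosen only so that two binding modes reach the same ΔG by different routes; they are not the calorimetry results of the study, and the function name is an assumption made for illustration.

```python
def gibbs_free_energy(delta_h, delta_s, temperature):
    """Return ΔG = ΔH - TΔS.

    delta_h in kJ/mol, delta_s in kJ/(mol·K), temperature in K.
    """
    return delta_h - temperature * delta_s


T = 300.0  # K, illustrative temperature (assumption)

# Hypothetical enthalpy-driven site: large favourable ΔH, large entropy loss.
dg_enthalpy_site = gibbs_free_energy(delta_h=-60.0, delta_s=-0.06, temperature=T)

# Hypothetical entropy-favoured site: weaker ΔH, smaller entropy loss.
dg_entropy_site = gibbs_free_energy(delta_h=-48.0, delta_s=-0.02, temperature=T)

print(dg_enthalpy_site, dg_entropy_site)
```

With these made-up inputs both sites come out near -42 kJ/mol, which is the kind of compensation the abstract describes: a less favourable enthalpy change offset by a smaller loss of entropy.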
The SUPERFAMILY database in 2004: additions and improvements.
Madera, Martin; Vogel, Christine; Kummerfeld, Sarah K; Chothia, Cyrus; Gough, Julian
2004-01-01
The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.
iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition.
Chen, Wei; Feng, Peng-Mian; Lin, Hao; Chou, Kuo-Chen
2014-01-01
In eukaryotic genes, exons are generally interrupted by introns. Accurately removing introns and joining exons together are essential processes in eukaryotic gene expression. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapid and effective detection of splice sites that play important roles in gene structure annotation and even in RNA splicing. Although a series of computational methods were proposed for splice site identification, most of them neglected the intrinsic local structural properties. In the present study, a predictor called "iSS-PseDNC" was developed for identifying splice sites. In the new predictor, the sequences were formulated by a novel feature-vector called "pseudo dinucleotide composition" (PseDNC) into which six DNA local structural properties were incorporated. It was observed by the rigorous cross-validation tests on two benchmark datasets that the overall success rates achieved by iSS-PseDNC in identifying splice donor site and splice acceptor site were 85.45% and 87.73%, respectively. It is anticipated that iSS-PseDNC may become a useful tool for identifying splice sites and that the six DNA local structural properties described in this paper may provide novel insights for in-depth investigations into the mechanism of RNA splicing.
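As a rough illustration of the sequence encoding mentioned in the abstract above, the sketch below computes only the 16 plain dinucleotide frequencies of a DNA sequence. This is a simplification: the additional pseudo components of PseDNC, which incorporate the six DNA local structural properties, are omitted because their values are not given here. The function name is hypothetical.

```python
from itertools import product


def dinucleotide_composition(sequence):
    """Return normalized frequencies of the 16 dinucleotides in a DNA sequence.

    Overlapping pairs are counted; pairs containing characters other than
    A, C, G, T are skipped.
    """
    seq = sequence.upper()
    keys = ["".join(pair) for pair in product("ACGT", repeat=2)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {key: 0.0 for key in keys}
    return {key: counts[key] / total for key in keys}


features = dinucleotide_composition("ACGTAC")
print(features["AC"], features["CG"])
```

A vector of this kind could serve as the first 16 components of a feature vector passed to a classifier; the structural-property terms would have to be appended from the original publication.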
NASA Astrophysics Data System (ADS)
Improta, L.; Bagh, S.; De Gori, P.; Pastori, M.; Piccinini, D.; Valoroso, L.; Anselmi, M.; Buttinelli, M.; Chiarabba, C.
2015-12-01
The Val d'Agri (VA) Quaternary basin in the southern Apennines extensional belt hosts the largest oilfield in onshore Europe and normal-fault systems with high (up to M7) seismogenic potential. Frequent small-magnitude swarms related to both active crustal extension and anthropogenic activity have occurred in the region. Causal factors for induced seismicity are a water impoundment with severe seasonal oscillations and a high-rate wastewater injection well. We analyzed around 1200 earthquakes (ML<3.3) that occurred in the VA and surrounding regions between 2001 and 2014. We integrated waveforms recorded at 46 seismic stations belonging to 3 different networks: a dense temporary network installed by INGV in 2005-2006, the permanent national network of INGV, and the trigger-mode monitoring network managed by the local operator ENI petroleum company. We used local earthquake tomography to investigate static and transient features of the crustal velocity structure and to accurately locate earthquakes. Vp and Vp/Vs models are parameterized by a 3x3x2 km spacing and well resolved down to about 12 km depth. The complex Vp model illuminates broad antiformal structures corresponding to wide ramp-anticlines involving Mesozoic carbonates of the Apulia hydrocarbon reservoir, and NW-SE trending low-Vp regions related to thrust-sheet-top clastic basins. The VA basin corresponds to a shallow low-Vp region. Focal mechanisms show normal faulting kinematics with minor strike-slip solutions, in agreement with the local extensional regime. Earthquake locations and focal solutions depict shallow (< 5 km depth) E-dipping extensional structures beneath the artificial lake located in the southern sector of the basin, and along the western margin of the VA. A few swarms define relatively deep transfer structures accommodating the differential extension between main normal faults. 
The spatio-temporal distribution of around 220 events correlates with wastewater disposal activity, illuminating a NE-dipping fault between 2-5 km depth in the carbonate reservoir. The fault measures 5 km along dip and corresponds to a pre-existing thrust fault favorably oriented with respect to the local extensional field.
Yamashita, Yuichi; Tani, Jun
2008-01-01
It is generally thought that skilled behavior in human beings results from a functional hierarchy of the motor control system, within which reusable motor primitives are flexibly integrated into various sensori-motor sequence patterns. The underlying neural mechanisms governing the way in which continuous sensori-motor flows are segmented into primitives and the way in which series of primitives are integrated into various behavior sequences have, however, not yet been clarified. In earlier studies, this functional hierarchy has been realized through the use of explicit hierarchical structure, with local modules representing motor primitives in the lower level and a higher module representing sequences of primitives switched via additional mechanisms such as gate-selecting. When sequences contain similarities and overlap, however, a conflict arises in such earlier models between generalization and segmentation, induced by this separated modular structure. To address this issue, we propose a different type of neural network model. The current model neither makes use of separate local modules to represent primitives nor introduces explicit hierarchical structure. Rather than forcing architectural hierarchy onto the system, functional hierarchy emerges through a form of self-organization that is based on two distinct types of neurons, each with different time properties (“multiple timescales”). Through the introduction of multiple timescales, continuous sequences of behavior are segmented into reusable primitives, and the primitives, in turn, are flexibly integrated into novel sequences. In experiments, the proposed network model, coordinating the physical body of a humanoid robot through high-dimensional sensori-motor control, also successfully situated itself within a physical environment. 
Our results suggest that it is not only the spatial connections between neurons but also the timescales of neural activity that act as important mechanisms leading to functional hierarchy in neural systems. PMID:18989398
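The role of "multiple timescales" described in the abstract above can be sketched with a leaky-integrator unit whose time constant sets how quickly it follows its input. The update rule and the time constants below are assumptions for illustration and are not taken from the abstract; the sketch shows only that a small time constant gives fast-changing activity and a large one gives slowly changing activity.

```python
def leaky_integrate(inputs, tau, state=0.0):
    """Apply u <- (1 - 1/tau) * u + (1/tau) * x for each input x.

    tau is a hypothetical time constant (tau >= 1); larger values make the
    unit respond more slowly.
    """
    trace = []
    for x in inputs:
        state = (1.0 - 1.0 / tau) * state + (1.0 / tau) * x
        trace.append(state)
    return trace


step_input = [1.0] * 5                       # constant input for five steps
fast = leaky_integrate(step_input, tau=2.0)   # fast unit (illustrative tau)
slow = leaky_integrate(step_input, tau=50.0)  # slow unit (illustrative tau)

print(fast[-1], slow[-1])
```

After five steps the fast unit has almost reached the input level while the slow unit has barely moved, which is one way units in the same network could come to represent short primitives and longer sequences respectively.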
Mizianty, Marcin J; Kurgan, Lukasz
2009-12-13
Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high-identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reductions when compared against the best performing competing method on the two largest datasets. The proposed predictor achieves 58% accuracy for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. 
The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.
PMID:20003388
DOE Office of Scientific and Technical Information (OSTI.GOV)
Domingo Meza-Aguilar, J.; Laboratorio de Patogenicidad Bacteriana, Unidad de Hemato Oncología e Investigación, Hospital Infantil de México Federico Gómez 06720, D.F.; Fromme, Petra
Highlights: • X-ray crystal structure of the passenger domain of Plasmid encoded toxin at 2.3 Å. • Structural differences between Pet passenger domain and EspP protein are described. • High flexibility of the C-terminal beta helix is structurally assigned. - Abstract: Autotransporters (ATs) represent a superfamily of proteins produced by a variety of pathogenic bacteria, which include the pathogenic groups of Escherichia coli (E. coli) associated with gastrointestinal and urinary tract infections. We present the first X-ray structure, at 2.3 Å resolution, of the passenger domain of the Plasmid-encoded toxin (Pet), a 100 kDa protein that is a cause of acute diarrhea in both developing and industrialized countries. Pet is a cytoskeleton-altering toxin that induces loss of actin stress fibers. While Pet (pdb code: 4OM9) shows a sequence identity of only 50% to the most closely related protein sequence, extracellular serine protease plasmid (EspP), the structural features of both proteins are conserved. A closer structural look reveals that Pet contains a β-pleated sheet at the sequence region of residues 181–190, whereas the corresponding structural domain in EspP consists of a coiled loop. Second, the Pet passenger domain features a more pronounced beta sheet between residues 135 and 143 compared to the structure of EspP.
Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A
2015-01-01
A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
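Since the abstract above describes a TTS as a stretch of DNA composed of polypurines, a minimal scan for purine-only runs can be sketched as follows. The minimum length and the function name are hypothetical, only the given strand is searched, and this is not the procedure or the set of stability and specificity criteria used by TTSMI.

```python
import re


def find_polypurine_tracts(sequence, min_length=15):
    """Return (start, end, tract) for each run of A/G of at least min_length.

    Coordinates are 0-based with an exclusive end. min_length is an
    illustrative threshold, not a value taken from TTSMI.
    """
    pattern = re.compile(r"[AG]{%d,}" % min_length)
    return [
        (match.start(), match.end(), match.group())
        for match in pattern.finditer(sequence.upper())
    ]


example = "CTCT" + "AGGAAGGGAA" + "TTC" + "AGA"
tracts = find_polypurine_tracts(example, min_length=5)
print(tracts)
```

A fuller search would also scan the reverse complement, because a polypyrimidine run on the given strand corresponds to a polypurine run on the opposite strand.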
Sutton, Ann; Trudeau, Natacha; Morford, Jill; Rios, Monica; Poirier, Marie-Andrée
2010-01-01
Children who require augmentative and alternative communication (AAC) systems while they are in the process of acquiring language face unique challenges because they use graphic symbols for communication. In contrast to the situation of typically developing children, they use different modalities for comprehension (auditory) and expression (visual). This study explored the ability of three- and four-year-old children without disabilities to perform tasks involving sequences of graphic symbols. Thirty participants were asked to transpose spoken simple sentences into graphic symbols by selecting individual symbols corresponding to the spoken words, and to interpret graphic symbol utterances by selecting one of four photographs corresponding to a sequence of three graphic symbols. The results showed that these were not simple tasks for the participants, and few of them performed in the expected manner: only one in transposition, and only one-third of participants in interpretation. Individual response strategies in some cases led to contrasting response patterns. Children at this age level have not yet developed the skills required to deal with graphic symbols even though they have mastered the corresponding spoken language structures.
Luckow, H.G.; Pavlis, T.L.; Serpa, L.F.; Guest, B.; Wagner, D.L.; Snee, L.; Hensley, T.M.; Korjenkov, A.
2005-01-01
New 1:24,000 scale mapping, geochemical analyses of volcanic rocks, and Ar/Ar and tephrochronology analyses of the Wingate Wash, northern Owlshead Mountain and Southern Panamint Mountain region document a complex structural history constrained by syntectonic volcanism and sedimentation. In this study, the region is divided into five structural domains with distinct, but related, histories: (1) The southern Panamint domain is a structurally intact, gently south-tilted block dominated by a middle Miocene volcanic center recognized as localized hypabyssal intrusives surrounded by proximal facies pyroclastic rocks. This Miocene volcanic sequence is an unusual alkaline volcanic assemblage ranging from trachybasalt to rhyolite, but dominated by trachyandesite. The volcanic rocks are overlain in the southwestern Panamint Mountains by a younger (Late Miocene?) fanglomerate sequence. (2) An upper Wingate Wash domain is characterized by large areas of Quaternary cover and complex overprinting of older structure by Quaternary deformation. Quaternary structures record ~N-S shortening concurrent with ~E-W extension accommodated by systems of strike-slip and thrust faults. (3) A central Wingate Wash domain contains a complex structural history that is closely tied to the stratigraphic evolution. In this domain, a middle Miocene volcanic package contains two distinct assemblages: a lower sequence dominated by alkaline pyroclastic rocks similar to the southern Panamint sequence and an upper basaltic sequence of alkaline basalt and basanites. This volcanic sequence is in turn overlain by a coarse clastic sedimentary sequence that records the unroofing of adjacent ranges and development of ~N-S trending, west-tilted fault blocks. We refer to this sedimentary sequence as the Lost Lake assemblage. 
(4) The lower Wingate Wash/northern Owlshead domain is characterized by a gently north-dipping stratigraphic sequence with an irregular unconformity at the base developed on granitic basement. The unconformity is locally overlain by channelized deposits of older Tertiary(?) red conglomerate, some of which predate the onset of extensive volcanism, but in most of the area is overlain by a moderately thick package of Middle Miocene trachybasalt, trachyandesitic, ash flows, lithic tuff, basaltic cinder, basanites, and dacitic pyroclastic, debris, and lahar flows with localized exposures of sedimentary rocks. The upper part of the Miocene stratigraphic sequence in this domain is comprised of coarse grained-clastic sediments that are apparently middle Miocene based on Ar/Ar dating of interbedded volcanic rocks. This sedimentary sequence, however, is lithologically indistinguishable from the structurally adjacent Late Miocene Lost Lake assemblage and a stratigraphically overlying Plio-Pleistocene alluvial fan; a relationship that handicaps tracing structures through this domain. This domain is also structurally complex and deformed by a series of northwest-southeast-striking, east-dipping, high-angle oblique, sinistral, normal faults that are cut by left-lateral strike-slip faults. The contact between the southern Panamint domain and the adjacent domains is a complex fault system that we interpret as a zone of Late Miocene distributed sinistral slip that is variably overprinted in different portions of the mapped area. The net sinistral slip across the Wingate Wash fault system is estimated at 7-9 km, based on offset of Proterozoic Crystal Springs Formation beneath the middle Miocene unconformity to as much as 15 km based on offset volcanic facies in Middle Miocene rocks. 
To the south of Wingate Wash, the northern Owlshead Mountains are also cut by a sinistral, northwest-dipping, oblique normal fault, (referred to as the Filtonny Fault) with significant slip that separates the Lower Wingate Wash and central Owlshead domains. The Filtonny Fault may represent a young conjugate fault to the dextral Southern Death Valley fault system and may be the northwest
Panfil, M.S.; Gardner, T.W.; Hirth, K.G.
1999-01-01
Late Holocene (240 km2 on the east side of the volcano with >25 cm of tephra. Lavas from eruptive sequence I dammed drainage in the lowland area near the town of San Nicolas and caused local upstream deposition of as much as 30 m of lacustrine silts, clays, and sands. These lacustrine deposits record an eruptive hiatus for the Tetimpa area of about 750 14C yr: between ca. 2100 and ca. 1350 yr B.P., no major tephras were deposited in the Tetimpa area. In upland areas, this time period is represented by an unconformity and by Entisols formed in the top of pumice deposits and lavas from eruptive sequence I. Artifacts, agricultural furrows, and dwellings record human reoccupation of this surface. At the end of this hiatus, several lahars were deposited above the lacustrine sequence and locally above the Entisol in upland positions adjacent to streams. Between ca. 1350 and ca. 1200 yr B.P., tephras from eruptive sequence II buried these paleosols, occupation sites, lacustrine sediments, and lahars. Andesitic (~62% SiO2) pumice lapilli deposits in the Tetimpa area record three pumice-fall eruptions directed northeast and east of the crater. The first and smallest of these (maximum Tetimpa area thickness = 12 cm; >52 km2 covered by >25 cm) took place at ca. 1350 yr B.P. and was accompanied by pyroclastic surge events preserved in the Tetimpa area by charcoal, sand waves, and cross-stratified sand-sized tephra. At ca. 1200 yr B.P., the products of two Plinian-style events and additional pyroclastic surges reached the Tetimpa area. The largest of these tephra-fall events covered the Tetimpa area with 0.5-1 m of tephra and blanketed an area of >230 km2 with a thickness of >25 cm. The Tetimpa record confirms two of the four periods of explosive volcanism recognized by studies conducted around Popocatepetl in the past 30 yr. 
Eruptive sequence I corresponds to the explosive period between 2100 and 2500 yr B.P., and eruptive sequence II corresponds to the period between 900 and 1400 yr B.P. The archaeology and lacustrine stratigraphy of the Tetimpa area help constrain the timing of the Plinian phase of eruptive sequence I to ca. 2100 yr B.P. and suggest that the pumice-fall eruptions of eruptive sequence II took place in at least two intervals between ca. 1350 and ca. 1200 yr B.P.
Mochizuki, Ryota; Tsugama, Daisuke; Yamazaki, Michihiro; Fujino, Kaien; Masuda, Kiyoshi
2017-05-04
NMCP/CRWN (NUCLEAR MATRIX CONSTITUENT PROTEIN/CROWDED NUCLEI) is a major component of a protein fibrous meshwork (lamina-like structure) on the plant inner nuclear membrane. NMCP/CRWN contributes to regulating nuclear shape and nuclear functions. An NMCP/CRWN protein in Daucus carota (DcNMCP1) is localized to the nuclear periphery in interphase cells, and surrounds chromosomes in cells in metaphase and anaphase. The N-terminal region and the C-terminal region of DcNMCP1 are both necessary for localizing DcNMCP1 to the nuclear periphery. Here, candidate interacting partners were screened for the region spanning amino acid positions 975-1053 of DcNMCP1 (T975-1053), which lies in the C-terminal region and contains a conserved sequence that plays a role in localizing DcNMCP1 to the nuclear periphery. Arabidopsis thaliana nuclear proteins were subjected to far-Western blotting with GST-fused T975-1053 as a probe, and signals were detected at the positions corresponding to ∼70, ∼40, and ∼18 kDa. These ∼70, ∼40, and ∼18 kDa nuclear proteins were identified by mass spectrometry, and subjected to a yeast 2-hybrid (Y2H) analysis with T975-1053 as bait. In this analysis, the ∼40 kDa protein ARP7, which is a nuclear actin-related protein possibly involved in regulating chromatin structures, was confirmed to interact with T975-1053. Independently of the far-Western blotting, a Y2H screen was performed using T975-1053 as bait. Targeted Y2H assays confirmed that 3 proteins identified in the screen, MYB3, SINAT1, and BIM1, interact with T975-1053. These proteins might have roles in NMCP/CRWN protein-mediated biological processes.
Yu, Ning; Wei, Yu-Long; Zhang, Xin; Zhu, Ning; Wang, Yan-Li; Zhu, Yue; Zhang, Hai-Ping; Li, Fen-Mei; Yang, Lan; Sun, Jia-Qi; Sun, Ai-Dong
2017-07-11
Trachelospermum jasminoides is commonly used in traditional Chinese medicine. However, the use of the plant's local alternatives is frequent, causing potential clinical problems. The T. jasminoides sold in the medicine market is commonly dried and sliced, making traditional identification methods difficult. In this study, the ITS2 region was evaluated on 127 sequences representing T. jasminoides and its local alternatives according to PCR and sequencing rates, intra- and inter-specific divergences, secondary structure, and discrimination capacity. Results indicated 100% success rates for PCR and sequencing and the obvious presence of a barcoding gap. Results of BLAST 1, nearest distance and neighbor-joining tree methods showed that barcode ITS2 could successfully identify all the tested samples. The secondary structures of the ITS2 region provided another dimension for species identification. Two-dimensional images were obtained for better and easier identification. Previous studies on DNA barcoding concentrated more on the same family, genus, or species. However, an ideal barcode should be variable enough to identify closely related species. Meanwhile, the barcodes should also be conservative in identifying distantly related species. This study highlights the application of barcode ITS2 in solving practical problems in the distantly related local alternatives of medicinal plants.
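The barcoding-gap criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes pre-aligned sequences of equal length and an uncorrected p-distance; the distance model used in the study is not stated here, and the toy sequences and species labels below are invented purely for illustration.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def barcoding_gap(samples):
    """Return (max intraspecific distance, min interspecific distance).

    `samples` is a list of (species_label, aligned_sequence) pairs.
    A barcoding gap exists when the second value exceeds the first.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=0.0), min(inter, default=float("inf"))

# Hypothetical toy data, not taken from the study.
toy = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "ACGAACCTGC"),
    ("species_B", "ACGAACCTGT"),
]
max_intra, min_inter = barcoding_gap(toy)
has_gap = min_inter > max_intra
```

In this toy set the largest within-species distance is smaller than the smallest between-species distance, which is the situation the abstract describes as an obvious barcoding gap.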
Menzies, Georgina E.; Reed, Simon H.; Brancale, Andrea; Lewis, Paul D.
2015-01-01
The mutational pattern for the TP53 tumour suppressor gene in lung tumours differs from that of other cancer types by having a higher frequency of G:C>T:A transversions. The aetiology of this differing mutation pattern is still unknown. Benzo[a]pyrene diol epoxide (BPDE) is a potent cigarette smoke carcinogen that forms guanine adducts at TP53 CpG mutation hotspot sites including codons 157, 158, 245, 248 and 273. We performed molecular modelling of BPDE-adducted TP53 duplex sequences to determine the degree of local distortion caused by adducts which could influence the ability of nucleotide excision repair. We show that BPDE adducted codon 157 has greater structural distortion than other TP53 G:C>T:A hotspot sites and that sequence context more distal to adjacent bases must influence local distortion. Using TP53 trinucleotide mutation signatures for lung cancer in smokers and non-smokers we further show that codons 157 and 273 have the highest mutation probability in smokers. Combining this information with adduct structural data we predict that G:C>T:A mutations at codon 157 in lung tumours of smokers are predominantly caused by BPDE. Our results provide insight into how different DNA sequence contexts show variability in DNA distortion at mutagen adduct sites that could compromise DNA repair at well characterized cancer related mutation hotspots. PMID:26400171
PreSSAPro: a software for the prediction of secondary structure by amino acid properties.
Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M
2007-10-01
PreSSAPro is software, available to the scientific community as a free web service, designed to predict secondary structures from the amino acid sequence of a given protein. Predictions are based on our recently published work on the amino acid propensities for secondary structures in large but non-homogeneous protein data sets, as well as in smaller but homogeneous data sets corresponding to protein structural classes, i.e. all-alpha, all-beta, or alpha-beta proteins. Predictions are improved by the use of propensities evaluated for the correct protein class. PreSSAPro predicts the secondary structure according to the correct protein class, if known, or gives a multiple prediction with reference to the different structural classes. The comparison of these predictions represents a novel tool to evaluate which sequence regions can assume different secondary structures depending on the structural class assignment, with a view to identifying proteins able to fold in different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.
Automated side-chain model building and sequence assignment by template matching.
Terwilliger, Thomas C
2003-01-01
An algorithm is described for automated building of side chains in an electron-density map once a main-chain model is built and for alignment of the protein sequence to the map. The procedure is based on a comparison of electron density at the expected side-chain positions with electron-density templates. The templates are constructed from average amino-acid side-chain densities in 574 refined protein structures. For each contiguous segment of main chain, a matrix with entries corresponding to an estimate of the probability that each of the 20 amino acids is located at each position of the main-chain model is obtained. The probability that this segment corresponds to each possible alignment with the sequence of the protein is estimated using a Bayesian approach and high-confidence matches are kept. Once side-chain identities are determined, the most probable rotamer for each side chain is built into the model. The automated procedure has been implemented in the RESOLVE software. Combined with automated main-chain model building, the procedure produces a preliminary model suitable for refinement and extension by an experienced crystallographer.
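The sequence-alignment step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the RESOLVE implementation: positions are treated as independent, every offset along the sequence gets the same prior, and the probability matrix, the example sequence, and the 0.95 confidence threshold are all invented.

```python
import math

def alignment_posteriors(prob_matrix, sequence):
    """Posterior probability of each offset of a main-chain segment along `sequence`.

    prob_matrix[i] maps one-letter amino-acid codes to the estimated probability
    that this residue type sits at segment position i. Positions are treated as
    independent and every offset has the same prior (both are assumptions).
    """
    seg_len = len(prob_matrix)
    n_offsets = len(sequence) - seg_len + 1
    if n_offsets <= 0:
        raise ValueError("segment is longer than the sequence")
    log_scores = []
    for k in range(n_offsets):
        # Summing logs avoids underflow for long segments; the floor avoids log(0).
        log_scores.append(sum(
            math.log(max(prob_matrix[i].get(sequence[k + i], 0.0), 1e-12))
            for i in range(seg_len)
        ))
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-residue segment; probabilities are invented for illustration.
matrix = [
    {"G": 0.7, "A": 0.2, "S": 0.1},
    {"L": 0.6, "I": 0.3, "V": 0.1},
    {"K": 0.8, "R": 0.2},
]
posteriors = alignment_posteriors(matrix, "MAGLKSVR")
best_offset = max(range(len(posteriors)), key=posteriors.__getitem__)
confident = posteriors[best_offset] > 0.95  # example threshold, an assumption
```

Keeping only offsets whose posterior clears a threshold mirrors the abstract's statement that high-confidence matches are kept, although the actual cutoff used by the software is not given here.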
ɛ-connectedness, finite approximations, shape theory and coarse graining in hyperspaces
NASA Astrophysics Data System (ADS)
Alonso-Morón, Manuel; Cuchillo-Ibanez, Eduardo; Luzón, Ana
2008-12-01
We use upper semifinite hyperspaces of compacta to describe ε-connectedness and to compute homology from finite approximations. We find a new connection between ε-connectedness and the so-called Shape Theory. We construct a geodesically complete R-tree, by means of ε-components at different resolutions, whose behavior at infinity captures the topological structure of the space of components of a given compact metric space. We also construct inverse sequences of finite spaces using internal finite approximations of compact metric spaces. These sequences can be converted into inverse sequences of polyhedra and simplicial maps by means of what we call the Alexandroff-McCord correspondence. This correspondence allows us to relate upper semifinite hyperspaces of finite approximation with the Vietoris-Rips complexes of such approximations at different resolutions. Two motivating examples are included in the introduction. We propose this procedure as a different mathematical foundation for problems on data analysis. This process is intrinsically related to the methodology of shape theory. This paper reinforces Robins’s idea of using methods from shape theory to compute homology from finite approximations.
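The idea of ε-components at different resolutions can be made concrete for a finite point sample. The sketch below is a simplification, not the hyperspace construction of the paper: it merges points joined by chains of steps of length at most ε (a non-strict inequality is assumed here; some definitions use a strict one), using a union-find structure. The sample points are invented.

```python
import math

def epsilon_components(points, eps):
    """Partition a finite point set into ε-connected components.

    Two points share a component when a chain of points joins them with all
    consecutive distances <= eps (non-strict by assumption).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical sample: two clusters on a line.
sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
counts = {eps: len(epsilon_components(sample, eps)) for eps in (0.5, 1.0, 8.0)}
```

As ε grows the components merge, and recording which component at a fine resolution lands in which component at a coarser one gives a tree-like structure in the spirit of the abstract.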
Prediction of protein secondary structure content for the twilight zone sequences.
Homaeian, Leila; Kurgan, Lukasz A; Ruan, Jishou; Cios, Krzysztof J; Chen, Ke
2007-11-15
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. A significant majority of successful methods for prediction of the secondary structure are based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict the amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. The PSSC-core is shown to provide statistically significantly better results when compared with the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods. 
At the same time, it also provides useful insight into design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure. (c) 2007 Wiley-Liss, Inc.
Array of nucleic acid probes on biological chips for diagnosis of HIV and methods of using the same
Chee, Mark; Gingeras, Thomas R.; Fodor, Stephen P. A.; Hubble, Earl A.; Morris, MacDonald S.
1999-01-19
The invention provides an array of oligonucleotide probes immobilized on a solid support for analysis of a target sequence from a human immunodeficiency virus. The array comprises at least four sets of oligonucleotide probes 9 to 21 nucleotides in length. A first probe set has a probe corresponding to each nucleotide in a reference sequence from a human immunodeficiency virus. A probe is related to its corresponding nucleotide by being exactly complementary to a subsequence of the reference sequence that includes the corresponding nucleotide. Thus, each probe has a position, designated an interrogation position, that is occupied by a complementary nucleotide to the corresponding nucleotide. The three additional probe sets each have a corresponding probe for each probe in the first probe set. Thus, for each nucleotide in the reference sequence, there are four corresponding probes, one from each of the probe sets. The three corresponding probes in the three additional probe sets are identical to the corresponding probe from the first probe or a subsequence thereof that includes the interrogation position, except that the interrogation position is occupied by a different nucleotide in each of the four corresponding probes.
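The tiling scheme described in this abstract (four probes per reference nucleotide, differing only at the interrogation position) can be sketched as below. This is an illustration under assumptions: probes are built as base-by-base complements with strand orientation ignored, positions too close to the ends for a full-length window are skipped, and the function name, the default probe length of 9, and the centered interrogation position are choices made for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def probe_sets(reference, probe_len=9, interrogation_index=4):
    """Build four probes for every reference position covered by a full window.

    Returns {reference_position: {base_at_interrogation_position: probe}}.
    The probe whose interrogation base complements the reference nucleotide is
    the exact-match probe; the other three each carry one substitution.
    """
    if not 0 <= interrogation_index < probe_len:
        raise ValueError("interrogation_index must lie inside the probe")
    tiles = {}
    last = len(reference) - (probe_len - 1 - interrogation_index)
    for pos in range(interrogation_index, last):
        start = pos - interrogation_index
        window = reference[start:start + probe_len]
        perfect = "".join(COMPLEMENT[b] for b in window)
        tiles[pos] = {
            base: perfect[:interrogation_index] + base + perfect[interrogation_index + 1:]
            for base in "ACGT"
        }
    return tiles

# Hypothetical reference fragment, not an HIV sequence.
tiles = probe_sets("ACGTACGTACGT")
```

For each interrogated position, comparing the hybridization signal of the four probes would indicate which nucleotide the target carries there; the signal-reading step is outside this sketch.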
Diversity of Secondary Structure in Catalytic Peptides with β-Turn-Biased Sequences
2016-01-01
X-ray crystallography has been applied to the structural analysis of a series of tetrapeptides that were previously assessed for catalytic activity in an atroposelective bromination reaction. Common to the series is a central Pro-Xaa sequence, where Pro is either l- or d-proline, which was chosen to favor nucleation of canonical β-turn secondary structures. Crystallographic analysis of 35 different peptide sequences revealed a range of conformational states. The observed differences appear not only in cases where the Pro-Xaa loop-region is altered, but also when seemingly subtle alterations to the flanking residues are introduced. In many instances, distinct conformers of the same sequence were observed, either as symmetry-independent molecules within the same unit cell or as polymorphs. Computational studies using DFT provided additional insight into the analysis of solid-state structural features. Select X-ray crystal structures were compared to the corresponding solution structures derived from measured proton chemical shifts, 3J-values, and 1H–1H-NOESY contacts. These findings imply that the conformational space available to simple peptide-based catalysts is more diverse than precedent might suggest. The direct observation of multiple ground state conformations for peptides of this family, as well as the dynamic processes associated with conformational equilibria, underscore not only the challenge of designing peptide-based catalysts, but also the difficulty in predicting their accessible transition states. These findings implicate the advantages of low-barrier interconversions between conformations of peptide-based catalysts for multistep, enantioselective reactions. PMID:28029251
How is quantum information localized in gravity?
NASA Astrophysics Data System (ADS)
Donnelly, William; Giddings, Steven B.
2017-10-01
A notion of localization of information within quantum subsystems plays a key role in describing the physics of quantum systems, and in particular is a prerequisite for discussing important concepts such as entanglement and information transfer. While subsystems can be readily defined for finite quantum systems and in local quantum field theory, a corresponding definition for gravitational systems is significantly complicated by the apparent nonlocality arising due to gauge invariance, enforced by the constraints. A related question is whether "soft hair" encodes otherwise localized information, and the question of such localization also remains an important puzzle for proposals that gravity emerges from another structure such as a boundary field theory as in AdS/CFT. This paper describes different approaches to defining local subsystem structure, and shows that at least classically, perturbative gravity has localized subsystems based on a split structure, generalizing the split property of quantum field theory. This, and related arguments for QED, give simple explanations that in these theories there is localized information that is independent of fields outside a region, in particular so that there is no role for "soft hair" in encoding such information. Additional subtleties appear in quantum gravity. We argue that localized information exists in perturbative quantum gravity in the presence of global symmetries, but that nonperturbative dynamics is likely tied to a modification of such structure.
Beltukov, Y M; Fusco, C; Parshin, D A; Tanguy, A
2016-02-01
The vibrational properties of model amorphous materials are studied by combining complete analysis of the vibration modes, dynamical structure factor, and energy diffusivity with exact diagonalization of the dynamical matrix and the kernel polynomial method, which allows a study of very large system sizes. Different materials are studied that differ only by the bending rigidity of the interactions in a Stillinger-Weber modelization used to describe amorphous silicon. The local bending rigidity can thus be used as a control parameter, to tune the sound velocity together with local bonds directionality. It is shown that for all the systems studied, the upper limit of the Boson peak corresponds to the Ioffe-Regel criterion for transverse waves, as well as to a minimum of the diffusivity. The Boson peak is followed by a diffusivity's increase supported by longitudinal phonons. The Ioffe-Regel criterion for transverse waves corresponds to a common characteristic mean-free path of 5-7 Å (which is slightly bigger for longitudinal phonons), while the fine structure of the vibrational density of states is shown to be sensitive to the local bending rigidity.
Schulte, W; Töpfer, R; Stracke, R; Schell, J; Martini, N
1997-04-01
Three genes coding for different multifunctional acetyl-CoA carboxylase (ACCase; EC 6.4.1.2) isoenzymes from Brassica napus were isolated and divided into two major classes according to structural features in their 5' regions: class I comprises two genes with an additional coding exon of approximately 300 bp at the 5' end, and class II is represented by one gene carrying an intron of 586 bp in its 5' untranslated region. Fusion of the peptide sequence encoded by the additional first exon of a class I ACCase gene to the jellyfish Aequorea victoria green fluorescent protein (GFP) and transient expression in tobacco protoplasts targeted GFP to the chloroplasts. In contrast to the deduced primary structure of the biotin carboxylase domain encoded by the class I gene, the corresponding amino acid sequence of the class II ACCase shows higher identity with that of the Arabidopsis ACCase, both lacking a transit peptide. The Arabidopsis ACCase has been proposed to be a cytosolic isoenzyme. These observations indicate that the two classes of ACCase genes encode plastidic and cytosolic isoforms of multi-functional, eukaryotic type, respectively, and that B. napus contains at least one multi-functional ACCase besides the multi-subunit, prokaryotic type located in plastids. Southern blot analysis of genomic DNA from B. napus, Brassica rapa, and Brassica oleracea, the ancestors of amphidiploid rapeseed, using a fragment of a multi-functional ACCase gene as a probe revealed that ACCase is encoded by a multi-gene family of at least five members.
NASA Astrophysics Data System (ADS)
Castillo Vincentelli, Maria Gabriela; Favoreto, Julia; Roemers-Oliveira, Eduardo
2018-02-01
An integrated geophysical and geological analysis of a carbonate reservoir can offer an effective method to better understand the paleogeographical evolution and distribution of a geological reservoir and non-reservoir facies. Therefore, we propose a better method for obtaining geological facies from geophysical facies, helping to characterize the permo-porous system of this kind of play. The goal is to determine the main geological phases from a specific hydrocarbon producer (Albian Campos Basin, Brazil). The applied method includes the use of a petrographic and qualitative description from the integrated reservoir with seismic interpretation of an attribute map (energy, root mean square, mean amplitude, maximum negative amplitude, etc), all calculated at the Albian level for each of the five identified phases. The studied carbonate reservoir is approximately 6 km long with a main direction of NE-SW, and it was sub-divided as follows (from bottom to top): (1) the first depositional sequence of the bank was composed mainly of packstone, indicating that the local structure adjacent to the main bank is protected from environmental conditions; (2) characterized by the presence of grainstone developed at the higher structure; (3) the main sequence of the peloidal packstone with mudstones oncoids; (4) corresponds to the oil production of carbonate reservoirs formed by oolitic grainstone deposited at the top of the carbonate bank; at this phase, rising sea levels formed channels that connected the open sea shelf with the restricted circulation shelf; and (5) mudstone and wackestone represent the system’s flooding phase.
Comprehensive analysis of the dynamic structure of nuclear localization signals.
Yamagishi, Ryosuke; Okuyama, Takahide; Oba, Shuntaro; Shimada, Jiro; Chaen, Shigeru; Kaneko, Hiroki
2015-12-01
Most transcription and epigenetic factors in eukaryotic cells have nuclear localization signals (NLSs) and are transported to the nucleus by nuclear transport proteins. Understanding the features of NLSs and the mechanisms of nuclear transport might help in understanding gene expression regulation and somatic cell reprogramming, thus leading to the treatment of diseases associated with abnormal gene expression. Although many studies analyzed the amino acid sequence of NLSs, few studies investigated their three-dimensional structure. Therefore, we conducted a statistical investigation of the dynamic structure of NLSs by extracting the conformation of these sequences from proteins examined by X-ray crystallography and using a quantity defined as conformational determination rate (a ratio between the number of amino acids determining the conformation and the number of all amino acids included in a certain region). We found that determining the conformation of NLSs is more difficult than determining the conformation of other regions and that NLSs may tend to form more heteropolymers than monomers. Therefore, these findings strongly suggest that NLSs are intrinsically disordered regions.
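The conformational determination rate defined in this abstract is a simple ratio, sketched here under one interpretive assumption: "amino acids determining the conformation" is read as residues that have modeled coordinates in the crystal structure. The residue numbers and regions are invented for illustration.

```python
def conformational_determination_rate(resolved, start, end):
    """Fraction of residues in [start, end] (inclusive) with determined coordinates.

    `resolved` is the set of residue numbers observed in the crystal structure.
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    region = range(start, end + 1)
    return sum(1 for r in region if r in resolved) / len(region)

# Hypothetical protein of 100 residues in which residues 40-45 are unresolved.
resolved = set(range(1, 101)) - set(range(40, 46))
nls_rate = conformational_determination_rate(resolved, 38, 47)    # candidate NLS region
other_rate = conformational_determination_rate(resolved, 1, 30)   # comparison region
```

A lower rate for the NLS region than for other regions is the kind of signal the abstract interprets as evidence of intrinsic disorder.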
R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server.
Cannone, Jamie J; Sweeney, Blake A; Petrov, Anton I; Gutell, Robin R; Zirbel, Craig L; Leontis, Neocles
2015-07-01
The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Perceptions of randomness in binary sequences: Normative, heuristic, or both?
Reimers, Stian; Donkin, Chris; Le Pelley, Mike E
2018-03-01
When people consider a series of random binary events, such as tossing an unbiased coin and recording the sequence of heads (H) and tails (T), they tend to erroneously rate sequences with less internal structure or order (such as HTTHT) as more probable than sequences containing more structure or order (such as HHHHH). This is traditionally explained as a local representativeness effect: Participants assume that the properties of long sequences of random outcomes (such as an equal proportion of heads and tails, and little internal structure) should also apply to short sequences. However, recent theoretical work has noted that the probability of a particular sequence of, say, heads and tails of length n occurring within a larger (>n) sequence of coin flips actually differs by sequence, so P(HHHHH)
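The observation that the chance of a pattern occurring somewhere within a longer run of flips depends on the pattern can be checked by brute force. The sketch below enumerates every sequence of n fair coin flips, so it is only practical for small n; the choice of n = 10 is arbitrary.

```python
from itertools import product

def occurrence_probability(pattern, n):
    """Probability that `pattern` appears at least once in n fair coin flips.

    Computed by enumerating all 2**n sequences, so keep n small.
    """
    hits = sum(pattern in "".join(flips) for flips in product("HT", repeat=n))
    return hits / 2 ** n

p_run = occurrence_probability("HHHHH", 10)    # highly structured pattern
p_mixed = occurrence_probability("HTTHT", 10)  # less structured pattern
```

When n equals the pattern length both patterns are equally likely (1/32), but within 10 flips the self-overlapping pattern HHHHH tends to cluster its occurrences and therefore turns up at least once in fewer sequences than HTTHT does.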
Sato-Masumoto, Naoko; Ito, Michiho
2014-06-01
Geraniol and linalool are acyclic monoterpenes found in plant essential oils that have attracted much attention for their commercial use and in pharmaceutical studies. They are synthesized from geranyl diphosphate (GDP) by geraniol and linalool synthases, respectively. Both synthases are very similar at the amino acid level and share the same substrate; however, the position of the GDP to which they introduce hydroxyl groups is different. In this study, the mechanisms underlying the regiospecific hydroxylation of geraniol and linalool synthases were investigated using a domain swapping approach and site-directed mutagenesis in perilla. Sequences of the synthases were divided into ten domains (domains I to IV-4), and each corresponding domain was exchanged between both enzymes. It was shown that different regions were important for the formation of geraniol and linalool, namely, domains IV-1 and -4 for geraniol, and domains III-b, III-d, and IV-4 for linalool. These results suggested that the conformation of carbocation intermediates and their electron localization differ between geraniol and linalool synthases. Further, five amino acids in domain IV-4 were apparently indispensable for the formation of geraniol and linalool. According to three-dimensional structural models of the synthases, these five residues seemed to be responsible for the different spatial arrangement of the amino acid at H524 in the case of geraniol synthase, while N526 is the corresponding residue in linalool synthase. These results suggested that the side-chains of these five amino acids, in combination with several relevant domains, localized the positive charge in the carbocation intermediate to determine the position of the introduced hydroxyl group. Copyright © 2014 Elsevier Ltd. All rights reserved.
Binding Mechanisms of Intrinsically Disordered Proteins: Theory, Simulation, and Experiment
Mollica, Luca; Bessa, Luiza M.; Hanoulle, Xavier; Jensen, Malene Ringkjøbing; Blackledge, Martin; Schneider, Robert
2016-01-01
In recent years, protein science has been revolutionized by the discovery of intrinsically disordered proteins (IDPs). In contrast to the classical paradigm that a given protein sequence corresponds to a defined structure and an associated function, we now know that proteins can be functional in the absence of a stable three-dimensional structure. In many cases, disordered proteins or protein regions become structured, at least locally, upon interacting with their physiological partners. Many, sometimes conflicting, hypotheses have been put forward regarding the interaction mechanisms of IDPs and the potential advantages of disorder for protein-protein interactions. Whether disorder may increase, as proposed, e.g., in the “fly-casting” hypothesis, or decrease binding rates, increase or decrease binding specificity, or what role pre-formed structure might play in interactions involving IDPs (conformational selection vs. induced fit), are subjects of intense debate. Experimentally, these questions remain difficult to address. Here, we review experimental studies of binding mechanisms of IDPs using NMR spectroscopy and transient kinetic techniques, as well as the underlying theoretical concepts and numerical methods that can be applied to describe these interactions at the atomic level. The available literature suggests that the kinetic and thermodynamic parameters characterizing interactions involving IDPs can vary widely and that there may be no single common mechanism that can explain the different binding modes observed experimentally. Rather, disordered proteins appear to make combined use of features such as pre-formed structure and flexibility, depending on the individual system and the functional context. PMID:27668217
Protein subcellular localization assays using split fluorescent proteins
Waldo, Geoffrey S [Santa Fe, NM; Cabantous, Stephanie [Los Alamos, NM
2009-09-08
The invention provides protein subcellular localization assays using split fluorescent protein systems. The assays are conducted in living cells, do not require fixation and washing steps inherent in existing immunostaining and related techniques, and permit rapid, non-invasive, direct visualization of protein localization in living cells. The split fluorescent protein systems used in the practice of the invention generally comprise two or more self-complementing fragments of a fluorescent protein, such as GFP, wherein one or more of the fragments correspond to one or more beta-strand microdomains and are used to "tag" proteins of interest, and a complementary "assay" fragment of the fluorescent protein. Either or both of the fragments may be functionalized with a subcellular targeting sequence enabling it to be expressed in or directed to a particular subcellular compartment (e.g., the nucleus).
Structure-Based Phylogenetic Analysis of the Lipocalin Superfamily.
Lakshmi, Balasubramanian; Mishra, Madhulika; Srinivasan, Narayanaswamy; Archunan, Govindaraju
2015-01-01
Lipocalins constitute a superfamily of extracellular proteins that are found in all three kingdoms of life. Although very divergent in their sequences and functions, they show remarkable similarity in 3-D structures. Lipocalins bind and transport small hydrophobic molecules. Earlier sequence-based phylogenetic studies of lipocalins highlighted that they have a long evolutionary history. However, the molecular and structural basis of their functional diversity is not completely understood. The main objective of the present study is to understand the functional diversity of the lipocalins using a structure-based phylogenetic approach. The present study with 39 protein domains from the lipocalin superfamily suggests that the clusters of lipocalins obtained by structure-based phylogeny correspond well with the functional diversity. The detailed analysis of each of the clusters and sub-clusters reveals that the 39 lipocalin domains cluster based on their mode of ligand binding, though the clustering was performed on the basis of gross domain structure. The outliers in the phylogenetic tree are often from single-member families. Also, the structure-based phylogenetic approach has provided pointers to assign putative functions for the domains of unknown function in the lipocalin family. The approach employed in the present study can be used in the future for the functional identification of new lipocalin proteins and may be extended to other protein families where members show poor sequence similarity but high structural similarity.
Propensities of peptides containing the Asn-Gly segment to form β-turn and β-hairpin structures.
Kang, Young Kee; Yoo, In Kee
2016-09-01
The propensities of peptides that contain the Asn-Gly segment to form β-turn and β-hairpin structures were explored using the density functional methods and the implicit solvation model in CH2Cl2 and water. The populations of preferred β-turn structures varied depending on the sequence and solvent polarity. In solution, β-hairpin structures with βI' turn motifs were most preferred for the heptapeptides containing the Asn-Gly segment regardless of the sequence of the strands. These preferences in solution are consistent with the corresponding X-ray structures. The sequence, H-bond strengths, solvent polarity, and conformational flexibility appeared to interact to determine the preferred β-hairpin structure of each heptapeptide, although the β-turn segments played a role in promoting the formation of β-hairpin structures and the β-hairpin propensity varied. In the heptapeptides containing the Asn-Gly segment, the β-hairpin formation was enthalpically favored and entropically disfavored at 25°C in water. The calculated results for β-turns and β-hairpins containing the Asn-Gly segment imply that these structural preferences may be useful for the design of bioactive macrocyclic peptides containing β-hairpin mimics and the design of binding epitopes for protein-protein and protein-nucleic acid recognitions. © 2016 Wiley Periodicals, Inc. Biopolymers 105: 653-664, 2016.
Topology-guided deformable registration with local importance preservation for biomedical images
NASA Astrophysics Data System (ADS)
Zheng, Chaojie; Wang, Xiuying; Zeng, Shan; Zhou, Jianlong; Yin, Yong; Feng, Dagan; Fulham, Michael
2018-01-01
The demons registration (DR) model is well recognized for its deformation capability. However, it might lead to misregistration due to erroneous diffusion direction when there are no overlaps between corresponding regions. We propose a novel registration energy function, introducing topology energy, and incorporating a local energy function into the DR in a progressive registration scheme, to address these shortcomings. The topology energy that is derived from the topological information of the images serves as a direction inference to guide diffusion transformation to retain the merits of DR. The local energy constrains the deformation disparity of neighbouring pixels to maintain important local texture and density features. The energy function is minimized in a progressive scheme steered by a topology tree graph and we refer to it as topology-guided deformable registration (TDR). We validated our TDR on 20 pairs of synthetic images with Gaussian noise, 20 phantom PET images with artificial deformations and 12 pairs of clinical PET-CT studies. We compared it to three methods: (1) free-form deformation registration method, (2) energy-based DR and (3) multi-resolution DR. The experimental results show that our TDR outperformed the other three methods in regard to structural correspondence and preservation of the local important information including texture and density, while retaining global correspondence.
The rRNA evolution and procaryotic phylogeny
NASA Technical Reports Server (NTRS)
Fox, G. E.
1986-01-01
Studies of ribosomal RNA primary structure allow reconstruction of phylogenetic trees for prokaryotic organisms. Such studies reveal a major dichotomy among the bacteria that separates them into eubacteria and archaebacteria. Both groupings are further segmented into several major divisions. The results obtained from 5S rRNA sequences are essentially the same as those obtained with the 16S rRNA data. In the case of Gram-negative bacteria, the ribosomal RNA sequencing results can also be directly compared with hybridization studies and cytochrome c sequencing studies. There is again excellent agreement among the several methods. It seems likely then that the overall picture of microbial phylogeny that is emerging from the RNA sequence studies is a good approximation of the true history of these organisms. The RNA data allow examination of the evolutionary process in a semi-quantitative way. The secondary structures of these RNAs are largely established. As a result, it is possible to recognize examples of local structural evolution. Evolutionary pathways accounting for these events can be proposed and their probability can be assessed.
Keith, William J.; Theodore, Ted G.
1979-01-01
The widespread distribution of Tertiary volcanic rocks in south-central Arizona is controlled in part by prevolcanic structures along which volcanic vents were localized. Volcanic rocks in the Mineral Mountain and Teapot Mountain quadrangles mark the site of a major northwest-trending structural hingeline. This hingeline divides an older Precambrian X terrane on the west from intensely deformed sequences of rock as young as Pennsylvanian on the east, suggesting increased westerly uplift. The volcanic rocks consist of a pile of complexly interlayered rhyolite, andesite, dacite, flows and intrusive rocks, water-laid tuffs, and very minor olivine basalt. Although the rocks erupted from several different vents, time relations, space relations, and chemistry each give strong evidence of a single source for all the rocks. Available data (by the K-Ar dating method) on hornblende and biotite separates from the volcanic rocks range from 14 to 19 m.y. and establish the pre-middle Miocene age of major dislocations along the structural hingeline. Most of the volcanic rocks contain glass, either at the base of the flows or as an envelope around the intrusive phases. One of the intrusive rhyolites, however, seems to represent one of the final eruptions. Intense vesiculation of the intrusive rhyolite suggests a large content of volatiles at the time of its eruption. Mineralization is associated with the more silicic of these middle Miocene volcanic rocks; specifically, extensive fissure quartz veins contain locally significant amounts of silver, lead, and zinc and minor amounts of gold. Many of the most productive deposits are hosted by the volcanic rocks, although others occur in the Precambrian rocks. Magnetic data correspond roughly to the geology in outlining the overall extent of the volcanic rocks as a magnetic low.
NASA Astrophysics Data System (ADS)
Toni, Mostafa; Barth, Andreas; Ali, Sherif M.; Wenzel, Friedemann
2016-09-01
On 22 January 2013 an earthquake with local magnitude ML 4.1 occurred in the central part of the Gulf of Suez. Six months later, on 1 June 2013, another earthquake with local magnitude ML 5.1 took place at the same epicenter but at a different depth. These two perceptible events were recorded and localized by the Egyptian National Seismological Network (ENSN) and additional networks in the region. The purpose of this study is to determine focal mechanisms and source parameters of both earthquakes to analyze their tectonic relation. We determine the focal mechanisms by applying moment tensor inversion and first motion analysis of P- and S-waves. Both sources reveal oblique focal mechanisms with normal faulting and strike-slip components on differently oriented faults. The source mechanism of the larger event on 1 June, in combination with the location of the aftershock sequence, indicates a left-lateral slip on a N-S striking fault structure at 21 km depth that is in conformity with the NE-SW extensional Shmin (orientation of minimum horizontal compressional stress) and the local fault pattern. On the other hand, the smaller earthquake on 22 January, with a shallower hypocenter at 16 km depth, seems to have happened on a NE-SW striking fault plane sub-parallel to Shmin. Thus, here an energy release on a transfer fault connecting dominant rift-parallel structures might have resulted in a stress transfer, triggering the later ML 5.1 earthquake. Following Brune's model and using displacement spectra, we calculate the dynamic source parameters for the two events. The estimated source parameters for the 22 January 2013 and 1 June 2013 earthquakes are fault length (470 and 830 m), stress drop (1.40 and 2.13 MPa), and seismic moment (5.47E+21 and 6.30E+22 dyn cm) corresponding to moment magnitudes of MW 3.8 and 4.6, respectively.
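The link between seismic moment and moment magnitude in the abstract above can be illustrated with a short calculation. This is a minimal sketch that assumes the widely used Hanks-Kanamori relation, Mw = (2/3) log10(M0) - 10.7 with M0 in dyn cm; the abstract does not state which relation or rounding the authors applied, so small differences from the reported values are to be expected. The function name is illustrative only.

```python
import math


def moment_magnitude(m0_dyn_cm: float) -> float:
    """Moment magnitude from seismic moment in dyn cm.

    Assumes the Hanks-Kanamori relation; the source abstract does not
    say which relation the authors used.
    """
    return (2.0 / 3.0) * math.log10(m0_dyn_cm) - 10.7


# Seismic moments quoted in the abstract (dyn cm).
events = {"22 January 2013": 5.47e21, "1 June 2013": 6.30e22}

for label, m0 in events.items():
    print(f"{label}: Mw = {moment_magnitude(m0):.2f}")
```

Under this assumption the first event comes out near Mw 3.8, matching the abstract, and the second near Mw 4.5, slightly below the reported 4.6.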
Magnetic resonance imaging of the normal bovine digit.
Raji, A R; Sardari, K; Mirmahmoob, P
2009-08-01
The purpose of this study was to define the normal structures of the digits and hoof in Holstein dairy cattle using magnetic resonance imaging (MRI). Transverse, sagittal and dorsoplantar MRI images of three isolated cattle cadaver digits were obtained using a Gyroscan T5-NT, a magnet of 0.5 Tesla, and a T1-weighted sequence. The MRI images were compared to corresponding frozen cross-sections and dissected specimens of the cadaver digits. Relevant anatomical structures were identified and labeled at each level. The MRI images provided anatomical detail of the digits and hoof in Holstein dairy cattle. Transverse images provided excellent depiction of anatomical structures when compared to corresponding frozen cross-sections. The information presented in this paper would serve as an initial reference for the evaluation of MRI images of the digits and hoof in Holstein dairy cattle that can be used by radiologists, clinicians and surgeons, or for research purposes in bovine lameness.
Structural Analysis of Biodiversity
Sirovich, Lawrence; Stoeckle, Mark Y.; Zhang, Yu
2010-01-01
Large, recently-available genomic databases cover a wide range of life forms, suggesting opportunity for insights into genetic structure of biodiversity. In this study we refine our recently-described technique using indicator vectors to analyze and visualize nucleotide sequences. The indicator vector approach generates correlation matrices, dubbed Klee diagrams, which represent a novel way of assembling and viewing large genomic datasets. To explore its potential utility, here we apply the improved algorithm to a collection of almost 17000 DNA barcode sequences covering 12 widely-separated animal taxa, demonstrating that indicator vectors for classification gave correct assignment in all 11000 test cases. Indicator vector analysis revealed discontinuities corresponding to species- and higher-level taxonomic divisions, suggesting an efficient approach to classification of organisms from poorly-studied groups. As compared to standard distance metrics, indicator vectors preserve diagnostic character probabilities, enable automated classification of test sequences, and generate high-information density single-page displays. These results support application of indicator vectors for comparative analysis of large nucleotide data sets and raise prospect of gaining insight into broad-scale patterns in the genetic structure of biodiversity. PMID:20195371
Kawagoshi, Taiki; Nishida, Chizuko; Ota, Hidetoshi; Kumazawa, Yoshinori; Endo, Hideki; Matsuda, Yoichi
2008-01-01
Crocodilians have several unique karyotypic features, such as small diploid chromosome numbers (30-42) and the absence of dot-shaped microchromosomes. Of the extant crocodilian species, the Siamese crocodile (Crocodylus siamensis) has no more than 2n = 30, comprising mostly bi-armed chromosomes with large centromeric heterochromatin blocks. To investigate the molecular structures of C-heterochromatin and genomic compartmentalization in the karyotype, characterized by the disappearance of tiny microchromosomes and reduced chromosome number, we performed molecular cloning of centromeric repetitive sequences and chromosome mapping of the 18S-28S rDNA and telomeric (TTAGGG)n sequences. The centromeric heterochromatin was composed mainly of two repetitive sequence families whose characteristics were quite different. Two types of GC-rich CSI-HindIII family sequences, the 305 bp CSI-HindIII-S (G+C content, 61.3%) and 424 bp CSI-HindIII-M (63.1%), were localized to the intensely PI-stained centric regions of all chromosomes, except for chromosome 2 with PI-negative heterochromatin. The 94 bp CSI-DraI (G+C content, 48.9%) was tandem-arrayed satellite DNA and localized to chromosome 2 and four pairs of small-sized chromosomes. The chromosomal size-dependent genomic compartmentalization that is supposedly unique to the Archosauromorpha was probably lost in the crocodilian lineage with the disappearance of microchromosomes followed by the homogenization of centromeric repetitive sequences between chromosomes, except for chromosome 2.
Dolati, Parviz; Gokoglu, Abdulkerim; Eichberg, Daniel; Zamani, Amir; Golby, Alexandra; Al-Mefty, Ossama
2015-01-01
Background: Skull base tumors frequently encase or invade adjacent normal neurovascular structures. For this reason, optimal tumor resection with incomplete knowledge of patient anatomy remains a challenge. Methods: To determine the accuracy and utility of image-based preoperative segmentation in skull base tumor resections, we performed a prospective study. Ten patients with skull base tumors underwent preoperative 3T magnetic resonance imaging, which included thin section three-dimensional (3D) space T2, 3D time of flight, and magnetization-prepared rapid acquisition gradient echo sequences. Imaging sequences were loaded in the neuronavigation system for segmentation and preoperative planning. Five different neurovascular landmarks were identified in each case and measured for accuracy using the neuronavigation system. Each segmented neurovascular element was validated by manual placement of the navigation probe, and errors of localization were measured. Results: Strong correspondence between image-based segmentation and microscopic view was found at the surface of the tumor and tumor-normal brain interfaces in all cases. The accuracy of the measurements was 0.45 ± 0.21 mm (mean ± standard deviation). This information reassured the surgeon and prevented vascular injury intraoperatively. Preoperative segmentation of the related cranial nerves was possible in 80% of cases and helped the surgeon localize involved cranial nerves in all cases. Conclusion: Image-based preoperative vascular and neural element segmentation with 3D reconstruction is highly informative preoperatively and could increase the vigilance of neurosurgeons for preventing neurovascular injury during skull base surgeries. Additionally, the accuracy found in this study is superior to previously reported measurements. This novel preliminary study is encouraging for future validation with larger numbers of patients. PMID:26674155
Heyting, C; Menke, H H
1979-01-11
1. We have determined the physical location of mitochondrial genetic markers in the 21S region of yeast mtDNA by genetic analysis of petite mutants whose mtDNA has been physically mapped on the wild-type mtDNA. 2. The order of loci determined in this study is in agreement with the order deduced from recombination analysis and coretention analysis, except for the position of omega+: we conclude that omega+ is located between C321 (RIB-1) and E514 (RIB-3). 3. The marker E514 (RIB-3) has been localized on a DNA segment of 3800 bp, and the markers E354, E553 and cs23 (RIB-2) on a DNA segment of 1100 bp; both these segments overlap the 21S rRNA cistron. The marker C321 (RIB-1) has been localized within a segment of 240 bp which also overlaps the 21S rRNA cistron, and we infer on the basis of indirect evidence that this marker lies within this cistron. 4. In all our rho+ as well as rho- strains there is a one-to-one correlation between the omega+ phenotype, the ability to transmit the omega+ allele and the presence of a mtDNA segment about 1000 bp long, located between sequences specifying RIB-3 and sequences corresponding to the loci RIB-1 and RIB-2. This segment may be inserted at this same position into omega- mtDNA by recombination. 5. The role which the different allelic forms of omega may play in the polarity of recombination is discussed.
1996-01-01
Mutations in the Caenorhabditis elegans gene unc-89 result in nematodes having disorganized muscle structure in which thick filaments are not organized into A-bands, and there are no M-lines. Beginning with a partial cDNA from the C. elegans sequencing project, we have cloned and sequenced the unc-89 gene. An unc-89 allele, st515, was found to contain an 84-bp deletion and a 10-bp duplication, resulting in an in-frame stop codon within the predicted unc-89 coding sequence. Analysis of the complete coding sequence for unc-89 predicts a novel 6,632 amino acid polypeptide consisting of sequence motifs which have been implicated in protein-protein interactions. UNC-89 begins with 67 residues of unique sequence, followed by SH3, dbl/CDC24, and PH domains, 7 immunoglobulin (Ig) domains, and a putative KSP-containing multiphosphorylation domain, and ends with 46 Ig domains. A polyclonal antiserum raised to a portion of the unc-89 encoded sequence reacts to a twitchin-sized polypeptide from wild type, but to truncated polypeptides from st515 and from the amber allele e2338. By immunofluorescent microscopy, this antiserum localizes to the middle of A-bands, consistent with UNC-89 being a structural component of the M-line. Previous studies indicate that myofilament lattice assembly begins with positional cues laid down in the basement membrane and muscle cell membrane. We propose that the intracellular protein UNC-89 responds to these signals, localizes, and then participates in assembling an M-line. PMID:8603916
NASA Astrophysics Data System (ADS)
Ashworth, J. R.; Sheplev, V. S.
1997-09-01
Layered coronas between two reactant minerals can, in many cases, be attributed to diffusion-controlled growth with local equilibrium. This paper clarifies and unifies the previous approaches of various authors to the simplest form of modelling, which uses no assumed values for thermochemical quantities. A realistic overall reaction must be estimated from measured overall proportions of minerals and their major element compositions. Modelling is not restricted to a particular number of components S, relative to the number of phases Φ. If Φ > S + 1, the overall reaction is a combination of simultaneous reactions. The stepwise method, solving for the local reaction at each boundary in turn, is extended to allow for recurrence of a mineral (its presence in two parts of the layer structure separated by a gap). The equations are also given in matrix form. A thermodynamic stability criterion is derived, determining which layer sequence is truly stable if several are computable from the same inputs. A layer structure satisfying the stability criterion has greater growth rate (and greater rate of entropy production) than the other computable layer sequences. This criterion of greatest entropy production is distinct from Prigogine's theorem of minimum entropy production, which distinguishes the stationary or quasi-stationary state from other states of the same layer sequence. The criterion leads to modification of previous results for coronas comprising hornblende, spinel, and orthopyroxene between olivine (Ol) and plagioclase (Pl). The outcome supports the previous inference that Si, and particularly Al, commonly behave as immobile relative to other cation-forming major elements. The affinity (-ΔG) of a corona-forming reaction is estimated, using previous estimates of diffusion coefficient and the duration t of reaction, together with a new model quantity (-ΔG)*.
For an example of the Ol + Pl reaction, a rough calculation gives (-ΔG) > 1.7RT (per mole of Pl consumed, based on a 24-oxygen formula for Pl). At 600-700°C, this represents (-ΔG) > 10 kJ mol⁻¹ and departure from equilibrium temperature by at least ~100°C. The lower end of this range is petrologically reasonable and, for t < 100 Ma, corresponds to a Fick's-law diffusion coefficient for Al, D_Al > 10⁻²⁵ m² s⁻¹, larger than expected for lattice diffusion but consistent with fluid-absent grain-boundary diffusion and small concentration gradients.
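The 1.7RT bound quoted above can be converted to kJ per mole as a quick arithmetic check. This is only a sketch of that unit conversion, assuming the molar gas constant R = 8.314462618 J mol⁻¹ K⁻¹ and the stated 600-700°C range; the function name is illustrative and none of the corona modelling itself is reproduced.

```python
R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def affinity_bound_kj(multiple_of_rt: float, temp_celsius: float) -> float:
    """Return multiple_of_rt * R * T in kJ per mole for a temperature in deg C."""
    temp_kelvin = temp_celsius + 273.15
    return multiple_of_rt * R_GAS * temp_kelvin / 1000.0


for temp_c in (600.0, 700.0):
    print(f"{temp_c:.0f} C: 1.7RT = {affinity_bound_kj(1.7, temp_c):.1f} kJ/mol")
```

Both ends of the range give values somewhat above 10 kJ per mole, in line with the lower bound stated in the abstract.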
Characteristic classes, singular embeddings, and intersection homology.
Cappell, S E; Shaneson, J L
1987-06-01
This note announces some results on the relationship between global invariants and local topological structure. The first section gives a local-global formula for Pontrjagin classes or L-classes. The second section describes a corresponding decomposition theorem on the level of complexes of sheaves. A final section mentions some related aspects of "singular knot theory" and the study of nonisolated singularities. Equivariant analogues, with local-global formulas for Atiyah-Singer classes and their relations to G-signatures, will be presented in a future paper.
Bor, Daniel; Billington, Jac; Baron-Cohen, Simon
2007-10-01
SINGLE CASE: DT is a savant with exceptional abilities in numerical memory and mathematical calculations. DT also has an elaborate form of synaesthesia for visually presented digits. Furthermore, DT has Asperger syndrome (AS). We carried out two preliminary investigations to establish whether these conditions may contribute to his savant abilities. In an fMRI digit span study, DT showed hyperactivity in lateral prefrontal cortex when encoding digits, compared with controls. In addition, while controls showed raised lateral prefrontal activation in response to structured (compared to unstructured) sequences of digits, DT's neural activity did not differ between these two conditions. Controls also showed a significant performance advantage for structured, compared with unstructured, sequences, whereas no such pattern was found for DT. We suggest that this performance pattern reflects that DT focuses less on external mathematical structure, since for him all digit sequences have internal structure linked to his synaesthesia. Finally, DT did not activate extra-striate regions normally associated with synaesthesia, suggesting that he has an unusual and more abstract and conceptual form of synaesthesia. This appears to generate structured, highly-chunked content that enhances encoding of digits and aids both recall and calculation. People with AS preferentially attend to local features of stimuli. To test this in DT, we administered the Navon task. Relative to controls, DT was faster at finding a target at the local level, and was less distracted by interference from the global level. The propensity to focus on local detail, in concert with a form of synaesthesia that provides structure to all digits, may account for DT's exceptional numerical memory and calculation ability. This neural and cognitive pattern needs to be tested in a series of similar cases, and with more constrained control groups, to confirm the significance of this association.
Photophysical Characterization of Enhanced 6-Methylisoxanthopterin Fluorescence in Duplex DNA.
Moreno, Andrew; Knee, J L; Mukerji, Ishita
2016-12-08
The structure and dynamic motions of bases in DNA duplexes and other constructs are important for understanding mechanisms of selectivity and recognition of DNA-binding proteins. The fluorescent guanine analogue 6-methylisoxanthopterin (6-MI) is well suited to this purpose, as it exhibits an unexpected 3- to 4-fold increase in relative quantum yield upon duplex formation when incorporated into the following sequences: ATFAA, AAFTA, or ATFTA (where F represents 6-MI). To better understand some of the factors leading to the 6-MI fluorescence increase upon duplex formation, we characterized the effect of local sequence and structural perturbations on 6-MI photophysics through temperature melts, quantum yield measurements, fluorescence quenching assays, and fluorescence lifetime measurements. By examining 21 sequences we have determined that the duplex-enhanced fluorescence (DEF) depends on the composition of bases adjacent to 6-MI and the presence of adenines at locations n ± 2 from the probe. Duplex stability and local solvent accessibility measurements support a model in which the DEF arises from a constrained geometry of 6-MI in the duplex, which remains H-bonded to cytosine, stacked with adjacent bases and inaccessible to quenchers. Perturbation of DNA structure through the introduction of an unpaired base 3' to 6-MI or a mismatched basepair increases 6-MI dynamic motion, leading to fluorescence quenching and a reduction in quantum yield. Molecular dynamics simulations suggest the enhanced fluorescence results from a greater degree of twist at the X-F step relative to the quenched duplexes examined. These results point to a model where adenine residues located at n ± 2 from 6-MI induce a structural geometry with greater twist in the duplex that hinders local motion, reducing dynamic quenching and producing an increase in 6-MI fluorescence.
NASA Astrophysics Data System (ADS)
Azaïez, Hajer; Bédir, Mourad; Tanfous, Dorra; Soussi, Mohamed
2007-05-01
In central Tunisia, Lower Cretaceous deposits represent carbonate and sandstone reservoir series that correspond to proven oil fields. The main problems for hydrocarbon exploration of these levels are their basin tectonic configuration and their sequence distribution, in addition to source rock availability. The Central Atlas of Tunisia is characterized by deep-seated faults oriented northeast-southwest, northwest-southeast and north-south. These faults limit inherited tectonic blocks and show intruded Triassic salt domes. Lower Cretaceous series outcropping in the region along the anticline flanks present platform deposits. The seismic interpretation followed the Exxon methodologies in the 26th A.A.P.G. Memoir. The defined Lower Cretaceous seismic units were calibrated with petroleum well data and tied to stratigraphic sequences established by outcrop studies. This allows the subsurface identification of subsiding zones and thus of the distribution of sequence deposits. Seismic mapping of these unit boundaries shows a structuring from platform to basin block zones and helps to understand the hydrocarbon reservoir systems-tract and horizon distribution around these domains.
Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S
2015-09-01
The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. 
© 2015 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains
Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz
2016-01-01
With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734
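The intersection matrix described in the abstract above can be illustrated with a toy example. This is a minimal sketch, assuming that the "degree of overlap" is taken as the raw count of neurons active in both time bins; the published method may normalise this differently, and none of ASSET's statistical or clustering steps are reproduced here. The data and function name are hypothetical.

```python
def intersection_matrix(active_sets):
    """Entry (i, j) counts the neurons active in both time bin i and time bin j.

    active_sets[i] is the set of neuron ids that spiked in bin i.
    """
    n_bins = len(active_sets)
    return [
        [len(active_sets[i] & active_sets[j]) for j in range(n_bins)]
        for i in range(n_bins)
    ]


# Toy data: three neuron groups fire in sequence in bins 0-2 and again in bins 5-7.
bins = [{0, 1}, {2, 3}, {4, 5}, set(), set(), {0, 1}, {2, 3}, {4, 5}]
imat = intersection_matrix(bins)

# The repeated sequence shows up as an off-diagonal band at a lag of 5 bins.
lag = 5
band = [imat[i][i + lag] for i in range(len(bins) - lag)]
print(band)
```

In this toy case every entry along the lag-5 band equals the group size, while neighbouring off-band entries are zero, which is the kind of diagonal structure the abstract says was previously left to visual inspection.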
Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S
2015-01-01
The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. PMID:26073648
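The idea of grouping proteins into clusters at a single edge threshold, as described in the abstract above, can be sketched as connected components of a thresholded similarity graph. This is an illustrative sketch only: the protein labels and scores are hypothetical, the function name is invented here, and the scoring itself (BLAST, TM-Align or DASP in the source) is not reproduced.

```python
def clusters_at_threshold(nodes, scored_edges, threshold):
    """Group nodes into connected components, keeping only edges scoring >= threshold."""
    parent = {node: node for node in nodes}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b, score in scored_edges:
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical similarity scores between four proteins.
proteins = ["A", "B", "C", "D"]
edges = [("A", "B", 0.9), ("B", "C", 0.4), ("C", "D", 0.8)]

print(clusters_at_threshold(proteins, edges, 0.5))
print(clusters_at_threshold(proteins, edges, 0.3))
```

Raising the threshold splits the network into more, tighter clusters, which is why the choice of edge metric and threshold matters when comparing clusters with a curated functional hierarchy.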
Ganguli, Sayak; Gupta, Manoj Kumar; Basu, Protip; Banik, Rahul; Singh, Pankaj Kumar; Vishal, Vineet; Bera, Abhisek Ranjan; Chakraborty, Hirak Jyoti; Das, Sasti Gopal
2014-01-01
With the advent of the age of big data and advances in high-throughput technology, accessing data has become one of the most important steps in the entire knowledge discovery process. Most users are not able to decipher the query result that is obtained when non-specific keywords or a combination of keywords are used. Intelligent access to sequence and structure databases (IASSD) is a desktop application for the Windows operating system. It is written in Java and utilizes the Web Services Description Language (WSDL) files and JAR files of E-utilities of various databases such as the National Center for Biotechnology Information (NCBI) and Protein Data Bank (PDB). In addition, IASSD allows the user to view protein structure using a Jmol application which supports conditional editing. The JAR file is freely available through e-mail from the corresponding author.
Antifreeze glycopeptide analogues: microwave-enhanced synthesis and functional studies.
Heggemann, Carolin; Budke, Carsten; Schomburg, Benjamin; Majer, Zsuzsa; Wissbrock, Marco; Koop, Thomas; Sewald, Norbert
2010-01-01
Antifreeze glycoproteins enable life at temperatures below the freezing point of physiological solutions. They usually consist of the repetitive tripeptide unit (-Ala-Ala-Thr-) with the disaccharide alpha-D-galactosyl-(1-3)-beta-N-acetyl-D-galactosamine attached to each hydroxyl group of threonine. Monoglycosylated analogues have been synthesized from the corresponding monoglycosylated threonine building block by microwave-assisted solid phase peptide synthesis. This method allows the preparation of analogues containing sequence variations which are not accessible by other synthetic methods. As antifreeze glycoproteins consist of numerous isoforms they are difficult to obtain in pure form from natural sources. The synthetic peptides have been structurally analyzed by CD and NMR spectroscopy in proton exchange experiments revealing a structure as flexible as reported for the native peptides. Microphysical recrystallization tests show an ice structuring influence and ice growth inhibition depending on the concentration, chain length and sequence of the peptides.
Solar Terminator Waves in the Ionosphere Measured by the Wallops Island, VA Dynasonde
NASA Astrophysics Data System (ADS)
Zabotin, N. A.; Song, H.; Bullett, T. W.
2017-12-01
The solar terminator represents a unique source of atmospheric waves possessing near-ideal coherence properties: its geometry and the magnitude of its impact change very little from day to day. This feature has been used in [Forbes et al., GRL, 2008] to obtain "snapshots" of terminator waves in the neutral atmosphere at an altitude of 400 km by averaging CHAMP accelerometer data over relatively long sequences of the satellite passes. The results were represented in the geographic latitude vs local time coordinates. We apply a similar approach averaging time series of Wallops Island, VA Dynasonde Doppler data to obtain "snapshots" of terminator waves in the ionosphere in the true altitude vs local "terminator time" coordinates. The averaging is performed independently for every month of the yearlong observation period from May 2013 to April 2014. The altitude range covered is 90 km to 400 km with 2 km resolution, representing the entire bottom-side ionosphere. Individual local time segments used for the averaging were 12 hours long and all centered at the times of the sunrise or sunset terminator passing at every specific altitude. This procedure effectively suppresses all kinds of incoherent wave activity and allows one to reveal the perturbation phenomenon mainly caused by the solar terminator. This is an important advantage of this technique compared to multiple "terminator wave" studies based on simple time coincidence. Both sunrise and sunset terminator waves are easily visualized in all of the monthly images. Our results confirm observations of [Forbes et al., GRL, 2008] of the wave structures existing on both sides of the terminator. The phase fronts of the sunset terminator wave are propagating downward indicating upward movement of the terminator-related disturbance and of the wave energy generated by it.
The phase fronts of the sunrise terminator waves are propagating upward indicating downward movement of the terminator-related disturbance and of the wave energy generated by it. Spectral analysis of the local time sequences reveals characteristic peaks in the terminator-related wave activity corresponding to the periods 40-60 min and 2 hours. We also analyze statistics of their horizontal wavelengths.
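The averaging of segments centered on the terminator passage is a form of superposed-epoch averaging. The sketch below illustrates only the principle, on synthetic data: the window length, noise level, and waveform are assumptions chosen for the demonstration and are not taken from the Dynasonde analysis.

```python
import numpy as np


def superposed_epoch_mean(series, centers, half_width):
    """Average fixed-length windows of `series` centered on each index in `centers`."""
    windows = [
        series[c - half_width : c + half_width + 1]
        for c in centers
        if c - half_width >= 0 and c + half_width < len(series)
    ]
    return np.mean(windows, axis=0)


def demo(seed=0):
    """Bury a repeating (coherent) waveform in noise and recover it by averaging."""
    rng = np.random.default_rng(seed)
    half_width, period, n_days = 20, 100, 200  # assumed toy dimensions
    lags = np.arange(-half_width, half_width + 1)
    coherent = np.sin(2 * np.pi * lags / 20.0)  # same shape at every "terminator"
    series = rng.normal(0.0, 1.0, size=period * n_days)  # incoherent background
    centers = np.arange(n_days) * period + period // 2
    for c in centers:
        series[c - half_width : c + half_width + 1] += coherent
    averaged = superposed_epoch_mean(series, centers, half_width)
    return coherent, averaged


if __name__ == "__main__":
    coherent, averaged = demo()
    print("max deviation from coherent waveform:", np.abs(averaged - coherent).max())
```

Because the background is uncorrelated between windows, its contribution to the mean shrinks roughly as one over the square root of the number of windows, while the coherent waveform is preserved.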
Localized structural frustration for evaluating the impact of sequence variants.
Kumar, Sushant; Clarke, Declan; Gerstein, Mark
2016-12-01
Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype-genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impact protein functionality without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, quantifying unfavorable local interactions, is employed as a metric to investigate such effects. Using this workflow on the Protein Databank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, those associated with TSGs change the frustration more in the core than the surface (by introducing loss-of-function events), whereas those associated with oncogenes manifest the opposite pattern, creating gain-of-function events. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Spielmann, A; Stutz, E
1983-01-01
The soybean chloroplast psb A gene (photosystem II thylakoid membrane protein of Mr 32 000, lysine-free) and the trn H gene (tRNAHisGUG), which both map in the large single copy region adjacent to one of the inverted repeat structures (IR1), have been sequenced including flanking regions. The psb A gene shows in its structural part 92% sequence homology with the corresponding genes of spinach and N. debneyi and contains also an open reading frame for 353 amino acids. The amino acid sequence of a potential primary translation product (calculated Mr, 38 904, no lysine) diverges from that of spinach and N. debneyi in only two positions in the C-terminal part. The trn H gene has the same polarity as the psb A gene and the coding region is located at the very end of the large single copy region. The deduced sequence of the soybean chloroplast tRNAHisGUG is identical with that of Zea mays chloroplasts. Both ends of the large single copy region were sequenced including a small segment of the adjacent IR1 and IR2. PMID:6314279
De novo identification of highly diverged protein repeats by probabilistic consistency.
Biegert, A; Söding, J
2008-03-15
An estimated 25% of all eukaryotic proteins contain repeats, which underlines the importance of duplication for evolving new protein functions. Internal repeats often correspond to structural or functional units in proteins. Methods capable of identifying diverged repeated segments or domains at the sequence level can therefore assist in predicting domain structures, inferring hypotheses about function and mechanism, and investigating the evolution of proteins from smaller fragments. We present HHrepID, a method for the de novo identification of repeats in protein sequences. It is able to detect the sequence signature of structural repeats in many proteins that have not yet been known to possess internal sequence symmetry, such as outer membrane beta-barrels. HHrepID uses HMM-HMM comparison to exploit evolutionary information in the form of multiple sequence alignments of homologs. In contrast to a previous method, the new method (1) generates a multiple alignment of repeats; (2) utilizes the transitive nature of homology through a novel merging procedure with fully probabilistic treatment of alignments; (3) improves alignment quality through an algorithm that maximizes the expected accuracy; (4) is able to identify different kinds of repeats within complex architectures by a probabilistic domain boundary detection method and (5) improves sensitivity through a new approach to assess statistical significance. Server: http://toolkit.tuebingen.mpg.de/hhrepid; Executables: ftp://ftp.tuebingen.mpg.de/pub/protevo/HHrepID
Alternative DNA structure formation in the mutagenic human c-MYC promoter
del Mundo, Imee Marie A.; Zewail-Foote, Maha; Kerwin, Sean M.
2017-01-01
Mutation ‘hotspot’ regions in the genome are susceptible to genetic instability, implicating them in diseases. These hotspots are not random and often co-localize with DNA sequences potentially capable of adopting alternative DNA structures (non-B DNA, e.g. H-DNA and G4-DNA), which have been identified as endogenous sources of genomic instability. There are regions that contain overlapping sequences that may form more than one non-B DNA structure. The extent to which one structure impacts the formation/stability of another, within the sequence, is not fully understood. To address this issue, we investigated the folding preferences of oligonucleotides from a chromosomal breakpoint hotspot in the human c-MYC oncogene containing both potential G4-forming and H-DNA-forming elements. We characterized the structures formed in the presence of G4-DNA-stabilizing K+ ions or H-DNA-stabilizing Mg2+ ions using multiple techniques. We found that under conditions favorable for H-DNA formation, a stable intramolecular triplex DNA structure predominated; whereas, under K+-rich, G4-DNA-forming conditions, a plurality of unfolded and folded species were present. Thus, within a limited region containing sequences with the potential to adopt multiple structures, only one structure predominates under a given condition. The predominance of H-DNA implicates this structure in the instability associated with the human c-MYC oncogene. PMID:28334873
Monaural Sound Localization Based on Reflective Structure and Homomorphic Deconvolution
Park, Yeonseok; Choi, Anthony
2017-01-01
The asymmetric structure around the receiver provides a particular time delay for the specific incoming propagation. This paper designs a monaural sound localization system based on the reflective structure around the microphone. The reflective plates are placed to present the direction-wise time delay, which is naturally processed by convolutional operation with a sound source. The received signal is separated for estimating the dominant time delay by using homomorphic deconvolution, which utilizes the real cepstrum and inverse cepstrum sequentially to derive the propagation response’s autocorrelation. Once the localization system accurately estimates the information, the time delay model computes the corresponding reflection for localization. Because of the structure limitation, two stages of the localization process perform the estimation procedure as range and angle. The software toolchain from propagation physics and algorithm simulation realizes the optimal 3D-printed structure. The acoustic experiments in the anechoic chamber denote that 79.0% of the study range data from the isotropic signal is properly detected by the response value, and 87.5% of the specific direction data from the study range signal is properly estimated by the response time. The product of both rates shows the overall hit rate to be 69.1%. PMID:28946625
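The role of the real cepstrum in exposing a reflection delay can be illustrated with a simplified sketch. It covers only the cepstral peak-picking step, not the paper's full homomorphic deconvolution or its reflector geometry; the single circular echo, its gain, and the white-noise source are assumptions made for the demonstration.

```python
import numpy as np


def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.irfft(log_mag, n=len(x))


def estimate_echo_delay(x, min_lag=5):
    """Return the quefrency (in samples) of the largest cepstral peak above min_lag."""
    ceps = real_cepstrum(x)
    half = len(x) // 2
    return min_lag + int(np.argmax(ceps[min_lag:half]))


def make_test_signal(delay=50, gain=0.6, n=4096, seed=0):
    """White-noise source plus one attenuated, circularly delayed copy (assumed model)."""
    rng = np.random.default_rng(seed)
    source = rng.normal(size=n)
    return source + gain * np.roll(source, delay)


if __name__ == "__main__":
    print("estimated delay (samples):", estimate_echo_delay(make_test_signal()))
```

An echo multiplies the spectrum by a periodic ripple, and the logarithm turns that product into a sum, so the ripple shows up as an isolated peak at the delay. Separately, the overall hit rate quoted in the abstract is the product of the two stage rates: 0.790 × 0.875 ≈ 0.691.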
Doi, Hideyuki; Chang, Kwang-Hyeon; Nishibe, Yuichiro; Imai, Hiroyuki; Nakano, Shin-ichi
2013-01-01
The importance of analyzing the determinants of biodiversity and community composition by using multiple trophic levels is well recognized; however, relevant data are lacking. In the present study, we investigated variations in species diversity indices and community structures of the plankton taxonomic groups (zooplankton, rotifers, ciliates, and phytoplankton) under a range of local environmental factors in pond ecosystems. For each planktonic group, we estimated the species diversity index by using linear models and analyzed the community structure by using canonical correspondence analysis. We showed that the species diversity indices and community structures varied among the planktonic groups and according to local environmental factors. The observed lack of congruence among the planktonic groups may have been caused by niche competition between groups with similar trophic guilds or by weak trophic interactions. Our findings highlight the difficulty of predicting total biodiversity within a system, based upon a single taxonomic group. Thus, to conserve the biodiversity of an ecosystem, it is crucial to consider variations in species diversity indices and community structures of different taxonomic groups, under a range of local conditions.
McAllister, Christine A; Miller, Allison J
2016-07-01
Autopolyploidy, genome duplication within a single lineage, can result in multiple cytotypes within a species. Geographic distributions of cytotypes may reflect the evolutionary history of autopolyploid formation and subsequent population dynamics including stochastic (drift) and deterministic (differential selection among cytotypes) processes. Here, we used a population genomic approach to investigate whether autopolyploidy occurred once or multiple times in Andropogon gerardii, a widespread, North American grass with two predominant cytotypes. Genotyping by sequencing was used to identify single nucleotide polymorphisms (SNPs) in individuals collected from across the geographic range of A. gerardii. Two independent approaches to SNP calling were used: the reference-free UNEAK pipeline and a reference-guided approach based on the sequenced Sorghum bicolor genome. SNPs generated using these pipelines were analyzed independently with genetic distance and clustering. Analyses of the two SNP data sets showed very similar patterns of population-level clustering of A. gerardii individuals: a cluster of A. gerardii individuals from the southern Plains, a northern Plains cluster, and a western cluster. Groupings of individuals corresponded to geographic localities regardless of cytotype: 6x and 9x individuals from the same geographic area clustered together. SNPs generated using reference-guided and reference-free pipelines in A. gerardii yielded unique subsets of genomic data. Both data sets suggest that the 9x cytotype in A. gerardii likely evolved multiple times from 6x progenitors across the range of the species. Genomic approaches like GBS and diverse bioinformatics pipelines used here facilitate evolutionary analyses of complex systems with multiple ploidy levels. © 2016 Botanical Society of America.
Wyrwa, Katarzyna; Książkiewicz, Michał; Szczepaniak, Anna; Susek, Karolina; Podkowiński, Jan; Naganowska, Barbara
2016-09-01
Narrow-leafed lupin (Lupinus angustifolius L.) has recently been considered a reference genome for the Lupinus genus. In the present work, genetic and cytogenetic maps of L. angustifolius were supplemented with 30 new molecular markers representing lupin genome regions, harboring genes involved in nitrogen fixation during the symbiotic interaction of legumes and soil bacteria (Rhizobiaceae). Our studies resulted in the precise localization of bacterial artificial chromosomes (BACs) carrying sequence variants for early nodulin 40, nodulin 26, nodulin 45, aspartate aminotransferase P2, asparagine synthetase, cytosolic glutamine synthetase, and phosphoenolpyruvate carboxylase. Together with previously mapped chromosomes, the integrated L. angustifolius map encompasses 73 chromosome markers, including 5S ribosomal DNA (rDNA) and 45S rDNA, and anchors 20 L. angustifolius linkage groups to corresponding chromosomes. Chromosomal identification using BAC fluorescence in situ hybridization identified two BAC clones as narrow-leafed lupin centromere-specific markers, which served as templates for preliminary studies of centromere composition within the genus. Bioinformatic analysis of these two BACs revealed that centromeric/pericentromeric regions of narrow-leafed lupin chromosomes consisted of simple sequence repeats ordered into tandem repeats containing the trinucleotide and pentanucleotide simple sequence repeats AGG and GATAC, structured into long arrays. Moreover, cross-genus microsynteny analysis revealed syntenic patterns of 31 single-locus BAC clones among several legume species. The gene and chromosome level findings provide evidence of ancient duplication events that must have occurred very early in the divergence of papilionoid lineages. This work provides a strong foundation for future comparative mapping among legumes and may facilitate understanding of mechanisms involved in shaping legume chromosomes.
Complete convergence of randomly weighted END sequences and its application.
Li, Penghua; Li, Xiaoqin; Wu, Kehan
2017-01-01
We investigate the complete convergence of partial sums of randomly weighted extended negatively dependent (END) random variables. Some results of complete moment convergence, complete convergence and the strong law of large numbers for this dependent structure are obtained. As an application, we study the convergence of the state observers of linear-time-invariant systems. Our results extend the corresponding earlier ones.
Kitahara, Kei; Kajiura, Akimasa; Sato, Neuza Satomi; Suzuki, Tsutomu
2007-01-01
Ribosomal protein L2 is a highly conserved primary 23S rRNA-binding protein. L2 specifically recognizes the internal bulge sequence in Helix 66 (H66) of 23S rRNA and is localized to the intersubunit space through formation of bridge B7b with 16S rRNA. The L2-binding site in H66 is highly conserved in prokaryotic ribosomes, whereas the corresponding site in eukaryotic ribosomes has evolved into distinct classes of sequences. We performed a systematic genetic selection of randomized rRNA sequences in Escherichia coli, and isolated 20 functional variants of the L2-binding site. The isolated variants consisted of eukaryotic sequences, in addition to prokaryotic sequences. These results suggest that L2/L8e does not recognize a specific base sequence of H66, but rather a characteristic architecture of H66. The growth phenotype of the isolated variants correlated well with their ability of subunit association. Upon continuous cultivation of a deleterious variant, we isolated two spontaneous mutations within domain IV of 23S rRNA that compensated for its weak subunit association, and alleviated its growth defect, implying that functional interactions between intersubunit bridges compensate ribosomal function. PMID:17553838
SimRNA: a coarse-grained method for RNA folding simulations and 3D structure prediction.
Boniecki, Michal J; Lach, Grzegorz; Dawson, Wayne K; Tomala, Konrad; Lukasz, Pawel; Soltysinski, Tomasz; Rother, Kristian M; Bujnicki, Janusz M
2016-04-20
RNA molecules play fundamental roles in cellular processes. Their function and interactions with other biomolecules are dependent on the ability to form complex three-dimensional (3D) structures. However, experimental determination of RNA 3D structures is laborious and challenging, and therefore, the majority of known RNAs remain structurally uncharacterized. Here, we present SimRNA: a new method for computational RNA 3D structure prediction, which uses a coarse-grained representation, relies on the Monte Carlo method for sampling the conformational space, and employs a statistical potential to approximate the energy and identify conformations that correspond to biologically relevant structures. SimRNA can fold RNA molecules using only sequence information, and, on established test sequences, it recapitulates secondary structure with high accuracy, including correct prediction of pseudoknots. For modeling of complex 3D structures, it can use additional restraints, derived from experimental or computational analyses, including information about secondary structure and/or long-range contacts. SimRNA also can be used to analyze conformational landscapes and identify potential alternative structures. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
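The Monte Carlo sampling mentioned above can be illustrated with a generic Metropolis loop. This is a sketch of the general technique only, not SimRNA's implementation: the one-dimensional state, the quadratic toy energy, and the move set are assumptions standing in for a coarse-grained RNA model and its statistical potential.

```python
import math
import random


def metropolis_minimize(energy, propose, state, steps, temperature, rng):
    """Metropolis sampling that also tracks the lowest-energy state visited."""
    current, e_cur = state, energy(state)
    best, e_best = current, e_cur
    for _ in range(steps):
        candidate = propose(current, rng)
        e_cand = energy(candidate)
        delta = e_cand - e_cur
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_cur = candidate, e_cand
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best


def toy_energy(x):
    """Hypothetical energy with a single minimum at x = 3."""
    return (x - 3.0) ** 2


def toy_move(x, rng):
    """Hypothetical move set: small random displacement."""
    return x + rng.uniform(-0.5, 0.5)


if __name__ == "__main__":
    rng = random.Random(0)
    best, e_best = metropolis_minimize(toy_energy, toy_move, 0.0, 5000, 0.1, rng)
    print("best state:", best, "energy:", e_best)
```

Accepting some uphill moves is what lets the sampler escape local minima; in a real folding simulation the temperature is typically varied rather than held fixed as in this toy run.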
2015-01-01
DNA oxidation by reactive oxygen species is nonrandom, potentially leading to accumulation of nucleobase damage and mutations at specific sites within the genome. We now present the first quantitative data for sequence-dependent formation of structurally defined oxidative nucleobase adducts along p53 gene-derived DNA duplexes using a novel isotope labeling-based approach. Our results reveal that local nucleobase sequence context differentially alters the yields of 2,2,4-triamino-2H-oxazol-5-one (Z) and 8-oxo-7,8-dihydro-2′-deoxyguanosine (OG) in double stranded DNA. While both lesions are overproduced within endogenously methylated MeCG dinucleotides and at 5′ Gs in runs of several guanines, the formation of Z (but not OG) is strongly preferred at solvent-exposed guanine nucleobases at duplex ends. Targeted oxidation of MeCG sequences may be caused by a lowered ionization potential of guanine bases paired with MeC and the preferential intercalation of riboflavin photosensitizer adjacent to MeC:G base pairs. Importantly, some of the most frequently oxidized positions coincide with the known p53 lung cancer mutational “hotspots” at codons 245 (GGC), 248 (CGG), and 158 (CGC) respectively, supporting a possible role of oxidative degradation of DNA in the initiation of lung cancer. PMID:24571128
SvABA: genome-wide detection of structural variants and indels by local assembly.
Wala, Jeremiah A; Bandopadhayay, Pratiti; Greenwald, Noah F; O'Rourke, Ryan; Sharpe, Ted; Stewart, Chip; Schumacher, Steve; Li, Yilong; Weischenfeldt, Joachim; Yao, Xiaotong; Nusbaum, Chad; Campbell, Peter; Getz, Gad; Meyerson, Matthew; Zhang, Cheng-Zhong; Imielinski, Marcin; Beroukhim, Rameen
2018-04-01
Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA's performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improves detection performance for variants in the 20-300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (<1000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that short templated-sequence insertions occur in ∼4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized (50-300 bp) SVs. © 2018 Wala et al.; Published by Cold Spring Harbor Laboratory Press.
On microscopic structure of the QCD vacuum
NASA Astrophysics Data System (ADS)
Pak, D. G.; Lee, Bum-Hoon; Kim, Youngman; Tsukioka, Takuya; Zhang, P. M.
2018-05-01
We propose a new class of regular stationary axially symmetric solutions in a pure QCD which correspond to monopole-antimonopole pairs at macroscopic scale. The solutions represent vacuum field configurations which are locally stable against quantum gluon fluctuations in any small space-time vicinity. This implies that the monopole-antimonopole pair can serve as a structural element in microscopic description of QCD vacuum formation.
Bernardes, Juliana; Zaverucha, Gerson; Vaquero, Catherine; Carbone, Alessandra
2016-01-01
Traditional protein annotation methods describe known domains with probabilistic models representing consensus among homologous domain sequences. However, when relevant signals become too weak to be identified by a global consensus, attempts for annotation fail. Here we address the fundamental question of domain identification for highly divergent proteins. By using high performance computing, we demonstrate that the limits of state-of-the-art annotation methods can be bypassed. We design a new strategy based on the observation that many structural and functional protein constraints are not globally conserved through all species but might be locally conserved in separate clades. We propose a novel exploitation of the large amount of data available: 1. for each known protein domain, several probabilistic clade-centered models are constructed from a large and differentiated panel of homologous sequences, 2. a decision-making protocol combines outcomes obtained from multiple models, 3. a multi-criteria optimization algorithm finds the most likely protein architecture. The method is evaluated for domain and architecture prediction over several datasets and statistical testing hypotheses. Its performance is compared against HMMScan and HHblits, two widely used search methods based on sequence-profile and profile-profile comparison. Due to their closeness to actual protein sequences, clade-centered models are shown to be more specific and functionally predictive than the broadly used consensus models. Based on them, we improved annotation of Plasmodium falciparum protein sequences on a scale not previously possible. We successfully predict at least one domain for 72% of P. falciparum proteins against 63% achieved previously, corresponding to 30% of improvement over the total number of Pfam domain predictions on the whole genome. 
The method is applicable to any genome and opens new avenues to tackle evolutionary questions such as the reconstruction of ancient domain duplications, the reconstruction of the history of protein architectures, and the estimation of protein domain age. Website and software: http://www.lcqb.upmc.fr/CLADE. PMID:27472895
Buckling and Failure of Compression-Loaded Composite Laminated Shells With Cutouts
NASA Technical Reports Server (NTRS)
Hilburger, Mark W.
2007-01-01
Results from a numerical and experimental study that illustrate the effects of laminate orthotropy on the buckling and failure response of compression-loaded composite cylindrical shells with a cutout are presented. The effects of orthotropy on the overall response of compression-loaded shells are described. In general, preliminary numerical results appear to accurately predict the buckling and failure characteristics of the shell considered herein. In particular, some of the shells exhibit stable post-local-buckling behavior accompanied by interlaminar material failures near the free edges of the cutout. In contrast, another shell with a different laminate stacking sequence appears to exhibit catastrophic interlaminar material failure at the onset of local buckling near the cutout, and this behavior correlates well with corresponding experimental results.
On the phase space structure of IP3 induced Ca2+ signalling and concepts for predictive modeling
NASA Astrophysics Data System (ADS)
Falcke, Martin; Moein, Mahsa; Tilūnaitė, Agne; Thul, Rüdiger; Skupin, Alexander
2018-04-01
The correspondence between mathematical structures and experimental systems is the basis of the generalizability of results found with specific systems and is the basis of the predictive power of theoretical physics. While physicists have confidence in this correspondence, it is less recognized in cellular biophysics. On the one hand, the complex organization of cellular dynamics involving a plethora of interacting molecules and the basic observation of cell variability seem to question its possibility. The practical difficulties of deriving the equations describing cellular behaviour from first principles support these doubts. On the other hand, ignoring such a correspondence would severely limit the possibility of predictive quantitative theory in biophysics. Additionally, the existence of functional modules (like pathways) across cell types suggests also the existence of mathematical structures with comparable universality. Only a few cellular systems have been sufficiently investigated in a variety of cell types to follow up these basic questions. IP3 induced Ca2+ signalling is one of them, and the mathematical structure corresponding to it is subject of ongoing discussion. We review the system's general properties observed in a variety of cell types. They are captured by a reaction diffusion system. We discuss the phase space structure of its local dynamics. The spiking regime corresponds to noisy excitability. Models focussing on different aspects can be derived starting from this phase space structure. We discuss how the initial assumptions on the set of stochastic variables and phase space structure shape the predictions of parameter dependencies of the mathematical models resulting from the derivation.
Relativistic Causality and Quasi-Orthomodular Algebras
NASA Astrophysics Data System (ADS)
Nobili, Renato
2006-05-01
The concept of fractionability or decomposability in parts of a physical system has its mathematical counterpart in the lattice-theoretic concept of orthomodularity. Systems with a finite number of degrees of freedom can be decomposed in different ways, corresponding to different groupings of the degrees of freedom. The orthomodular structure of these simple systems is trivially manifest. The problem then arises as to whether the same property is shared by physical systems with an infinite number of degrees of freedom, in particular by the quantum relativistic ones. The latter case was approached several years ago by Haag and Schroer (1962; Haag, 1992) who started from noting that the causally complete sets of Minkowski spacetime form an orthomodular lattice and posed the question of whether the subalgebras of local observables, with topological supports on such subsets, form themselves a corresponding orthomodular lattice. Were it so, the way would be paved to interpreting spacetime as an intrinsic property of a local quantum field algebra. Surprisingly enough, however, the hoped property does not hold for local algebras of free fields with superselection rules. The possibility seems to be instead open if the local currents that govern the superselection rules are driven by gauge fields. Thus, in the framework of local quantum physics, the request for algebraic orthomodularity seems to imply physical interactions! Despite its charm, however, such a request appears plagued by ambiguities and criticities that make of it an ill-posed problem. The proposers themselves, indeed, concluded that the orthomodular correspondence hypothesis is too strong for having a chance of being practicable. Thus, neither the idea was taken seriously by the proposers nor further investigated by others up to a reasonable degree of clarification. This paper is an attempt to re-formulate and well-pose the problem. 
It will be shown that the idea is viable provided that the algebra of local observables: (1) is considered all over the whole range of its irreducible representations; (2) is widened with the addition of the elements of a suitable intertwining group of automorphisms; (3) the orthomodular correspondence requirement is modified to an extent sufficient to impart a natural topological structure to the intertwined algebra of observables so obtained. A novel scenario then emerges in which local quantum physics appears to provide a general framework for non-perturbative quantum field dynamics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Geraghty, M.T.; Stetten, G.; Kearns, W.
1994-09-01
X-linked adrenoleukodystrophy (ALD) is a disorder of peroxisomal β-oxidation of very long chain fatty acids. It presents either as progressive dementia in childhood or as progressive paraparesis in later years. Adrenal insufficiency occurs in both phenotypes. The gene of the ALD protein has been mapped to Xq28 and has recently been cloned and characterized. The ALD protein has significant homology to the peroxisomal membrane protein, PMP70 and belongs to the ATP binding cassette superfamily of transporters. We screened a human genomic library with an ALDP cDNA and isolated 5 different but highly similar clones containing sequences corresponding to the 3′ end of the ALDP gene. Comparison of the sequences over the region corresponding to exon 9 through the 3′ end of the ALDP gene reveals approximately 96% nucleotide identity in both exonic and intronic regions. Splice sites and open reading frames are maintained. Using both FISH and human-rodent DNA mapping panels, we positively assign these ALDP-related sequences to chromosomes 2, 16 and 22, and provisionally to 1 and 20. Southern blot of primate DNA probed with a partial ALDP cDNA (exon 2-10) shows that expansion of ALDP-related sequences occurred in higher primates (chimp, gorilla and human). Although Northern blots show multiple ALDP-hybridizing transcripts in certain tissues, we have no evidence to date for expression of these ALDP-related sequences. In conclusion, our data show there has been an unusual and recent dispersal to multiple chromosomes of structural gene sequences related to the ALDP gene. The functional significance of these sequences remains to be determined but their existence complicates PCR and mutation analysis of the ALDP gene.
ERIC Educational Resources Information Center
Dawson, Colin; Gerken, LouAnn
2011-01-01
While many constraints on learning must be relatively experience-independent, past experience provides a rich source of guidance for subsequent learning. Discovering structure in some domain can inform a learner's future hypotheses about that domain. If a general property accounts for particular sub-patterns, a rational learner should not…
Mapping the acquisition of the number word sequence in the first year of school
NASA Astrophysics Data System (ADS)
Gould, Peter
2017-03-01
Learning to count and to produce the correct sequence of number words in English is not a simple process. In NSW government schools taking part in Early Action for Success, over 800 students in each of the first 3 years of school were assessed every 5 weeks over the school year to determine the highest correct oral count they could produce. Rather than displaying a steady increase in the accurate sequence of the number words produced, the kindergarten data reported here identified clear, substantial hurdles in the acquisition of the counting sequence. The large-scale, longitudinal data also provided evidence of learning to count through the teens being facilitated by the semi-regular structure of the number words in English. Instead of occurring as hurdles to starting the next counting sequence, number words corresponding to some multiples of ten (10, 20 and 100) acted as if they were rest points. These rest points appear to be artefacts of how the counting sequence is acquired.
Stephenson, F H; Ballard, B T; Boyer, H W; Rosenberg, J M; Greene, P J
1989-12-21
The RsrI endonuclease, a type-II restriction endonuclease (ENase) found in Rhodobacter sphaeroides, is an isoschizomer of the EcoRI ENase. A clone containing an 11-kb BamHI fragment was isolated from an R. sphaeroides genomic DNA library by hybridization with synthetic oligodeoxyribonucleotide probes based on the N-terminal amino acid (aa) sequence of RsrI. Extracts of E. coli containing a subclone of the 11-kb fragment display RsrI activity. Nucleotide sequence analysis reveals an 831-bp open reading frame encoding a polypeptide of 277 aa. A 50% identity exists within a 266-aa overlap between the deduced aa sequences of RsrI and EcoRI. Regions of 75-100% aa sequence identity correspond to key structural and functional regions of EcoRI. The type-II ENases have many common properties, and a common origin might have been expected. Nevertheless, this is the first demonstration of aa sequence similarity between ENases produced by different organisms.
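The figures quoted in this abstract can be checked with simple arithmetic: an 831-bp open reading frame corresponds to 831 / 3 = 277 triplets, and 50% identity over a 266-aa overlap corresponds to about 133 identical residues. The following is a minimal Python sketch of how percent identity over an overlap is commonly computed. It assumes two pre-aligned sequences of equal length with `-` marking gaps; the function name and the toy sequences are hypothetical and are not taken from the paper.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free aligned columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    overlap = 0      # columns where neither sequence has a gap
    identical = 0    # overlap columns with the same residue
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        overlap += 1
        if a == b:
            identical += 1
    if overlap == 0:
        raise ValueError("no overlapping positions")
    return 100.0 * identical / overlap


# Arithmetic implied by the abstract's numbers.
orf_bp = 831
triplets = orf_bp // 3                   # number of 3-nt codons in the ORF
identical_residues = round(0.50 * 266)   # 50% of a 266-aa overlap

# Toy alignment (hypothetical sequences, for illustration only).
toy_identity = percent_identity("MKT-AYI", "MRTQAYL")
```

The gap-free overlap is used as the denominator here; other conventions (for example, dividing by the full alignment length) give different percentages, and the abstract does not say which convention was used.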
Beamer, B A; Negri, C; Yen, C J; Gavrilova, O; Rumberger, J M; Durcan, M J; Yarnall, D P; Hawkins, A L; Griffin, C A; Burns, D K; Roth, J; Reitman, M; Shuldiner, A R
1997-04-28
We determined the chromosomal localization and partial genomic structure of the coding region of the human PPAR gamma gene (hPPAR gamma), a nuclear receptor important for adipocyte differentiation and function. Sequence analysis and long PCR of human genomic DNA with primers that span putative introns revealed that intron positions and sizes of hPPAR gamma are similar to those previously determined for the mouse PPAR gamma gene[13]. Fluorescent in situ hybridization localized hPPAR gamma to chromosome 3, band 3p25. Radiation hybrid mapping with two independent primer pairs was consistent with hPPAR gamma being within 1.5 Mb of marker D3S1263 on 3p25-p24.2. These sequences of the intron/exon junctions of the 6 coding exons shared by hPPAR gamma 1 and hPPAR gamma 2 will facilitate screening for possible mutations. Furthermore, D3S1263 is a suitable polymorphic marker for linkage analysis to evaluate PPAR gamma's potential contribution to genetic susceptibility to obesity, lipoatrophy, insulin resistance, and diabetes.
On Cognition, Structured Sequence Processing, and Adaptive Dynamical Systems
NASA Astrophysics Data System (ADS)
Petersson, Karl Magnus
2008-11-01
Cognitive neuroscience approaches the brain as a cognitive system: a system that functionally is conceptualized in terms of information processing. We outline some aspects of this concept and consider a physical system to be an information processing device when a subclass of its physical states can be viewed as representational/cognitive and transitions between these can be conceptualized as a process operating on these states by implementing operations on the corresponding representational structures. We identify a generic and fundamental problem in cognition: sequentially organized structured processing. Structured sequence processing provides the brain, in an essential sense, with its processing logic. In an approach addressing this problem, we illustrate how to integrate levels of analysis within a framework of adaptive dynamical systems. We note that the dynamical system framework lends itself to a description of asynchronous event-driven devices, which is likely to be important in cognition because the brain appears to be an asynchronous processing system. We use the human language faculty and natural language processing as a concrete example throughout.
Batista-García, Ramón Alberto; Sánchez-Reyes, Ayixon; Millán-Pacheco, César; González-Zuñiga, Víctor Manuel; Juárez, Soledad; Folch-Mallol, Jorge Luis; Pastor, Nina
2014-09-01
We isolated a putative citrate transporter of the tripartite tricarboxylate transporter (TTT) class from a metagenomic library of activated sludge from a sewage treatment plant. The transporter, dubbed TctA_ar, shares ∼50% sequence identity with TctA of Comamonas testosteroni (TctA_ct) and other β-Proteobacteria, and contains two 20-amino acid repeat signature sequences, considered a hallmark of this particular transporter class. The structures for both TctA_ar and TctA_ct were modeled with I-TASSER and two possible structures for this transporter family were proposed. Docking assays with citrate resulted in the corresponding sets of proposed critical residues for function. These models suggest functions for the 20-amino acid repeats in the context of the two different architectures. This constitutes the first attempt at structure modeling of the TTT family, to the best of our knowledge, and could aid functional understanding of this little-studied family. © 2014 Wiley Periodicals, Inc.
De Oliveira, T; Miller, R; Tarin, M; Cassol, S
2003-01-01
Sequence databases encode a wealth of information needed to develop improved vaccination and treatment strategies for the control of HIV and other important pathogens. To facilitate effective utilization of these datasets, we developed a user-friendly GDE-based LINUX interface that reduces input/output file formatting. GDE was adapted to the Linux operating system, bioinformatics tools were integrated with microbe-specific databases, and up-to-date GDE menus were developed for several clinically important viral, bacterial and parasitic genomes. Each microbial interface was designed for local access and contains Genbank, BLAST-formatted and phylogenetic databases. GDE-Linux is available for research purposes by direct application to the corresponding author. Application-specific menus and support files can be downloaded from (http://www.bioafrica.net).
Deiana, Antonio; Giansanti, Andrea
2010-04-21
Natively unfolded proteins lack a well-defined three-dimensional structure but have important biological functions, suggesting a re-assignment of the structure-function paradigm. Establishing that a given protein is natively unfolded requires laborious experimental investigation, so reliable sequence-only methods for predicting whether a sequence corresponds to a folded or to an unfolded protein are of interest in fundamental and applied studies. Many proteins have amino acid compositions compatible with both the folded and the unfolded state, and belong to a twilight zone between order and disorder. This makes a dichotomic classification of protein sequences into folded and natively unfolded ones difficult. In this work we propose an operational method to identify proteins belonging to the twilight zone by combining well-performing single predictors of folding into a consensus score. In this methodological paper dichotomic folding indexes are considered: hydrophobicity-charge, mean packing, mean pairwise energy, Poodle-W and a new global index, here called gVSL2, based on the local disorder predictor VSL2. The performance of these indexes is evaluated on different datasets, in particular on a new dataset composed of 2369 folded and 81 natively unfolded proteins. Poodle-W, gVSL2 and mean pairwise energy have good performance and stability in all the datasets considered and are combined into a strictly unanimous combination score SSU, which leaves proteins unclassified when the consensus of all combined indexes is not reached. The unclassified proteins: i) belong to an overlap region in the vector space of amino acid compositions occupied by both folded and unfolded proteins; ii) are composed of approximately the same number of order-promoting and disorder-promoting amino acids; iii) have a mean flexibility intermediate between that of folded and that of unfolded proteins. Our results show that proteins unclassified by SSU belong to a twilight zone.
Proteins left unclassified by the consensus score SSU have physical properties intermediate between those of folded and those of natively unfolded proteins, and their structural properties and evolutionary history are worth investigating. PMID:20409339
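The strictly unanimous combination described in this abstract can be sketched as a simple voting rule: a protein receives a class only when every combined index gives the same call, and is otherwise left unclassified. The Python below is a minimal illustration under that reading. The function name, the label strings and the example calls are hypothetical; the abstract does not describe how each individual index is thresholded, so the per-index calls are taken as given inputs.

```python
from typing import Sequence


def strictly_unanimous(calls: Sequence[str]) -> str:
    """Return the shared call if all predictors agree, else 'unclassified'."""
    if not calls:
        raise ValueError("at least one predictor call is required")
    first = calls[0]
    if all(call == first for call in calls):
        return first
    return "unclassified"


# Hypothetical calls from three indexes (e.g. Poodle-W, gVSL2, mean pairwise energy).
agree = strictly_unanimous(["folded", "folded", "folded"])
split = strictly_unanimous(["folded", "unfolded", "folded"])
```

A unanimity rule trades coverage for reliability: it makes fewer calls than a majority vote would, and the proteins it declines to classify are the ones the abstract associates with the twilight zone.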
Crystal structure of Bacillus subtilis YdaF protein: a putative ribosomal N-acetyltransferase.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brunzelle, J. S.; Wu, R.; Korolev, S. V.
2004-12-01
Comparative sequence analysis suggests that the ydaF gene encodes a protein (YdaF) that functions as an N-acetyltransferase, more specifically, a ribosomal N-acetyltransferase. Sequence analysis using basic local alignment search tool (BLAST) suggests that YdaF belongs to a large family of proteins (199 proteins found in 88 unique species of bacteria, archaea, and eukaryotes). YdaF also belongs to the COG1670, which includes the Escherichia coli RimL protein that is known to acetylate ribosomal protein L12. N-acetylation (NAT) has been found in all kingdoms. NAT enzymes catalyze the transfer of an acetyl group from acetyl-CoA (AcCoA) to a primary amino group. For example, NATs can acetylate the N-terminal α-amino group, the ε-amino group of lysine residues, aminoglycoside antibiotics, spermine/spermidine, or arylalkylamines such as serotonin. The crystal structure of the alleged ribosomal NAT protein, YdaF, from Bacillus subtilis presented here was determined as a part of the Midwest Center for Structural Genomics. The structure maintains the conserved tertiary structure of other known NATs and a high sequence similarity in the presumed AcCoA binding pocket in spite of a very low overall level of sequence identity to other NATs of known structure.
Comparative modeling without implicit sequence alignments.
Kolinski, Andrzej; Gront, Dominik
2007-10-01
The number of known protein sequences is about a thousand times larger than the number of experimentally solved 3D structures. For more than half of the protein sequences a close or distant structural analog could be identified. The key starting point in classical comparative modeling is to generate the best possible sequence alignment with a template or templates. With decreasing sequence similarity, the number of errors in the alignments increases, and these errors are the main cause of the decreasing accuracy of the molecular models generated. Here we propose a new approach to comparative modeling that does not require the implicit alignment: the model building phase explores geometric, evolutionary and physical properties of a template (or templates). The proposed method requires prior identification of a template, although the initial sequence alignment is ignored. The model is built using a very efficient reduced representation search engine, CABS, to find the best possible superposition of the query protein onto the template represented as a 3D multi-featured scaffold. The criteria used include sequence similarity, predicted secondary structure consistency, local geometric features and hydrophobicity profile. For more difficult cases, the new method qualitatively outperforms existing schemes of comparative modeling. The algorithm unifies de novo modeling, 3D threading and sequence-based methods. The main idea is general and could easily be combined with other efficient modeling tools such as Rosetta, UNRES and others.
Crystal structure of solid molecular hydrogen under high pressures
NASA Astrophysics Data System (ADS)
Cui, T.; Ma, Y.; Zou, G.
2002-11-01
In an effort to achieve a comprehensive understanding of the structure of dense H2, we have performed path-integral Monte Carlo simulations for three combinations of pressures and temperatures corresponding to three phases of solid hydrogen. Our results suggest three kinds of distribution of molecules: orientationally disordered hexagonal close packed (hcp), orientationally ordered hcp with Pa3-type local orientation order and orientationally ordered orthorhombic structure of Cmca symmetry, for the three phases.
NMRDSP: an accurate prediction of protein shape strings from NMR chemical shifts and sequence data.
Mao, Wusong; Cong, Peisheng; Wang, Zhiheng; Lu, Longjian; Zhu, Zhongliang; Li, Tonghua
2013-01-01
The shape string is a structural sequence and an extremely important representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts correlate strongly with the local protein structure, and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as a classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight-state accuracy) and 87.8% for S3 (the three-state accuracy). This is higher than that obtained using only chemical shifts or only sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and that their combination markedly improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.
A TALE-inspired computational screen for proteins that contain approximate tandem repeats.
Perycz, Malgorzata; Krwawicz, Joanna; Bochtler, Matthias
2017-01-01
TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using Perl, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen. PMID:28617832
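One simple way to look for approximate tandem repeats of a given length, without requiring similarity to any known protein, is to compare a sequence with a copy of itself shifted by a candidate period and count matching positions. The Python sketch below illustrates that idea over the 30 to 43 residue range mentioned in the abstract. It is not the authors' Perl screen, whose scoring details are not given here; the function names and the 34-residue toy repeat unit are hypothetical.

```python
def shifted_identity(seq: str, period: int) -> float:
    """Fraction of positions i where seq[i] equals seq[i + period]."""
    if not 0 < period < len(seq):
        raise ValueError("period must be positive and shorter than the sequence")
    pairs = len(seq) - period
    matches = sum(1 for i in range(pairs) if seq[i] == seq[i + period])
    return matches / pairs


def best_period(seq: str, min_period: int = 30, max_period: int = 43) -> int:
    """Candidate repeat length in [min_period, max_period] with the highest self-identity."""
    candidates = [p for p in range(min_period, max_period + 1) if p < len(seq)]
    if not candidates:
        raise ValueError("sequence is too short for the requested period range")
    return max(candidates, key=lambda p: shifted_identity(seq, p))


# Toy data: four copies of a 34-residue unit, then one substitution in copy 2.
unit = "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG"
exact = unit * 4
mutated = exact[:45] + "H" + exact[46:]   # approximate repeat: one variable position

period = best_period(exact)
score_mutated = shifted_identity(mutated, 34)
```

A high self-identity at one period with a few consistently variable columns is the kind of signal the abstract describes; a real screen would also need to handle insertions and deletions between repeats, which this sketch ignores.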
PredictProtein—an open resource for online prediction of protein structural and functional features
Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard
2014-01-01
PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431
Chernin, L S; De la Fuente, L; Sobolev, V; Haran, S; Vorgias, C E; Oppenheim, A B; Chet, I
1997-01-01
The gene chiA, which codes for endochitinase, was cloned from a soilborne Enterobacter agglomerans. Its complete sequence was determined, and the deduced amino acid sequence of the enzyme designated Chia_Entag yielded an open reading frame coding for 562 amino acids of a 61-kDa precursor protein with a putative leader peptide at its N terminus. The nucleotide and polypeptide sequences of Chia_Entag showed 86.8 and 87.7% identity with the corresponding gene and enzyme, Chia_Serma, of Serratia marcescens, respectively. Homology modeling of Chia_Entag's three-dimensional structure demonstrated that most amino acid substitutions are at solvent-accessible sites. Escherichia coli JM109 carrying the E. agglomerans chiA gene produced and secreted Chia_Entag. The antifungal activity of the secreted endochitinase was demonstrated in vitro by inhibition of Fusarium oxysporum spore germination. The transformed strain inhibited Rhizoctonia solani growth on plates and the root rot disease caused by this fungus in cotton seedlings under greenhouse conditions. PMID:9055404
Savary, Brett J; Vasu, Prasanna; Cameron, Randall G; McCollum, T Gregory; Nuñez, Alberto
2013-12-26
Despite the longstanding importance of the thermally tolerant pectin methylesterase (TT-PME) activity in citrus juice processing and product quality, the unequivocal identification of the protein and its corresponding gene has remained elusive. TT-PME was purified from sweet orange [ Citrus sinensis (L.) Osbeck] finisher pulp (8.0 mg/1.3 kg tissue) with an improved purification scheme that provided 20-fold increased enzyme yield over previous results. Structural characterization of electrophoretically pure TT-PME by MALDI-TOF MS determined molecular masses of approximately 47900 and 53000 Da for two principal glycoisoforms. De novo sequences generated from tryptic peptides by MALDI-TOF/TOF MS matched multiple anonymous Citrus EST cDNA accessions. The complete tt-pme cDNA (1710 base pair) was cloned from a fruit mRNA library using RT- and RLM-RACE PCR. Citrus TT-PME is a novel isoform that showed higher sequence identity with the multiply glycosylated kiwifruit PME than to previously described Citrus thermally labile PME isoforms.
Percolation in random-Sierpiński carpets: A real space renormalization group approach
NASA Astrophysics Data System (ADS)
Perreau, Michel; Peiro, Joaquina; Berthier, Serge
1996-11-01
The site percolation transition in random Sierpiński carpets is investigated by real space renormalization. The fixed point is not unique, as it is in regular translationally invariant lattices, but depends on the number k of segmentation steps of the generation process of the fractal. It is shown that, for each scale invariance ratio n, the sequence of fixed points $p_{n,k}$ increases with k and converges when k→∞ toward a limit $p_n$ strictly less than 1. Moreover, in such scale invariant structures, the percolation threshold does not depend only on the scale invariance ratio n, but also on the scale. The sequences $p_{n,k}$ and the limits $p_n$ are calculated for n=4, 8, 16, 32, and 64, for k=1 to k=11, and for k=∞. The corresponding thermal exponent sequence $\nu_{n,k}$ is calculated for n=8 and 16, for k=1 to k=5, and for k=∞. Suggestions are made for an experimental test in physical self-similar structures.
Mosaic organization of DNA nucleotides
NASA Technical Reports Server (NTRS)
Peng, C. K.; Buldyrev, S. V.; Havlin, S.; Simons, M.; Stanley, H. E.; Goldberger, A. L.
1994-01-01
Long-range power-law correlations have been reported recently for DNA sequences containing noncoding regions. We address the question of whether such correlations may be a trivial consequence of the known mosaic structure ("patchiness") of DNA. We analyze two classes of controls consisting of patchy nucleotide sequences generated by different algorithms: one without and one with long-range power-law correlations. Although both types of sequences are highly heterogeneous, they are quantitatively distinguishable by an alternative fluctuation analysis method that differentiates local patchiness from long-range correlations. Application of this analysis to selected DNA sequences demonstrates that patchiness is not sufficient to account for long-range correlation properties.
How many hydrogen-bonded α-turns are possible?
Schreiber, Anette; Schramm, Peter; Hofmann, Hans-Jörg
2011-06-01
The formation of α-turns is one way to reverse the direction of a peptide sequence within five amino acids. In this paper, a systematic conformational analysis was performed to find the possible isolated α-turns with a hydrogen bond between the first and fifth amino acid, employing the methods of ab initio MO theory in vacuum (HF/6-31G*, B3LYP/6-311 + G*) and in solution (CPCM/HF/6-31G*). Only a few α-turn structures with glycine and alanine backbones fulfill the geometry criteria for the i←(i + 4) hydrogen bond satisfactorily. The most stable representatives agree with structures found in the Protein Data Bank. There is a general tendency to form additional hydrogen bonds for smaller pseudocycles corresponding to β- and γ-turns with better hydrogen bond geometries. Sometimes, this competition weakens or even destroys the i←(i + 4) hydrogen bond, leading to very stable double β-turn structures. This is also the reason why an "ideal" α-turn with three central amino acids having the perfect backbone angle values of an α-helix could not be localized. There are numerous hints for stable α-turns with a distance between the C(α)-atoms of the first and fifth amino acid smaller than 6-7 Å, but without an i←(i + 4) hydrogen bond.
Piatkowski, Pawel; Kasprzak, Joanna M; Kumar, Deepak; Magnus, Marcin; Chojnowski, Grzegorz; Bujnicki, Janusz M
2016-01-01
RNA encompasses an essential part of all known forms of life. The functions of many RNA molecules are dependent on their ability to form complex three-dimensional (3D) structures. However, experimental determination of RNA 3D structures is laborious and challenging, and therefore, the majority of known RNAs remain structurally uncharacterized. To address this problem, computational structure prediction methods were developed that either utilize information derived from known structures of other RNA molecules (by way of template-based modeling) or attempt to simulate the physical process of RNA structure formation (by way of template-free modeling). All computational methods suffer from various limitations that make theoretical models less reliable than high-resolution experimentally determined structures. This chapter provides a protocol for computational modeling of RNA 3D structure that overcomes major limitations by combining two complementary approaches: template-based modeling that is capable of predicting global architectures based on similarity to other molecules but often fails to predict local unique features, and template-free modeling that can predict the local folding, but is limited to modeling the structure of relatively small molecules. Here, we combine the use of a template-based method ModeRNA with a template-free method SimRNA. ModeRNA requires a sequence alignment of the target RNA sequence to be modeled with a template of the known structure; it generates a model that predicts the structure of a conserved core and provides a starting point for modeling of variable regions. SimRNA can be used to fold small RNAs (<80 nt) without any additional structural information, and to refold parts of models for larger RNAs that have a correctly modeled core. ModeRNA can be either downloaded, compiled and run locally or run through a web interface at http://genesilico.pl/modernaserver/ . 
SimRNA is currently available to download for local use as a precompiled software package at http://genesilico.pl/software/stand-alone/simrna and as a web server at http://genesilico.pl/SimRNAweb . For model optimization we use QRNAS, available at http://genesilico.pl/qrnas .
Lyapunov exponents for one-dimensional aperiodic photonic bandgap structures
NASA Astrophysics Data System (ADS)
Kissel, Glen J.
2011-10-01
Existing in the "gray area" between perfectly periodic and purely randomized photonic bandgap structures are the so-called aperiodic structures whose layers are chosen according to some deterministic rule. We consider here a one-dimensional photonic bandgap structure, a quarter-wave stack, with the layer thickness of one of the bilayers subject to being either thin or thick according to five deterministic sequence rules and binary random selection. To produce these aperiodic structures we examine the following sequences: Fibonacci, Thue-Morse, Period doubling, Rudin-Shapiro, as well as the triadic Cantor sequence. We model these structures numerically with a long chain (approximately 5,000,000) of transfer matrices, and then use the reliable algorithm of Wolf to calculate the (upper) Lyapunov exponent for the long product of matrices. The Lyapunov exponent is the statistically well-behaved variable used to characterize the Anderson localization effect (exponential confinement) when the layers are randomized, so its calculation allows us to more precisely compare the purely randomized structure with its aperiodic counterparts. It is found that the aperiodic photonic systems show much fine structure in their Lyapunov exponents as a function of frequency, and, in a number of cases, the exponents are quite obviously fractal.
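The two ingredients named in this abstract, a deterministic layer sequence and a Lyapunov exponent for a long ordered product of 2×2 matrices, can be sketched in a few lines of Python. The sketch below generates three of the listed sequences from their standard two-letter substitution rules and estimates the upper Lyapunov exponent by repeatedly applying each matrix to a vector and renormalising. This vector-renormalisation estimate is a common textbook approach and is an assumption here; it is not claimed to be the Wolf algorithm used in the paper. The function names and the toy matrices in the example are hypothetical, and physical transfer matrices for a quarter-wave stack would have to be supplied by the reader.

```python
import math

# Standard two-letter substitution rules (A and B could stand for thin/thick layers).
RULES = {
    "fibonacci": {"A": "AB", "B": "A"},
    "thue_morse": {"A": "AB", "B": "BA"},
    "period_doubling": {"A": "AB", "B": "AA"},
}


def substitution_word(rule_name: str, generations: int, seed: str = "A") -> str:
    """Apply the named substitution rule `generations` times to `seed`."""
    rules = RULES[rule_name]
    word = seed
    for _ in range(generations):
        word = "".join(rules[symbol] for symbol in word)
    return word


def lyapunov_exponent(matrices) -> float:
    """Mean log growth rate of a vector under an ordered product of 2x2 matrices.

    Each matrix is ((a, b), (c, d)); entries may be real or complex.
    The vector is renormalised after every step to avoid overflow.
    """
    x = y = 1.0 / math.sqrt(2.0)   # generic unit starting vector
    total = 0.0
    count = 0
    for (a, b), (c, d) in matrices:
        x, y = a * x + b * y, c * x + d * y
        norm = math.sqrt(abs(x) ** 2 + abs(y) ** 2)
        total += math.log(norm)
        x, y = x / norm, y / norm
        count += 1
    if count == 0:
        raise ValueError("at least one matrix is required")
    return total / count


# Toy check with a matrix whose largest growth rate is known: log(2).
stretch = ((2.0, 0.0), (0.0, 0.5))
toy_exponent = lyapunov_exponent([stretch] * 1000)

# Mapping a layer sequence to matrices (toy matrices, not physical ones).
word = substitution_word("fibonacci", 10)
toy_thin = ((1.0, 0.3), (0.0, 1.0))
toy_thick = ((1.0, 0.0), (0.3, 1.0))
sequence_exponent = lyapunov_exponent(toy_thin if s == "A" else toy_thick for s in word)
```

For a frequency scan one would rebuild the two layer matrices at each frequency and recompute the exponent over the same symbol sequence, which is how a curve of exponent against frequency could be obtained.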
A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.
Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei
2017-10-01
The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.
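The objective that such search algorithms minimize in the HP model can be illustrated with a short energy evaluation. The sketch below assumes a 2D square lattice, a fold encoded as absolute moves (U, D, L, R), and the usual scoring of -1 per non-consecutive H-H lattice contact; the abstract does not state the lattice or encoding used by HE-L-PSO, so these are illustrative choices, and the function names are hypothetical.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def fold_coordinates(moves):
    """Lattice coordinates of each residue, starting at the origin."""
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords

def hp_energy(sequence, moves):
    """Energy of a fold: -1 per H-H contact between residues that are lattice
    neighbours but not chain neighbours. Returns None if the walk overlaps itself."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    coords = fold_coordinates(moves)
    if len(set(coords)) != len(coords):
        return None  # fold is not self-avoiding, so it is invalid
    index_at = {p: i for i, p in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts

if __name__ == "__main__":
    print(hp_energy("HPPH", "RUL"))
```

A global search such as PSO would propose move strings, and local search steps (for example hill climbing) would accept single-move changes that lower this energy while keeping the walk self-avoiding.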
Yang, A S; Hitz, B; Honig, B
1996-06-21
The stability of beta-turns is calculated as a function of sequence and turn type with a Monte Carlo sampling technique. The conformational energy of four internal hydrogen-bonded turn types, I, I', II and II', is obtained by evaluating their gas phase energy with the CHARMM force field and accounting for solvation effects with the Finite Difference Poisson-Boltzmann (FDPB) method. All four turn types are found to be less stable than the coil state, independent of the sequence in the turn. The free-energy penalties associated with turn formation vary between 1.6 kcal/mol and 7.7 kcal/mol, depending on the sequence and turn type. Differences in turn stability arise mainly from intraresidue interactions within the two central residues of the turn. For each combination of the two central residues, except for -Gly-Gly-, the most stable beta-turn type is always found to occur most commonly in native proteins. The fact that a model based on local interactions accounts for the observed preference of specific sequences suggests that long-range tertiary interactions tend to play a secondary role in determining turn conformation. In contrast, for beta-hairpins, long-range interactions appear to dominate. Specifically, due to the right-handed twist of beta-strands, type I' turns for -Gly-Gly- are found to occur with high frequency, even when local energetics would dictate otherwise. The fact that any combination of two residues is found able to adopt a relatively low-energy turn structure explains why the amino acid sequence in turns is highly variable. The calculated free-energy cost of turn formation, when combined with related numbers obtained for alpha-helices and beta-sheets, suggests a model for the initiation of protein folding based on metastable fragments of secondary structure.
Variations in Nuclear Localization Strategies Among Pol X Family Enzymes.
Kirby, Thomas W; Pedersen, Lars C; Gabel, Scott A; Gassman, Natalie R; London, Robert E
2018-06-22
Despite the essential roles of pol X family enzymes in DNA repair, information about the structural basis of their nuclear import is limited. Recent studies revealed the unexpected presence of a functional NLS in DNA polymerase β, indicating the importance of active nuclear targeting, even for enzymes likely to leak into and out of the nucleus. The current studies further explore the active nuclear transport of these enzymes by identifying and structurally characterizing the functional NLS sequences in the three remaining human pol X enzymes: terminal deoxynucleotidyl transferase (TdT), DNA polymerase μ (pol μ), and DNA polymerase λ (pol λ). NLS identifications are based on Importin α (Impα) binding affinity determined by fluorescence polarization of fluorescein-labeled NLS peptides, X-ray crystallographic analysis of the Impα∆IBB•NLS complexes, and fluorescence-based subcellular localization studies. All three polymerases use NLS sequences located near their N-terminus; TdT and pol μ utilize monopartite NLS sequences, while pol λ utilizes a bipartite sequence, unique among the pol X family members. The pol μ NLS has relatively weak measured affinity for Impα, due in part to its proximity to the N-terminus that limits non-specific interactions of flanking residues preceding the NLS. However, this effect is partially mitigated by an N-terminal sequence unsupportive of Met1 removal by methionine aminopeptidase, leading to a 3-fold increase in affinity when the N-terminal methionine is present. Nuclear targeting is unique to each pol X family enzyme with variations dependent on the structure and unique functional role of each polymerase. This article is protected by copyright. All rights reserved.
Hierarchic models for laminated plates. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Actis, Ricardo Luis
1991-01-01
Structural plates and shells are three-dimensional bodies, one dimension of which happens to be much smaller than the other two. Thus, the quality of a plate or shell model must be judged on the basis of how well its exact solution approximates the corresponding three-dimensional problem. Of course, the exact solution depends not only on the choice of the model but also on the topology, material properties, loading and constraints. The desired degree of approximation depends on the analyst's goals in performing the analysis. For these reasons models have to be chosen adaptively. Hierarchic sequences of models make adaptive selection of the model which is best suited for the purposes of a particular analysis possible. The principles governing the formulation of hierarchic models for laminated plates are presented. The essential features of the hierarchic models described are: (1) the exact solutions corresponding to the hierarchic sequence of models converge to the exact solution of the corresponding problem of elasticity for a fixed laminate thickness; and (2) the exact solution of each model converges to the same limit as the exact solution of the corresponding problem of elasticity with respect to the laminate thickness approaching zero. The formulation is based on one parameter (beta) which characterizes the hierarchic sequence of models, and a set of constants whose influence was assessed by a numerical sensitivity study. The recommended selection of these constants results in the number of fields increasing by three for each increment in the power of beta. Numerical examples analyzed with the proposed sequence of models are included and good correlation with the reference solutions was found. Results were obtained for laminated strips (plates in cylindrical bending) and for square and rectangular plates with uniform loading and with homogeneous boundary conditions.
Cross-ply and angle-ply laminates were evaluated and the results compared with those of MSC/PROBE. Hierarchic models make the computation of any engineering data possible to an arbitrary level of precision within the framework of the theory of elasticity.
Smoothing of cost function leads to faster convergence of neural network learning
NASA Astrophysics Data System (ADS)
Xu, Li-Qun; Hall, Trevor J.
1994-03-01
One of the major problems in supervised learning of neural networks is the inevitable local minima inherent in the cost function $f(W,D)$. This often renders powerless the classic gradient-descent-based learning algorithms, which calculate the weight updates for each iteration according to $\Delta W(t) = -\eta \nabla_W f(W,D)$. In this paper we describe a new strategy to solve this problem, which adaptively changes the learning rate and manipulates the gradient estimator simultaneously. The idea is to implicitly convert the local-minima-laden cost function $f(\cdot)$ into a sequence of its smoothed versions $\{f_{\beta_t}\}_{t=1}^{T}$, which, subject to the parameter $\beta_t$, bears less detail at time $t = 1$ and gradually more later on; the learning is actually performed on this sequence of functionals. The corresponding smoothed global minima obtained in this way, $\{W_t\}_{t=1}^{T}$, thus progressively approximate $W$, the desired global minimum. Experimental results on a nonconvex function minimization problem and a typical neural network learning task are given, and analyses and discussions of some important issues are provided.
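The idea of descending on a sequence of progressively less smoothed cost functions can be shown on a one-dimensional toy problem. The sketch below is an assumption-laden illustration, not the authors' algorithm: the test function, the Gaussian smoothing kernel, the schedule of smoothing widths and the fixed learning rate are all hypothetical, and the adaptive learning rate described in the abstract is not modelled. For this particular function, averaging over Gaussian noise of standard deviation beta has a closed form, since the expectation of cos(B(w + z)) is cos(Bw) exp(-B^2 beta^2 / 2).

```python
import math

A, B = 1.0, 5.0  # hypothetical parameters of the toy cost function

def cost(w):
    """Nonconvex toy cost with its global minimum at w = 0 and many local minima."""
    return 0.5 * w * w + A * (1.0 - math.cos(B * w))

def smoothed_grad(w, beta):
    """Gradient of the cost averaged over Gaussian noise of std `beta`.

    Smoothing damps the oscillatory term by exp(-(B*beta)^2 / 2); beta = 0
    recovers the gradient of the original cost.
    """
    damping = math.exp(-0.5 * (B * beta) ** 2)
    return w + A * B * damping * math.sin(B * w)

def descend(w, betas, lr=0.02, steps=500):
    """Gradient descent on each smoothed cost in turn, warm-starting each stage
    from the result of the previous one."""
    for beta in betas:
        for _ in range(steps):
            w -= lr * smoothed_grad(w, beta)
    return w

if __name__ == "__main__":
    start = 3.0
    plain = descend(start, [0.0])                          # ordinary gradient descent
    graded = descend(start, [1.0, 0.5, 0.25, 0.1, 0.0])    # coarse-to-fine smoothing
    print("plain:", plain, cost(plain))
    print("smoothed schedule:", graded, cost(graded))
```

With these settings, plain descent from the chosen start settles in a nearby local minimum, while the coarse-to-fine schedule first follows the nearly convex smoothed cost toward the origin and then refines the solution on the original function.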
Multi-scale symbolic transfer entropy analysis of EEG
NASA Astrophysics Data System (ADS)
Yao, Wenpo; Wang, Jun
2017-10-01
From both global and local perspectives, we symbolize two kinds of EEG and analyze their dynamic and asymmetrical information using multi-scale transfer entropy. A multi-scale process with scale factors from 1 to 199 and a step size of 2 is applied to the EEG of healthy people and epileptic patients, and then permutation with an embedding dimension of 3 and a global approach are used to symbolize the sequences. The forward and reverse symbol sequences are taken as the inputs of transfer entropy. The scale factor intervals of the permutation and global approaches in which the two kinds of EEG show satisfactory entropy distinctions are (37, 57) and (65, 85). When the scale factor is 67, the transfer entropies of the healthy and epileptic subjects under permutation, 0.1137 and 0.1028, show the biggest difference. The corresponding values for the global symbolization are 0.0641 and 0.0601, at a scale factor of 165. The results show that permutation, which takes local information into account, gives better distinction and is more effectively applied to our multi-scale transfer entropy analysis of EEG.
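The three ingredients named in this abstract (multi-scale coarse-graining, permutation symbolization with embedding dimension 3, and transfer entropy between symbol sequences) can be sketched as below. Several details are assumptions because the abstract does not specify them: coarse-graining by non-overlapping window means, a history length of one symbol, a plug-in probability estimate, and base-2 logarithms. The "global" symbolization and the exact construction of forward and reverse sequences are not reproduced.

```python
import math
from collections import Counter

def coarse_grain(series, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n)]

def permutation_symbols(series, dim=3):
    """Map each window of `dim` samples to its ordinal pattern
    (the indices of the window sorted by value)."""
    symbols = []
    for i in range(len(series) - dim + 1):
        window = series[i:i + dim]
        symbols.append(tuple(sorted(range(dim), key=lambda k: window[k])))
    return symbols

def transfer_entropy(source, target):
    """Plug-in transfer entropy (bits) from `source` to `target`, history length 1."""
    n = min(len(source), len(target)) - 1
    triples = Counter((target[t + 1], target[t], source[t]) for t in range(n))
    pairs_ts = Counter((target[t], source[t]) for t in range(n))
    pairs_tt = Counter((target[t + 1], target[t]) for t in range(n))
    singles = Counter(target[t] for t in range(n))
    te = 0.0
    for (x1, x0, y0), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_ts[(x0, y0)]       # p(x1 | x0, y0)
        p_given_self = pairs_tt[(x1, x0)] / singles[x0]  # p(x1 | x0)
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te

if __name__ == "__main__":
    # Synthetic signal standing in for an EEG trace (illustrative only).
    signal = [math.sin(0.3 * t) + 0.5 * math.sin(1.7 * t) for t in range(2000)]
    for scale in (1, 3, 5):
        sym = permutation_symbols(coarse_grain(signal, scale))
        forward, reverse = sym, sym[::-1]
        print(scale, transfer_entropy(forward, reverse))
```

Here the reversed symbol sequence is used only as a stand-in for the paper's "reverse" input; how the authors pair the two sequences is not stated in the abstract.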
Li de La Sierra-Gallay, Ines; Collinet, Bruno; Graille, Marc; Quevillon-Cheruel, Sophie; Liger, Dominique; Minard, Philippe; Blondeau, Karine; Henckes, Gilles; Aufrère, Robert; Leulliot, Nicolas; Zhou, Cong-Zhao; Sorel, Isabelle; Ferrer, Jean-Luc; Poupon, Anne; Janin, Joël; van Tilbeurgh, Herman
2004-03-01
The protein product of the YGR205w gene of Saccharomyces cerevisiae was targeted as part of our yeast structural genomics project. YGR205w codes for a small (290 amino acids) protein with unknown structure and function. The only recognizable sequence feature is the presence of a Walker A motif (P loop) indicating a possible nucleotide binding/converting function. We determined the three-dimensional crystal structure of Se-methionine substituted protein using multiple anomalous diffraction. The structure revealed a well known mononucleotide fold and strong resemblance to the structure of small metabolite phosphorylating enzymes such as pantothenate and phosphoribulo kinase. Biochemical experiments show that YGR205w binds specifically ATP and, less tightly, ADP. The structure also revealed the presence of two bound sulphate ions, occupying opposite niches in a canyon that corresponds to the active site of the protein. One sulphate is bound to the P-loop in a position that corresponds to the position of beta-phosphate in mononucleotide protein ATP complex, suggesting the protein is indeed a kinase. The nature of the phosphate accepting substrate remains to be determined. Copyright 2004 Wiley-Liss, Inc.
Sikorav, J L; Duval, N; Anselmet, A; Bon, S; Krejci, E; Legay, C; Osterlund, M; Reimund, B; Massoulié, J
1988-01-01
In this paper, we show the existence of alternative splicing in the 3' region of the coding sequence of Torpedo acetylcholinesterase (AChE). We describe two cDNA structures which both diverge from the previously described coding sequence of the catalytic subunit of asymmetric (A) forms (Schumacher et al., 1986; Sikorav et al., 1987). They both contain a coding sequence followed by a non-coding sequence and a poly(A) stretch. Both of these structures were shown to exist in poly(A)+ RNAs, by S1 mapping experiments. The divergent region encoded by the first sequence corresponds to the precursor of the globular dimeric form (G2a), since it contains the expected C-terminal amino acids, Ala-Cys. These amino acids are followed by a 29 amino acid extension which contains a hydrophobic segment and must be replaced by a glycolipid in the mature protein. Analyses of intact G2a AChE showed that the common domain of the protein contains intersubunit disulphide bonds. The divergent region of the second type of cDNA consists of an adjacent genomic sequence, which is removed as an intron in A and Ga mRNAs, but may encode a distinct, less abundant catalytic subunit. The structures of the cDNA clones indicate that they are derived from minor mRNAs, shorter than the three major transcripts which have been described previously (14.5, 10.5 and 5.5 kb). Oligonucleotide probes specific for the asymmetric and globular terminal regions hybridize with the three major transcripts, indicating that their size is determined by 3'-untranslated regions which are not related to the differential splicing leading to A and Ga forms. PMID:3181125
Entropic fluctuations in DNA sequences
NASA Astrophysics Data System (ADS)
Thanos, Dimitrios; Li, Wentian; Provata, Astero
2018-03-01
The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that, despite this reduction of information, LSE allows the extraction of meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.
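A block-wise entropy profile of the kind described here can be computed in a few lines. The sketch below is one plausible reading of the abstract, not the authors' definition: it assumes non-overlapping blocks, entropy over overlapping words of length `k` inside each block (with `k = 1` giving single-base frequencies), and base-2 logarithms. The function names and the example sequence are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(block, k=1):
    """Shannon entropy (bits) of the length-k word frequencies within one block."""
    words = [block[i:i + k] for i in range(len(block) - k + 1)]
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def local_shannon_entropy(sequence, block_size, k=1):
    """Entropy of each consecutive non-overlapping block along the sequence."""
    return [
        shannon_entropy(sequence[start:start + block_size], k)
        for start in range(0, len(sequence) - block_size + 1, block_size)
    ]

if __name__ == "__main__":
    # A tandem-repeat stretch followed by a more varied stretch (made-up data).
    seq = "AT" * 20 + "ACGTTGCAAGCTTCGGATCCATGGTACCGAGCTCGAATTC"
    print(local_shannon_entropy(seq, block_size=20, k=2))
```

In this reading, a long tandem repeat yields a run of nearly identical block values, which is consistent with the low LSE fluctuations that the abstract associates with regions such as centromeres.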
Stadelmann, Marc A; Maquer, Ghislain; Voumard, Benjamin; Grant, Aaron; Hackney, David B; Vermathen, Peter; Alkalay, Ron N; Zysset, Philippe K
2018-05-17
Intervertebral disc degeneration is a common disease that is often related to impaired mechanical function, herniations and chronic back pain. The degenerative process induces alterations of the disc's shape, composition and structure that can be visualized in vivo using magnetic resonance imaging (MRI). Numerical tools such as finite element analysis (FEA) have the potential to relate MRI-based information to the altered mechanical behavior of the disc. However, in terms of geometry, composition and fiber architecture, current FE models rely on observations made on healthy discs and might therefore not be well suited to study the degeneration process. To address the issue, we propose a new, more realistic FE methodology based on diffusion tensor imaging (DTI). For this study, a human disc joint was imaged in a high-field MR scanner with proton-density weighted (PD) and DTI sequences. The PD image was segmented and an anatomy-specific mesh was generated. Assuming accordance between local principal diffusion direction and local mean collagen fiber alignment, corresponding fiber angles were assigned to each element. Those element-wise fiber directions and PD intensities allowed the homogenized model to smoothly account for composition and fibrous structure of the disc. The disc's in vitro mechanical behavior was quantified under tension, compression, flexion, extension, lateral bending and rotation. The six resulting load-displacement curves could be replicated by the FE model, which supports our approach as a first proof of concept towards patient-specific disc modeling. Copyright © 2018 Elsevier Ltd. All rights reserved.