Guiding Conformation Space Search with an All-Atom Energy Potential
Brunette, TJ; Brock, Oliver
2009-01-01
The most significant impediment for protein structure prediction is the inadequacy of conformation space search. Conformation space is too large and the energy landscape too rugged for existing search methods to consistently find near-optimal minima. To alleviate this problem, we present model-based search, a novel conformation space search method. Model-based search uses highly accurate information obtained during search to build an approximate, partial model of the energy landscape. Model-based search aggregates information in the model as it progresses, and in turn uses this information to guide exploration towards regions most likely to contain a near-optimal minimum. We validate our method by predicting the structure of 32 proteins, ranging in length from 49 to 213 amino acids. Our results demonstrate that model-based search is more effective at finding low-energy conformations in high-dimensional conformation spaces than existing search methods. The reduction in energy translates into structure predictions of increased accuracy. PMID:18536015
A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.
Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei
2017-10-01
The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.
Simoncini, David; Schiex, Thomas; Zhang, Kam Y J
2017-05-01
Conformational search space exploration remains a major bottleneck for protein structure prediction methods. Population-based meta-heuristics typically enable the possibility to control the search dynamics and to tune the balance between local energy minimization and search space exploration. EdaFold is a fragment-based approach that can guide search by periodically updating the probability distribution over the fragment libraries used during model assembly. We implement the EdaFold algorithm as a Rosetta protocol and provide two different probability update policies: a cluster-based variation (EdaRose c ) and an energy-based one (EdaRose en ). We analyze the search dynamics of our new Rosetta protocols and show that EdaRose c is able to provide predictions with lower C αRMSD to the native structure than EdaRose en and Rosetta AbInitio Relax protocol. Our software is freely available as a C++ patch for the Rosetta suite and can be downloaded from http://www.riken.jp/zhangiru/software/. Our protocols can easily be extended in order to create alternative probability update policies and generate new search dynamics. Proteins 2017; 85:852-858. © 2016 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
RAG-3D: A search tool for RNA 3D substructures
Zahran, Mai; Sevim Bayrak, Cigdem; Elmetwaly, Shereef; ...
2015-08-24
In this study, to address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D—a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool—designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally describedmore » in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding.« less
RAG-3D: a search tool for RNA 3D substructures
Zahran, Mai; Sevim Bayrak, Cigdem; Elmetwaly, Shereef; Schlick, Tamar
2015-01-01
To address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D—a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool—designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally described in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding. PMID:26304547
RAG-3D: A search tool for RNA 3D substructures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zahran, Mai; Sevim Bayrak, Cigdem; Elmetwaly, Shereef
In this study, to address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D—a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool—designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally describedmore » in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding.« less
An improved stochastic fractal search algorithm for 3D protein structure prediction.
Zhou, Changjun; Sun, Chuan; Wang, Bin; Wang, Xiaojun
2018-05-03
Protein structure prediction (PSP) is a significant area for biological information research, disease treatment, and drug development and so on. In this paper, three-dimensional structures of proteins are predicted based on the known amino acid sequences, and the structure prediction problem is transformed into a typical NP problem by an AB off-lattice model. This work applies a novel improved Stochastic Fractal Search algorithm (ISFS) to solve the problem. The Stochastic Fractal Search algorithm (SFS) is an effective evolutionary algorithm that performs well in exploring the search space but falls into local minimums sometimes. In order to avoid the weakness, Lvy flight and internal feedback information are introduced in ISFS. In the experimental process, simulations are conducted by ISFS algorithm on Fibonacci sequences and real peptide sequences. Experimental results prove that the ISFS performs more efficiently and robust in terms of finding the global minimum and avoiding getting stuck in local minimums.
Burliaeva, E V; Tarkhov, A E; Burliaev, V V; Iurkevich, A M; Shvets, V I
2002-01-01
Searching of new anti-HIV agents is still crucial now. In general, researches are looking for inhibitors of certain HIV's vital enzymes, especially for reverse transcriptase (RT) inhibitors. Modern generation of anti-HIV agents represents non-nucleoside reverse transcriptase inhibitors (NNRTIs). They are much less toxic than nucleoside analogues and more chemically stable, thus being slower metabolized and emitted from the human body. Thus, search of new NNRTIs is actual today. Synthesis and study of new anti-HIV drugs is very expensive. So employment of the activity prediction techniques for such a search is very beneficial. This technique allows predicting the activities for newly proposed structures. It is based on the property model built by investigation of a series of known compounds with measured activity. This paper presents an approach of activity prediction based on "structure-activity" models designed to form a hypothesis about probably activity interval estimate. This hypothesis formed is based on structure descriptor domains, calculated for all energetically allowed conformers for each compound in the studied sef. Tetrahydroimidazobenzodiazipenone (TIBO) derivatives and phenylethyltiazolyltiourea (PETT) derivatives illustrated the predictive power of this method. The results are consistent with experimental data and allow to predict inhibitory activity of compounds, which were not included into the training set.
Ołdziej, S; Czaplewski, C; Liwo, A; Chinchio, M; Nanias, M; Vila, J A; Khalili, M; Arnautova, Y A; Jagielska, A; Makowski, M; Schafroth, H D; Kaźmierkiewicz, R; Ripoll, D R; Pillardy, J; Saunders, J A; Kang, Y K; Gibson, K D; Scheraga, H A
2005-05-24
Recent improvements in the protein-structure prediction method developed in our laboratory, based on the thermodynamic hypothesis, are described. The conformational space is searched extensively at the united-residue level by using our physics-based UNRES energy function and the conformational space annealing method of global optimization. The lowest-energy coarse-grained structures are then converted to an all-atom representation and energy-minimized with the ECEPP/3 force field. The procedure was assessed in two recent blind tests of protein-structure prediction. During the first blind test, we predicted large fragments of alpha and alpha+beta proteins [60-70 residues with C(alpha) rms deviation (rmsd) <6 A]. However, for alpha+beta proteins, significant topological errors occurred despite low rmsd values. In the second exercise, we predicted whole structures of five proteins (two alpha and three alpha+beta, with sizes of 53-235 residues) with remarkably good accuracy. In particular, for the genomic target TM0487 (a 102-residue alpha+beta protein from Thermotoga maritima), we predicted the complete, topologically correct structure with 7.3-A C(alpha) rmsd. So far this protein is the largest alpha+beta protein predicted based solely on the amino acid sequence and a physics-based potential-energy function and search procedure. For target T0198, a phosphate transport system regulator PhoU from T. maritima (a 235-residue mainly alpha-helical protein), we predicted the topology of the whole six-helix bundle correctly within 8 A rmsd, except the 32 C-terminal residues, most of which form a beta-hairpin. These and other examples described in this work demonstrate significant progress in physics-based protein-structure prediction.
A predictive framework for evaluating models of semantic organization in free recall
Morton, Neal W; Polyn, Sean M.
2016-01-01
Research in free recall has demonstrated that semantic associations reliably influence the organization of search through episodic memory. However, the specific structure of these associations and the mechanisms by which they influence memory search remain unclear. We introduce a likelihood-based model-comparison technique, which embeds a model of semantic structure within the context maintenance and retrieval (CMR) model of human memory search. Within this framework, model variants are evaluated in terms of their ability to predict the specific sequence in which items are recalled. We compare three models of semantic structure, latent semantic analysis (LSA), global vectors (GloVe), and word association spaces (WAS), and find that models using WAS have the greatest predictive power. Furthermore, we find evidence that semantic and temporal organization is driven by distinct item and context cues, rather than a single context cue. This finding provides important constraint for theories of memory search. PMID:28331243
Ab initio structure prediction of silicon and germanium sulfides for lithium-ion battery materials
NASA Astrophysics Data System (ADS)
Hsueh, Connie; Mayo, Martin; Morris, Andrew J.
Conventional experimental-based approaches to materials discovery, which can rely heavily on trial and error, are time-intensive and costly. We discuss approaches to coupling experimental and computational techniques in order to systematize, automate, and accelerate the process of materials discovery, which is of particular relevance to developing new battery materials. We use the ab initio random structure searching (AIRSS) method to conduct a systematic investigation of Si-S and Ge-S binary compounds in order to search for novel materials for lithium-ion battery (LIB) anodes. AIRSS is a high-throughput, density functional theory-based approach to structure prediction which has been successful at predicting the structures of LIBs containing sulfur and silicon and germanium. We propose a lithiation mechanism for Li-GeS2 anodes as well as report new, theoretically stable, layered and porous structures in the Si-S and Ge-S systems that pique experimental interest.
3D Protein structure prediction with genetic tabu search algorithm
2010-01-01
Background Protein structure prediction (PSP) has important applications in different fields, such as drug design, disease prediction, and so on. In protein structure prediction, there are two important issues. The first one is the design of the structure model and the second one is the design of the optimization technology. Because of the complexity of the realistic protein structure, the structure model adopted in this paper is a simplified model, which is called off-lattice AB model. After the structure model is assumed, optimization technology is needed for searching the best conformation of a protein sequence based on the assumed structure model. However, PSP is an NP-hard problem even if the simplest model is assumed. Thus, many algorithms have been developed to solve the global optimization problem. In this paper, a hybrid algorithm, which combines genetic algorithm (GA) and tabu search (TS) algorithm, is developed to complete this task. Results In order to develop an efficient optimization algorithm, several improved strategies are developed for the proposed genetic tabu search algorithm. The combined use of these strategies can improve the efficiency of the algorithm. In these strategies, tabu search introduced into the crossover and mutation operators can improve the local search capability, the adoption of variable population size strategy can maintain the diversity of the population, and the ranking selection strategy can improve the possibility of an individual with low energy value entering into next generation. Experiments are performed with Fibonacci sequences and real protein sequences. Experimental results show that the lowest energy obtained by the proposed GATS algorithm is lower than that obtained by previous methods. Conclusions The hybrid algorithm has the advantages from both genetic algorithm and tabu search algorithm. It makes use of the advantage of multiple search points in genetic algorithm, and can overcome poor hill-climbing capability in the conventional genetic algorithm by using the flexible memory functions of TS. Compared with some previous algorithms, GATS algorithm has better performance in global optimization and can predict 3D protein structure more effectively. PMID:20522256
Alvarez, George A.; Nakayama, Ken; Konkle, Talia
2016-01-01
Visual search is a ubiquitous visual behavior, and efficient search is essential for survival. Different cognitive models have explained the speed and accuracy of search based either on the dynamics of attention or on similarity of item representations. Here, we examined the extent to which performance on a visual search task can be predicted from the stable representational architecture of the visual system, independent of attentional dynamics. Participants performed a visual search task with 28 conditions reflecting different pairs of categories (e.g., searching for a face among cars, body among hammers, etc.). The time it took participants to find the target item varied as a function of category combination. In a separate group of participants, we measured the neural responses to these object categories when items were presented in isolation. Using representational similarity analysis, we then examined whether the similarity of neural responses across different subdivisions of the visual system had the requisite structure needed to predict visual search performance. Overall, we found strong brain/behavior correlations across most of the higher-level visual system, including both the ventral and dorsal pathways when considering both macroscale sectors as well as smaller mesoscale regions. These results suggest that visual search for real-world object categories is well predicted by the stable, task-independent architecture of the visual system. NEW & NOTEWORTHY Here, we ask which neural regions have neural response patterns that correlate with behavioral performance in a visual processing task. We found that the representational structure across all of high-level visual cortex has the requisite structure to predict behavior. Furthermore, when directly comparing different neural regions, we found that they all had highly similar category-level representational structures. These results point to a ubiquitous and uniform representational structure in high-level visual cortex underlying visual object processing. PMID:27832600
Chira, Camelia; Horvath, Dragos; Dumitrescu, D
2011-07-30
Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optimum. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact to the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
Analysis of Free Modeling Predictions by RBO Aleph in CASP11
Mabrouk, Mahmoud; Werner, Tim; Schneider, Michael; Putz, Ines; Brock, Oliver
2015-01-01
The CASP experiment is a biannual benchmark for assessing protein structure prediction methods. In CASP11, RBO Aleph ranked as one of the top-performing automated servers in the free modeling category. This category consists of targets for which structural templates are not easily retrievable. We analyze the performance of RBO Aleph and show that its success in CASP was a result of its ab initio structure prediction protocol. A detailed analysis of this protocol demonstrates that two components unique to our method greatly contributed to prediction quality: residue–residue contact prediction by EPC-map and contact–guided conformational space search by model-based search (MBS). Interestingly, our analysis also points to a possible fundamental problem in evaluating the performance of protein structure prediction methods: Improvements in components of the method do not necessarily lead to improvements of the entire method. This points to the fact that these components interact in ways that are poorly understood. This problem, if indeed true, represents a significant obstacle to community-wide progress. PMID:26492194
GeneBee-net: Internet-based server for analyzing biopolymers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brodsky, L.I.; Ivanov, V.V.; Nikolaev, V.K.
This work describes a network server for searching databanks of biopolymer structures and performing other biocomputing procedures; it is available via direct Internet connection. Basic server procedures are dedicated to homology (similarity) search of sequence and 3D structure of proteins. The homologies found could be used to build multiple alignments, predict protein and RNA secondary structure, and construct phylogenetic trees. In addition to traditional methods of sequence similarity search, the authors propose {open_quotes}non-matrix{close_quotes} (correlational) search. An analogous approach is used to identify regions of similar tertiary structure of proteins. Algorithm concepts and usage examples are presented for new methods. Servicemore » logic is based upon interaction of a client program and server procedures. The client program allows the compilation of queries and the processing of results of an analysis.« less
Akula, Nagaraju; Pattabiraman, Nagarajan
2005-06-01
Membrane proteins play a major role in number of biological processes such as signaling pathways. The determination of the three-dimensional structure of these proteins is increasingly important for our understanding of their structure-function relationships. Due to the difficulty in isolating membrane proteins for X-ray diffraction studies, computational techniques are being developed to generate the 3D structures of TM domains. Here, we present a systematic search method for the identification of energetically favorable and tightly packed transmembrane parallel alpha-helices. The first step in our systematic search method is the generation of 3D models for pairs of parallel helix bundles with all possible orientations followed by an energy-based filter to eliminate structures with severe non-bonded contacts. Then, a RMS-based filter was used to cluster these structures into families. Furthermore, these dimers were energy minimized using molecular mechanics force field. Finally, we identified the tightly packed parallel alpha-helices by using an interface surface area. To validate our search method, we compared our predicted GlycophorinA dimer structures with the reported NMR structures. With our search method, we are able to reproduce NMR structures of GPA with 0.9A RMSD. In addition, by considering the reported mutational data on GxxxG motif interactions, twenty percent of our predicted dimers are within in the 2.0A RMSD. The dimers obtained from our method were used to generate parallel trimeric and tetramer TM structures of GPA and found that the structure of GPA might exist only in a dimer form as reported earlier.
NASA Astrophysics Data System (ADS)
Khatibinia, M.; Salajegheh, E.; Salajegheh, J.; Fadaee, M. J.
2013-10-01
A new discrete gravitational search algorithm (DGSA) and a metamodelling framework are introduced for reliability-based design optimization (RBDO) of reinforced concrete structures. The RBDO of structures with soil-structure interaction (SSI) effects is investigated in accordance with performance-based design. The proposed DGSA is based on the standard gravitational search algorithm (GSA) to optimize the structural cost under deterministic and probabilistic constraints. The Monte-Carlo simulation (MCS) method is considered as the most reliable method for estimating the probabilities of reliability. In order to reduce the computational time of MCS, the proposed metamodelling framework is employed to predict the responses of the SSI system in the RBDO procedure. The metamodel consists of a weighted least squares support vector machine (WLS-SVM) and a wavelet kernel function, which is called WWLS-SVM. Numerical results demonstrate the efficiency and computational advantages of DGSA and the proposed metamodel for RBDO of reinforced concrete structures.
A protein-dependent side-chain rotamer library.
Bhuyan, Md Shariful Islam; Gao, Xin
2011-12-14
Protein side-chain packing problem has remained one of the key open problems in bioinformatics. The three main components of protein side-chain prediction methods are a rotamer library, an energy function and a search algorithm. Rotamer libraries summarize the existing knowledge of the experimentally determined structures quantitatively. Depending on how much contextual information is encoded, there are backbone-independent rotamer libraries and backbone-dependent rotamer libraries. Backbone-independent libraries only encode sequential information, whereas backbone-dependent libraries encode both sequential and locally structural information. However, side-chain conformations are determined by spatially local information, rather than sequentially local information. Since in the side-chain prediction problem, the backbone structure is given, spatially local information should ideally be encoded into the rotamer libraries. In this paper, we propose a new type of backbone-dependent rotamer library, which encodes structural information of all the spatially neighboring residues. We call it protein-dependent rotamer libraries. Given any rotamer library and a protein backbone structure, we first model the protein structure as a Markov random field. Then the marginal distributions are estimated by the inference algorithms, without doing global optimization or search. The rotamers from the given library are then re-ranked and associated with the updated probabilities. Experimental results demonstrate that the proposed protein-dependent libraries significantly outperform the widely used backbone-dependent libraries in terms of the side-chain prediction accuracy and the rotamer ranking ability. Furthermore, without global optimization/search, the side-chain prediction power of the protein-dependent library is still comparable to the global-search-based side-chain prediction methods.
Real-Time Ligand Binding Pocket Database Search Using Local Surface Descriptors
Chikhi, Rayan; Sael, Lee; Kihara, Daisuke
2010-01-01
Due to the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of a particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two dimensional pseudo-Zernike moments or the 3D Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark study employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed. PMID:20455259
Real-time ligand binding pocket database search using local surface descriptors.
Chikhi, Rayan; Sael, Lee; Kihara, Daisuke
2010-07-01
Because of the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the three-dimensional Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed.
Improved hybrid optimization algorithm for 3D protein structure prediction.
Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang
2014-07-01
A new improved hybrid optimization algorithm - PGATS algorithm, which is based on toy off-lattice model, is presented for dealing with three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. Otherwise, we also take some different improved strategies. The factor of stochastic disturbance is joined in the particle swarm optimization to improve the search ability; the operations of crossover and mutation that are in the genetic algorithm are changed to a kind of random liner method; at last tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, the protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is an NP-hard problem, but the problem can be attributed to a global optimization problem of multi-extremum and multi-parameters. This is the theoretical principle of the hybrid optimization algorithm that is proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcoming of a single algorithm, giving full play to the advantage of each algorithm. In the current universal standard sequences, Fibonacci sequences and real protein sequences are certified. Experiments show that the proposed new method outperforms single algorithms on the accuracy of calculating the protein sequence energy value, which is proved to be an effective way to predict the structure of proteins.
From molecule to solid: The prediction of organic crystal structures
NASA Astrophysics Data System (ADS)
Dzyabchenko, A. V.
2008-10-01
A method for predicting the structure of a molecular crystal based on the systematic search for a global potential energy minimum is considered. The method takes into account unequal occurrences of the structural classes of organic crystals and symmetry of the multidimensional configuration space. The programs of global minimization PMC, comparison of crystal structures CRYCOM, and approximation to the distributions of the electrostatic potentials of molecules FitMEP are presented as tools for numerically solving the problem. Examples of predicted structures substantiated experimentally and the experience of author’s participation in international tests of crystal structure prediction organized by the Cambridge Crystallographic Data Center (Cambridge, UK) are considered.
Analysis of free modeling predictions by RBO aleph in CASP11.
Mabrouk, Mahmoud; Werner, Tim; Schneider, Michael; Putz, Ines; Brock, Oliver
2016-09-01
The CASP experiment is a biannual benchmark for assessing protein structure prediction methods. In CASP11, RBO Aleph ranked as one of the top-performing automated servers in the free modeling category. This category consists of targets for which structural templates are not easily retrievable. We analyze the performance of RBO Aleph and show that its success in CASP was a result of its ab initio structure prediction protocol. A detailed analysis of this protocol demonstrates that two components unique to our method greatly contributed to prediction quality: residue-residue contact prediction by EPC-map and contact-guided conformational space search by model-based search (MBS). Interestingly, our analysis also points to a possible fundamental problem in evaluating the performance of protein structure prediction methods: Improvements in components of the method do not necessarily lead to improvements of the entire method. This points to the fact that these components interact in ways that are poorly understood. This problem, if indeed true, represents a significant obstacle to community-wide progress. Proteins 2016; 84(Suppl 1):87-104. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
Chen, Xiang; He, Si-Min; Bu, Dongbo; Zhang, Fa; Wang, Zhiyong; Chen, Runsheng; Gao, Wen
2008-09-15
RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is proved to be NP-hard. Due to kinetic reasons the real RNA secondary structure often has local instead of global minimum free energy. This implies that we may improve the performance of RNA secondary structure prediction by taking kinetics into account and minimize free energy in a local area. we propose a novel algorithm named FlexStem to predict RNA secondary structures with pseudoknots. Still based on MFE criterion, FlexStem adopts comprehensive energy models that allow complex pseudoknots. Unlike classical thermodynamic methods, our approach aims to simulate the RNA folding process by successive addition of maximal stems, reducing the search space while maintaining or even improving the prediction accuracy. This reduced space is constructed by our maximal stem strategy and stem-adding rule induced from elaborate statistical experiments on real RNA secondary structures. The strategy and the rule also reflect the folding characteristic of RNA from a new angle and help compensate for the deficiency of merely relying on MFE in RNA structure prediction. We validate FlexStem by applying it to tRNAs, 5SrRNAs and a large number of pseudoknotted structures and compare it with the well-known algorithms such as RNAfold, PKNOTS, PknotsRG, HotKnots and ILM according to their overall sensitivities and specificities, as well as positive and negative controls on pseudoknots. The results show that FlexStem significantly increases the prediction accuracy through its local search strategy. Software is available at http://pfind.ict.ac.cn/FlexStem/. Supplementary data are available at Bioinformatics online.
Pattern search in multi-structure data: a framework for the next-generation evidence-based medicine
NASA Astrophysics Data System (ADS)
Sukumar, Sreenivas R.; Ainsworth, Keela C.
2014-03-01
With the impetus towards personalized and evidence-based medicine, the need for a framework to analyze/interpret quantitative measurements (blood work, toxicology, etc.) with qualitative descriptions (specialist reports after reading images, bio-medical knowledgebase, etc.) to predict diagnostic risks is fast emerging. Addressing this need, we pose and answer the following questions: (i) How can we jointly analyze and explore measurement data in context with qualitative domain knowledge? (ii) How can we search and hypothesize patterns (not known apriori) from such multi-structure data? (iii) How can we build predictive models by integrating weakly-associated multi-relational multi-structure data? We propose a framework towards answering these questions. We describe a software solution that leverages hardware for scalable in-memory analytics and applies next-generation semantic query tools on medical data.
Phase diagram of germanium telluride encapsulated in carbon nanotubes from first-principles searches
NASA Astrophysics Data System (ADS)
Wynn, Jamie M.; Medeiros, Paulo V. C.; Vasylenko, Andrij; Sloan, Jeremy; Quigley, David; Morris, Andrew J.
2017-12-01
Germanium telluride has attracted great research interest, primarily because of its phase-change properties. We have developed a general scheme, based on the ab initio random structure searching (AIRSS) method, for predicting the structures of encapsulated nanowires, and using this we predict a number of thermodynamically stable structures of GeTe nanowires encapsulated inside carbon nanotubes of radii under 9 Å . We construct the phase diagram of encapsulated GeTe, which provides quantitative predictions about the energetic favorability of different filling structures as a function of the nanotube radius, such as the formation of a quasi-one-dimensional rock-salt-like phase inside nanotubes of radii between 5.4 and 7.9 Å . Simulated TEM images of our structures show excellent agreement between our results and experimental TEM imagery. We show that, for some nanotubes, the nanowires undergo temperature-induced phase transitions from one crystalline structure to another due to vibrational contributions to the free energy, which is a first step toward nano-phase-change memory devices.
Image Search Reranking With Hierarchical Topic Awareness.
Tian, Xinmei; Yang, Linjun; Lu, Yijuan; Tian, Qi; Tao, Dacheng
2015-10-01
With much attention from both academia and industrial communities, visual search reranking has recently been proposed to refine image search results obtained from text-based image search engines. Most of the traditional reranking methods cannot capture both relevance and diversity of the search results at the same time. Or they ignore the hierarchical topic structure of search result. Each topic is treated equally and independently. However, in real applications, images returned for certain queries are naturally in hierarchical organization, rather than simple parallel relation. In this paper, a new reranking method "topic-aware reranking (TARerank)" is proposed. TARerank describes the hierarchical topic structure of search results in one model, and seamlessly captures both relevance and diversity of the image search results simultaneously. Through a structured learning framework, relevance and diversity are modeled in TARerank by a set of carefully designed features, and then the model is learned from human-labeled training samples. The learned model is expected to predict reranking results with high relevance and diversity for testing queries. To verify the effectiveness of the proposed method, we collect an image search dataset and conduct comparison experiments on it. The experimental results demonstrate that the proposed TARerank outperforms the existing relevance-based and diversified reranking methods.
Texture metric that predicts target detection performance
NASA Astrophysics Data System (ADS)
Culpepper, Joanne B.
2015-12-01
Two texture metrics based on gray level co-occurrence error (GLCE) are used to predict probability of detection and mean search time. The two texture metrics are local clutter metrics and are based on the statistics of GLCE probability distributions. The degree of correlation between various clutter metrics and the target detection performance of the nine military vehicles in complex natural scenes found in the Search_2 dataset are presented. Comparison is also made between four other common clutter metrics found in the literature: root sum of squares, Doyle, statistical variance, and target structure similarity. The experimental results show that the GLCE energy metric is a better predictor of target detection performance when searching for targets in natural scenes than the other clutter metrics studied.
F-RAG: Generating Atomic Coordinates from RNA Graphs by Fragment Assembly.
Jain, Swati; Schlick, Tamar
2017-11-24
Coarse-grained models represent attractive approaches to analyze and simulate ribonucleic acid (RNA) molecules, for example, for structure prediction and design, as they simplify the RNA structure to reduce the conformational search space. Our structure prediction protocol RAGTOP (RNA-As-Graphs Topology Prediction) represents RNA structures as tree graphs and samples graph topologies to produce candidate graphs. However, for a more detailed study and analysis, construction of atomic from coarse-grained models is required. Here we present our graph-based fragment assembly algorithm (F-RAG) to convert candidate three-dimensional (3D) tree graph models, produced by RAGTOP into atomic structures. We use our related RAG-3D utilities to partition graphs into subgraphs and search for structurally similar atomic fragments in a data set of RNA 3D structures. The fragments are edited and superimposed using common residues, full atomic models are scored using RAGTOP's knowledge-based potential, and geometries of top scoring models is optimized. To evaluate our models, we assess all-atom RMSDs and Interaction Network Fidelity (a measure of residue interactions) with respect to experimentally solved structures and compare our results to other fragment assembly programs. For a set of 50 RNA structures, we obtain atomic models with reasonable geometries and interactions, particularly good for RNAs containing junctions. Additional improvements to our protocol and databases are outlined. These results provide a good foundation for further work on RNA structure prediction and design applications. Copyright © 2017 Elsevier Ltd. All rights reserved.
Xu, Dong; Zhang, Yang
2012-07-01
Ab initio protein folding is one of the major unsolved problems in computational biology owing to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1-20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 nonhomologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one-third cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction experiment, QUARK server outperformed the second and third best servers by 18 and 47% based on the cumulative Z-score of global distance test-total scores in the FM category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress toward the solution of the most important problem in the field. Copyright © 2012 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Xu, Xianjin; Yan, Chengfei; Zou, Xiaoqin
2017-08-01
The growing number of protein-ligand complex structures, particularly the structures of proteins co-bound with different ligands, in the Protein Data Bank helps us tackle two major challenges in molecular docking studies: the protein flexibility and the scoring function. Here, we introduced a systematic strategy by using the information embedded in the known protein-ligand complex structures to improve both binding mode and binding affinity predictions. Specifically, a ligand similarity calculation method was employed to search a receptor structure with a bound ligand sharing high similarity with the query ligand for the docking use. The strategy was applied to the two datasets (HSP90 and MAP4K4) in recent D3R Grand Challenge 2015. In addition, for the HSP90 dataset, a system-specific scoring function (ITScore2_hsp90) was generated by recalibrating our statistical potential-based scoring function (ITScore2) using the known protein-ligand complex structures and the statistical mechanics-based iterative method. For the HSP90 dataset, better performances were achieved for both binding mode and binding affinity predictions comparing with the original ITScore2 and with ensemble docking. For the MAP4K4 dataset, although there were only eight known protein-ligand complex structures, our docking strategy achieved a comparable performance with ensemble docking. Our method for receptor conformational selection and iterative method for the development of system-specific statistical potential-based scoring functions can be easily applied to other protein targets that have a number of protein-ligand complex structures available to improve predictions on binding.
Pattern Search in Multi-structure Data: A Framework for the Next-Generation Evidence-based Medicine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sukumar, Sreenivas R; Ainsworth, Keela C
With the advent of personalized and evidence-based medicine, the need for a framework to analyze/interpret quantitative measurements (blood work, toxicology, etc.) with qualitative descriptions (specialist reports after reading images, bio-medical knowledge-bases) to predict diagnostic risks is fast emerging. Addressing this need, we pose and address the following questions (i) How can we jointly analyze both qualitative and quantitative data ? (ii) Is the fusion of multi-structure data expected to provide better insights than either of them individually ? We present experiments on two bio-medical data sets - mammography and traumatic brain studies to demonstrate architectures and tools for evidence-pattern search.
Efficient protein structure search using indexing methods
2013-01-01
Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize an reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points less than θ to the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced 69.6%, 77%, 77.4% and 87.9%, respectively using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor serach, the searching time is reduced 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. PMID:23691543
Efficient protein structure search using indexing methods.
Kim, Sungchul; Sael, Lee; Yu, Hwanjo
2013-01-01
Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize an reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points less than θ to the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced 69.6%, 77%, 77.4% and 87.9%, respectively using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor serach, the searching time is reduced 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
de Oliveira, Saulo H P; Law, Eleanor C; Shi, Jiye; Deane, Charlotte M
2018-04-01
Most current de novo structure prediction methods randomly sample protein conformations and thus require large amounts of computational resource. Here, we consider a sequential sampling strategy, building on ideas from recent experimental work which shows that many proteins fold cotranslationally. We have investigated whether a pseudo-greedy search approach, which begins sequentially from one of the termini, can improve the performance and accuracy of de novo protein structure prediction. We observed that our sequential approach converges when fewer than 20 000 decoys have been produced, fewer than commonly expected. Using our software, SAINT2, we also compared the run time and quality of models produced in a sequential fashion against a standard, non-sequential approach. Sequential prediction produces an individual decoy 1.5-2.5 times faster than non-sequential prediction. When considering the quality of the best model, sequential prediction led to a better model being produced for 31 out of 41 soluble protein validation cases and for 18 out of 24 transmembrane protein cases. Correct models (TM-Score > 0.5) were produced for 29 of these cases by the sequential mode and for only 22 by the non-sequential mode. Our comparison reveals that a sequential search strategy can be used to drastically reduce computational time of de novo protein structure prediction and improve accuracy. Data are available for download from: http://opig.stats.ox.ac.uk/resources. SAINT2 is available for download from: https://github.com/sauloho/SAINT2. saulo.deoliveira@dtc.ox.ac.uk. Supplementary data are available at Bioinformatics online.
Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Ma, Zhiwei; Grinter, Sam Z; Zou, Xiaoqin
2017-03-01
Protein-protein interactions are either through direct contacts between two binding partners or mediated by structural waters. Both direct contacts and water-mediated interactions are crucial to the formation of a protein-protein complex. During the recent CAPRI rounds, a novel parallel searching strategy for predicting water-mediated interactions is introduced into our protein-protein docking method, MDockPP. Briefly, a FFT-based docking algorithm is employed in generating putative binding modes, and an iteratively derived statistical potential-based scoring function, ITScorePP, in conjunction with biological information is used to assess and rank the binding modes. Up to 10 binding modes are selected as the initial protein-protein complex structures for MD simulations in explicit solvent. Water molecules near the interface are clustered based on the snapshots extracted from independent equilibrated trajectories. Then, protein-ligand docking is employed for a parallel search for water molecules near the protein-protein interface. The water molecules generated by ligand docking and the clustered water molecules generated by MD simulations are merged, referred to as the predicted structural water molecules. Here, we report the performance of this protocol for CAPRI rounds 28-29 and 31-35 containing 20 valid docking targets and 11 scoring targets. In the docking experiments, we predicted correct binding modes for nine targets, including one high-accuracy, two medium-accuracy, and six acceptable predictions. Regarding the two targets for the prediction of water-mediated interactions, we achieved models ranked as "excellent" in accordance with the CAPRI evaluation criteria; one of these two targets is considered as a difficult target for structural water prediction. Proteins 2017; 85:424-434. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Prediction of Protein Structure by Template-Based Modeling Combined with the UNRES Force Field.
Krupa, Paweł; Mozolewska, Magdalena A; Joo, Keehyoung; Lee, Jooyoung; Czaplewski, Cezary; Liwo, Adam
2015-06-22
A new approach to the prediction of protein structures that uses distance and backbone virtual-bond dihedral angle restraints derived from template-based models and simulations with the united residue (UNRES) force field is proposed. The approach combines the accuracy and reliability of template-based methods for the segments of the target sequence with high similarity to those having known structures with the ability of UNRES to pack the domains correctly. Multiplexed replica-exchange molecular dynamics with restraints derived from template-based models of a given target, in which each restraint is weighted according to the accuracy of the prediction of the corresponding section of the molecule, is used to search the conformational space, and the weighted histogram analysis method and cluster analysis are applied to determine the families of the most probable conformations, from which candidate predictions are selected. To test the capability of the method to recover template-based models from restraints, five single-domain proteins with structures that have been well-predicted by template-based methods were used; it was found that the resulting structures were of the same quality as the best of the original models. To assess whether the new approach can improve template-based predictions with incorrectly predicted domain packing, four such targets were selected from the CASP10 targets; for three of them the new approach resulted in significantly better predictions compared with the original template-based models. The new approach can be used to predict the structures of proteins for which good templates can be found for sections of the sequence or an overall good template can be found for the entire sequence but the prediction quality is remarkably weaker in putative domain-linker regions.
Xu, Dong; Zhang, Yang
2012-01-01
Ab initio protein folding is one of the major unsolved problems in computational biology due to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1–20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 non-homologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score (TM-score) >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in 1/3 cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction (CASP9) experiment, QUARK server outperformed the second and third best servers by 18% and 47% based on the cumulative Z-score of global distance test-total (GDT-TS) scores in the free modeling (FM) category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress towards the solution of the most important problem in the field. PMID:22411565
LigSearch: a knowledge-based web server to identify likely ligands for a protein target
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beer, Tjaart A. P. de; Laskowski, Roman A.; Duban, Mark-Eugene
LigSearch is a web server for identifying ligands likely to bind to a given protein. Identifying which ligands might bind to a protein before crystallization trials could provide a significant saving in time and resources. LigSearch, a web server aimed at predicting ligands that might bind to and stabilize a given protein, has been developed. Using a protein sequence and/or structure, the system searches against a variety of databases, combining available knowledge, and provides a clustered and ranked output of possible ligands. LigSearch can be accessed at http://www.ebi.ac.uk/thornton-srv/databases/LigSearch.
RNA secondary structure prediction using soft computing.
Ray, Shubhra Sankar; Pal, Sankar K
2013-01-01
Prediction of RNA structure is invaluable in creating new drugs and understanding genetic diseases. Several deterministic algorithms and soft computing-based techniques have been developed for more than a decade to determine the structure from a known RNA sequence. Soft computing gained importance with the need to get approximate solutions for RNA sequences by considering the issues related with kinetic effects, cotranscriptional folding, and estimation of certain energy parameters. A brief description of some of the soft computing-based techniques, developed for RNA secondary structure prediction, is presented along with their relevance. The basic concepts of RNA and its different structural elements like helix, bulge, hairpin loop, internal loop, and multiloop are described. These are followed by different methodologies, employing genetic algorithms, artificial neural networks, and fuzzy logic. The role of various metaheuristics, like simulated annealing, particle swarm optimization, ant colony optimization, and tabu search is also discussed. A relative comparison among different techniques, in predicting 12 known RNA secondary structures, is presented, as an example. Future challenging issues are then mentioned.
PARTS: Probabilistic Alignment for RNA joinT Secondary structure prediction
Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H.
2008-01-01
A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by the newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is superior for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu. PMID:18304945
Won, Jonghun; Lee, Gyu Rie; Park, Hahnbeom; Seok, Chaok
2018-06-07
The second extracellular loops (ECL2s) of G-protein-coupled receptors (GPCRs) are often involved in GPCR functions, and their structures have important implications in drug discovery. However, structure prediction of ECL2 is difficult because of its long length and the structural diversity among different GPCRs. In this study, a new ECL2 conformational sampling method involving both template-based and ab initio sampling was developed. Inspired by the observation of similar ECL2 structures of closely related GPCRs, a template-based sampling method employing loop structure templates selected from the structure database was developed. A new metric for evaluating similarity of the target loop to templates was introduced for template selection. An ab initio loop sampling method was also developed to treat cases without highly similar templates. The ab initio method is based on the previously developed fragment assembly and loop closure method. A new sampling component that takes advantage of secondary structure prediction was added. In addition, a conserved disulfide bridge restraining ECL2 conformation was predicted and analytically incorporated into sampling, reducing the effective dimension of the conformational search space. The sampling method was combined with an existing energy function for comparison with previously reported loop structure prediction methods, and the benchmark test demonstrated outstanding performance.
A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning.
Li, Haiou; Lyu, Qiang; Cheng, Jianlin
2016-12-01
Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.
He, Jieyue; Li, Chaojun; Ye, Baoliu; Zhong, Wei
2012-06-25
Most computational algorithms mainly focus on detecting highly connected subgraphs in PPI networks as protein complexes but ignore their inherent organization. Furthermore, many of these algorithms are computationally expensive. However, recent analysis indicates that experimentally detected protein complexes generally contain Core/attachment structures. In this paper, a Greedy Search Method based on Core-Attachment structure (GSM-CA) is proposed. The GSM-CA method detects densely connected regions in large protein-protein interaction networks based on the edge weight and two criteria for determining core nodes and attachment nodes. The GSM-CA method improves the prediction accuracy compared to other similar module detection approaches, however it is computationally expensive. Many module detection approaches are based on the traditional hierarchical methods, which is also computationally inefficient because the hierarchical tree structure produced by these approaches cannot provide adequate information to identify whether a network belongs to a module structure or not. In order to speed up the computational process, the Greedy Search Method based on Fast Clustering (GSM-FC) is proposed in this work. The edge weight based GSM-FC method uses a greedy procedure to traverse all edges just once to separate the network into the suitable set of modules. The proposed methods are applied to the protein interaction network of S. cerevisiae. Experimental results indicate that many significant functional modules are detected, most of which match the known complexes. Results also demonstrate that the GSM-FC algorithm is faster and more accurate as compared to other competing algorithms. Based on the new edge weight definition, the proposed algorithm takes advantages of the greedy search procedure to separate the network into the suitable set of modules. Experimental analysis shows that the identified modules are statistically significant. The algorithm can reduce the computational time significantly while keeping high prediction accuracy.
Computational prediction of muon stopping sites using ab initio random structure searching (AIRSS)
NASA Astrophysics Data System (ADS)
Liborio, Leandro; Sturniolo, Simone; Jochym, Dominik
2018-04-01
The stopping site of the muon in a muon-spin relaxation experiment is in general unknown. There are some techniques that can be used to guess the muon stopping site, but they often rely on approximations and are not generally applicable to all cases. In this work, we propose a purely theoretical method to predict muon stopping sites in crystalline materials from first principles. The method is based on a combination of ab initio calculations, random structure searching, and machine learning, and it has successfully predicted the MuT and MuBC stopping sites of muonium in Si, diamond, and Ge, as well as the muonium stopping site in LiF, without any recourse to experimental results. The method makes use of Soprano, a Python library developed to aid ab initio computational crystallography, that was publicly released and contains all the software tools necessary to reproduce our analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Frolov, T.; Setyawan, W.; Kurtz, R. J.
We report a computational discovery of novel grain boundary structures and multiple grain boundary phases in elemental bcc tungsten. While grain boundary structures created by the - surface method as a union of two perfect half crystals have been studied extensively, it is known that the method has limitations and does not always predict the correct ground states. Here, we use a newly developed computational tool, based on evolutionary algorithms, to perform a grand-canonical search of high-angle symmetric tilt boundary in tungsten, and we find new ground states and multiple phases that cannot be described using the conventional structural unitmore » model. We use MD simulations to demonstrate that the new structures can coexist at finite temperature in a closed system, confirming these are examples of different GB phases. The new ground state is confirmed by first-principles calculations.Evolutionary grand-canonical search predicts novel grain boundary structures and multiple grain boundary phases in elemental body-centered cubic (bcc) metals represented by tungsten, tantalum and molybdenum.« less
Handl, Julia; Lovell, Simon C.
2016-01-01
ABSTRACT Energy functions, fragment libraries, and search methods constitute three key components of fragment‐assembly methods for protein structure prediction, which are all crucial for their ability to generate high‐accuracy predictions. All of these components are tightly coupled; efficient searching becomes more important as the quality of fragment libraries decreases. Given these relationships, there is currently a poor understanding of the strengths and weaknesses of the sampling approaches currently used in fragment‐assembly techniques. Here, we determine how the performance of search techniques can be assessed in a meaningful manner, given the above problems. We describe a set of techniques that aim to reduce the impact of the energy function, and assess exploration in view of the search space defined by a given fragment library. We illustrate our approach using Rosetta and EdaFold, and show how certain features of these methods encourage or limit conformational exploration. We demonstrate that individual trajectories of Rosetta are susceptible to local minima in the energy landscape, and that this can be linked to non‐uniform sampling across the protein chain. We show that EdaFold's novel approach can help balance broad exploration with locating good low‐energy conformations. This occurs through two mechanisms which cannot be readily differentiated using standard performance measures: exclusion of false minima, followed by an increasingly focused search in low‐energy regions of conformational space. Measures such as ours can be helpful in characterizing new fragment‐based methods in terms of the quality of conformational exploration realized. Proteins 2016; 84:411–426. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:26799916
Automated use of mutagenesis data in structure prediction.
Nanda, Vikas; DeGrado, William F
2005-05-15
In the absence of experimental structural determination, numerous methods are available to indirectly predict or probe the structure of a target molecule. Genetic modification of a protein sequence is a powerful tool for identifying key residues involved in binding reactions or protein stability. Mutagenesis data is usually incorporated into the modeling process either through manual inspection of model compatibility with empirical data, or through the generation of geometric constraints linking sensitive residues to a binding interface. We present an approach derived from statistical studies of lattice models for introducing mutation information directly into the fitness score. The approach takes into account the phenotype of mutation (neutral or disruptive) and calculates the energy for a given structure over an ensemble of sequences. The structure prediction procedure searches for the optimal conformation where neutral sequences either have no impact or improve stability and disruptive sequences reduce stability relative to wild type. We examine three types of sequence ensembles: information from saturation mutagenesis, scanning mutagenesis, and homologous proteins. Incorporating multiple sequences into a statistical ensemble serves to energetically separate the native state and misfolded structures. As a result, the prediction of structure with a poor force field is sufficiently enhanced by mutational information to improve accuracy. Furthermore, by separating misfolded conformations from the target score, the ensemble energy serves to speed up conformational search algorithms such as Monte Carlo-based methods. Copyright 2005 Wiley-Liss, Inc.
A Quantitative Model for the Prediction of Sooting Tendency from Molecular Structure
DOE Office of Scientific and Technical Information (OSTI.GOV)
St. John, Peter C.; Kairys, Paul; Das, Dhrubajyoti D.
Particulate matter emissions negatively affect public health and global climate, yet newer fuel-efficient gasoline direct injection engines tend to produce more soot than their port-fuel injection counterparts. Fortunately, the search for sustainable biomass-based fuel blendstocks provides an opportunity to develop fuels that suppress soot formation in more efficient engine designs. However, as emissions tests are experimentally cumbersome and the search space for potential bioblendstocks is vast, new techniques are needed to estimate the sooting tendency of a diverse range of compounds. In this study, we develop a quantitative structure-activity relationship (QSAR) model of sooting tendency based on the experimental yieldmore » sooting index (YSI), which ranks molecules on a scale from n-hexane, 0, to benzene, 100. The model includes a rigorously defined applicability domain, and the predictive performance is checked using both internal and external validation. Model predictions for compounds in the external test set had a median absolute error of ~3 YSI units. An investigation of compounds that are poorly predicted by the model lends new insight into the complex mechanisms governing soot formation. Predictive models of soot formation can therefore be expected to play an increasingly important role in the screening and development of next-generation biofuels.« less
A Quantitative Model for the Prediction of Sooting Tendency from Molecular Structure
St. John, Peter C.; Kairys, Paul; Das, Dhrubajyoti D.; ...
2017-07-24
Particulate matter emissions negatively affect public health and global climate, yet newer fuel-efficient gasoline direct injection engines tend to produce more soot than their port-fuel injection counterparts. Fortunately, the search for sustainable biomass-based fuel blendstocks provides an opportunity to develop fuels that suppress soot formation in more efficient engine designs. However, as emissions tests are experimentally cumbersome and the search space for potential bioblendstocks is vast, new techniques are needed to estimate the sooting tendency of a diverse range of compounds. In this study, we develop a quantitative structure-activity relationship (QSAR) model of sooting tendency based on the experimental yieldmore » sooting index (YSI), which ranks molecules on a scale from n-hexane, 0, to benzene, 100. The model includes a rigorously defined applicability domain, and the predictive performance is checked using both internal and external validation. Model predictions for compounds in the external test set had a median absolute error of ~3 YSI units. An investigation of compounds that are poorly predicted by the model lends new insight into the complex mechanisms governing soot formation. Predictive models of soot formation can therefore be expected to play an increasingly important role in the screening and development of next-generation biofuels.« less
Firefly Algorithm for Structural Search.
Avendaño-Franco, Guillermo; Romero, Aldo H
2016-07-12
The problem of computational structure prediction of materials is approached using the firefly (FF) algorithm. Starting from the chemical composition and optionally using prior knowledge of similar structures, the FF method is able to predict not only known stable structures but also a variety of novel competitive metastable structures. This article focuses on the strengths and limitations of the algorithm as a multimodal global searcher. The algorithm has been implemented in software package PyChemia ( https://github.com/MaterialsDiscovery/PyChemia ), an open source python library for materials analysis. We present applications of the method to van der Waals clusters and crystal structures. The FF method is shown to be competitive when compared to other population-based global searchers.
Bonding-restricted structure search for novel 2D materials with dispersed C2 dimers.
Zhang, Cunzhi; Zhang, Shunhong; Wang, Qian
2016-07-12
Currently, the available algorithms for unbiased structure searches are primarily atom-based, where atoms are manipulated as the elementary units, and energy is used as the target function without any restrictions on the bonding of atoms. In fact, in many cases such as nanostructure-assembled materials, the structural units are nanoclusters. We report a study of a bonding-restricted structure search method based on the particle swarm optimization (PSO) for finding the stable structures of two-dimensional (2D) materials containing dispersed C2 dimers rather than individual C atoms. The C2 dimer can be considered as a prototype of nanoclusters. Taking Si-C, B-C and Ti-C systems as test cases, our method combined with density functional theory and phonon calculations uncover new ground state geometrical structures for SiC2, Si2C2, BC2, B2C2, TiC2, and Ti2C2 sheets and their low-lying energy allotropes, as well as their electronic structures. Equally important, this method can be applied to other complex systems even containing f elements and other molecular dimers such as S2, N2, B2 and Si2, where the complex orbital orientations require extensive search for finding the optimal orientations to maximize the bonding with the dimers, predicting new 2D materials beyond MXenes (a family of transition metal carbides or nitrides) and dichalcogenide monolayers.
Kinoshita, Kengo; Murakami, Yoichi; Nakamura, Haruki
2007-07-01
We have developed a method to predict ligand-binding sites in a new protein structure by searching for similar binding sites in the Protein Data Bank (PDB). The similarities are measured according to the shapes of the molecular surfaces and their electrostatic potentials. A new web server, eF-seek, provides an interface to our search method. It simply requires a coordinate file in the PDB format, and generates a prediction result as a virtual complex structure, with the putative ligands in a PDB format file as the output. In addition, the predicted interacting interface is displayed to facilitate the examination of the virtual complex structure on our own applet viewer with the web browser (URL: http://eF-site.hgc.jp/eF-seek).
Florence, Alastair J; Johnston, Andrea; Price, Sarah L; Nowell, Harriott; Kennedy, Alan R; Shankland, Norman
2006-09-01
An automated parallel crystallisation search for physical forms of carbamazepine, covering 66 solvents and five crystallisation protocols, identified three anhydrous polymorphs (forms I-III), one hydrate and eight organic solvates, including the single-crystal structures of three previously unreported solvates (N,N-dimethylformamide (1:1); hemi-furfural; hemi-1,4-dioxane). Correlation of physical form outcome with the crystallisation conditions demonstrated that the solvent adopts a relatively nonspecific role in determining which polymorph is obtained, and that the previously reported effect of a polymer template facilitating the formation of form IV could not be reproduced by solvent crystallisation alone. In the accompanying computational search, approximately half of the energetically feasible predicted crystal structures exhibit the C=O...H--N R2(2)(8)dimer motif that is observed in the known polymorphs, with the most stable correctly corresponding to form III. Most of the other energetically feasible structures, including the global minimum, have a C=O...H--N C(4) chain hydrogen bond motif. No such chain structures were observed in this or any other previously published work, suggesting that kinetic, rather than thermodynamic, factors determine which of the energetically feasible crystal structures are observed experimentally, with the kinetics apparently favouring nucleation of crystal structures based on the CBZ-CBZ R2(2)(8) motif. (c) 2006 Wiley-Liss, Inc. and the American Pharmacists Association.
Secondary Structure Predictions for Long RNA Sequences Based on Inversion Excursions and MapReduce.
Yehdego, Daniel T; Zhang, Boyu; Kodimala, Vikram K R; Johnson, Kyle L; Taufer, Michela; Leung, Ming-Ying
2013-05-01
Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structures of each chunk individually using existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered and optimized method. Each step of searching for inversions, chunking, and predictions can be performed in parallel. In this paper we use a MapReduce framework, i.e., Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the RFAM database, whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, when the latter predictions are possible computationally. We also show that, as sequences exceed certain lengths, some programs cannot computationally predict pseudoknots while our chunking methods can. Overall, our predicted structures still retain the accuracy level of the original prediction programs when compared with known experimental secondary structure.
Contingency Table Browser - prediction of early stage protein structure.
Kalinowska, Barbara; Krzykalski, Artur; Roterman, Irena
2015-01-01
The Early Stage (ES) intermediate represents the starting structure in protein folding simulations based on the Fuzzy Oil Drop (FOD) model. The accuracy of FOD predictions is greatly dependent on the accuracy of the chosen intermediate. A suitable intermediate can be constructed using the sequence-structure relationship information contained in the so-called contingency table - this table expresses the likelihood of encountering various structural motifs for each tetrapeptide fragment in the amino acid sequence. The limited accuracy with which such structures could previously be predicted provided the motivation for a more indepth study of the contingency table itself. The Contingency Table Browser is a tool which can visualize, search and analyze the table. Our work presents possible applications of Contingency Table Browser, among them - analysis of specific protein sequences from the point of view of their structural ambiguity.
SATPdb: a database of structurally annotated therapeutic peptides
Singh, Sandeep; Chaudhary, Kumardeep; Dhanda, Sandeep Kumar; Bhalla, Sherry; Usmani, Salman Sadullah; Gautam, Ankur; Tuknait, Abhishek; Agrawal, Piyush; Mathur, Deepika; Raghava, Gajendra P.S.
2016-01-01
SATPdb (http://crdd.osdd.net/raghava/satpdb/) is a database of structurally annotated therapeutic peptides, curated from 22 public domain peptide databases/datasets including 9 of our own. The current version holds 19192 unique experimentally validated therapeutic peptide sequences having length between 2 and 50 amino acids. It covers peptides having natural, non-natural and modified residues. These peptides were systematically grouped into 10 categories based on their major function or therapeutic property like 1099 anticancer, 10585 antimicrobial, 1642 drug delivery and 1698 antihypertensive peptides. We assigned or annotated structure of these therapeutic peptides using structural databases (Protein Data Bank) and state-of-the-art structure prediction methods like I-TASSER, HHsearch and PEPstrMOD. In addition, SATPdb facilitates users in performing various tasks that include: (i) structure and sequence similarity search, (ii) peptide browsing based on their function and properties, (iii) identification of moonlighting peptides and (iv) searching of peptides having desired structure and therapeutic activities. We hope this database will be useful for researchers working in the field of peptide-based therapeutics. PMID:26527728
Zhang, Yong-Hong; Xia, Zhi-Ning; Qin, Li-Tang; Liu, Shu-Shen
2010-09-01
The objective of this paper is to build a reliable model based on the molecular electronegativity distance vector (MEDV) descriptors for predicting the blood-brain barrier (BBB) permeability and to reveal the effects of the molecular structural segments on the BBB permeability. Using 70 structurally diverse compounds, the partial least squares regression (PLSR) models between the BBB permeability and the MEDV descriptors were developed and validated by the variable selection and modeling based on prediction (VSMP) technique. The estimation ability, stability, and predictive power of a model are evaluated by the estimated correlation coefficient (r), leave-one-out (LOO) cross-validation correlation coefficient (q), and predictive correlation coefficient (R(p)). It has been found that PLSR model has good quality, r=0.9202, q=0.7956, and R(p)=0.6649 for M1 model based on the training set of 57 samples. To search the most important structural factors affecting the BBB permeability of compounds, we performed the values of the variable importance in projection (VIP) analysis for MEDV descriptors. It was found that some structural fragments in compounds, such as -CH(3), -CH(2)-, =CH-, =C, triple bond C-, -CH<, =C<, =N-, -NH-, =O, and -OH, are the most important factors affecting the BBB permeability. (c) 2010. Published by Elsevier Inc.
Structure prediction of nanoclusters; a direct or a pre-screened search on the DFT energy landscape?
Farrow, M R; Chow, Y; Woodley, S M
2014-10-21
The atomic structure of inorganic nanoclusters obtained via a search for low lying minima on energy landscapes, or hypersurfaces, is reported for inorganic binary compounds: zinc oxide (ZnO)n, magnesium oxide (MgO)n, cadmium selenide (CdSe)n, and potassium fluoride (KF)n, where n = 1-12 formula units. The computational cost of each search is dominated by the effort to evaluate each sample point on the energy landscape and the number of required sample points. The effect of changing the balance between these two factors on the success of the search is investigated. The choice of sample points will also affect the number of required data points and therefore the efficiency of the search. Monte Carlo based global optimisation routines (evolutionary and stochastic quenching algorithms) within a new software package, viz. Knowledge Led Master Code (KLMC), are employed to search both directly and after pre-screening on the DFT energy landscape. Pre-screening includes structural relaxation to minimise a cheaper energy function - based on interatomic potentials - and is found to improve significantly the search efficiency, and typically reduces the number of DFT calculations required to locate the local minima by more than an order of magnitude. Although the choice of functional form is important, the approach is robust to small changes to the interatomic potential parameters. The computational cost of initial DFT calculations of each structure is reduced by employing Gaussian smearing to the electronic energy levels. Larger (KF)n nanoclusters are predicted to form cuboid cuts from the rock-salt phase, but also share many structural motifs with (MgO)n for smaller clusters. The transition from 2D rings to 3D (bubble, or fullerene-like) structures occur at a larger cluster size for (ZnO)n and (CdSe)n. Differences between the HOMO and LUMO energies, for all the compounds apart from KF, are in the visible region of the optical spectrum (2-3 eV); KF lies deep in the UV region at 5 eV and shows little variation. Extrapolating the electron affinities found for the clusters with respect to size results in the qualitatively correct work functions for the respective bulk materials.
Knowledge-based fragment binding prediction.
Tang, Grace W; Altman, Russ B
2014-04-01
Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening.
Knowledge-based Fragment Binding Prediction
Tang, Grace W.; Altman, Russ B.
2014-01-01
Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening. PMID:24762971
Building a Better Fragment Library for De Novo Protein Structure Prediction
de Oliveira, Saulo H. P.; Shi, Jiye; Deane, Charlotte M.
2015-01-01
Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, the exclusion of homologs to the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries presents both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. “Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources”. PMID:25901595
Hao, Xiao-Hu; Zhang, Gui-Jun; Zhou, Xiao-Gen; Yu, Xu-Feng
2016-01-01
To address the searching problem of protein conformational space in ab-initio protein structure prediction, a novel method using abstract convex underestimation (ACUE) based on the framework of evolutionary algorithm was proposed. Computing such conformations, essential to associate structural and functional information with gene sequences, is challenging due to the high-dimensionality and rugged energy surface of the protein conformational space. As a consequence, the dimension of protein conformational space should be reduced to a proper level. In this paper, the high-dimensionality original conformational space was converted into feature space whose dimension is considerably reduced by feature extraction technique. And, the underestimate space could be constructed according to abstract convex theory. Thus, the entropy effect caused by searching in the high-dimensionality conformational space could be avoided through such conversion. The tight lower bound estimate information was obtained to guide the searching direction, and the invalid searching area in which the global optimal solution is not located could be eliminated in advance. Moreover, instead of expensively calculating the energy of conformations in the original conformational space, the estimate value is employed to judge if the conformation is worth exploring to reduce the evaluation time, thereby making computational cost lower and the searching process more efficient. Additionally, fragment assembly and the Monte Carlo method are combined to generate a series of metastable conformations by sampling in the conformational space. The proposed method provides a novel technique to solve the searching problem of protein conformational space. Twenty small-to-medium structurally diverse proteins were tested, and the proposed ACUE method was compared with It Fix, HEA, Rosetta and the developed method LEDE without underestimate information. Test results show that the ACUE method can more rapidly and more efficiently obtain the near-native protein structure.
Computational design of d-peptide inhibitors of hepatitis delta antigen dimerization
NASA Astrophysics Data System (ADS)
Elkin, Carl D.; Zuccola, Harmon J.; Hogle, James M.; Joseph-McCarthy, Diane
2000-11-01
Hepatitis delta virus (HDV) encodes a single polypeptide called hepatitis delta antigen (DAg). Dimerization of DAg is required for viral replication. The structure of the dimerization region, residues 12 to 60, consists of an anti-parallel coiled coil [Zuccola et al., Structure, 6 (1998) 821]. Multiple Copy Simultaneous Searches (MCSS) of the hydrophobic core region formed by the bend in the helix of one monomer of this structure were carried out for many diverse functional groups. Six critical interaction sites were identified. The Protein Data Bank was searched for backbone templates to use in the subsequent design process by matching to these sites. A 14 residue helix expected to bind to the d-isomer of the target structure was selected as the template. Over 200 000 mutant sequences of this peptide were generated based on the MCSS results. A secondary structure prediction algorithm was used to screen all sequences, and in general only those that were predicted to be highly helical were retained. Approximately 100 of these 14-mers were model built as d-peptides and docked with the l-isomer of the target monomer. Based on calculated interaction energies, predicted helicity, and intrahelical salt bridge patterns, a small number of peptides were selected as the most promising candidates. The ligand design approach presented here is the computational analogue of mirror image phage display. The results have been used to characterize the interactions responsible for formation of this model anti-parallel coiled coil and to suggest potential ligands to disrupt it.
A Multiobjective Approach Applied to the Protein Structure Prediction Problem
2002-03-07
like a low energy search landscape . 2.1.1 Symbolic/Formalized Problem Domain Description. Every computer representable problem can also be embodied...method [60]. 3.4 Energy Minimization Methods The energy landscape algorithms are based on the idea that a protein’s final resting conformation is...in our GA used to search the PSP problem energy landscape ). 3.5.1 Simple GA. The main routine in a sGA, after encoding the problem, builds a
RepeatsDB-lite: a web server for unit annotation of tandem repeat proteins.
Hirsh, Layla; Paladin, Lisanna; Piovesan, Damiano; Tosatto, Silvio C E
2018-05-09
RepeatsDB-lite (http://protein.bio.unipd.it/repeatsdb-lite) is a web server for the prediction of repetitive structural elements and units in tandem repeat (TR) proteins. TRs are a widespread but poorly annotated class of non-globular proteins carrying heterogeneous functions. RepeatsDB-lite extends the prediction to all TR types and strongly improves the performance both in terms of computational time and accuracy over previous methods, with precision above 95% for solenoid structures. The algorithm exploits an improved TR unit library derived from the RepeatsDB database to perform an iterative structural search and assignment. The web interface provides tools for analyzing the evolutionary relationships between units and manually refine the prediction by changing unit positions and protein classification. An all-against-all structure-based sequence similarity matrix is calculated and visualized in real-time for every user edit. Reviewed predictions can be submitted to RepeatsDB for review and inclusion.
Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction
Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian
2017-01-01
Abstract Motivation: Loops are often vital for protein function, however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. Results: We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Availability and Implementation: Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. Contact: deane@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28453681
Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction.
Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian; Deane, Charlotte M
2017-05-01
Loops are often vital for protein function, however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. deane@stats.ox.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
The dual role of fragments in fragment-assembly methods for de novo protein structure prediction
Handl, Julia; Knowles, Joshua; Vernon, Robert; Baker, David; Lovell, Simon C.
2013-01-01
In fragment-assembly techniques for protein structure prediction, models of protein structure are assembled from fragments of known protein structures. This process is typically guided by a knowledge-based energy function and uses a heuristic optimization method. The fragments play two important roles in this process: they define the set of structural parameters available, and they also assume the role of the main variation operators that are used by the optimiser. Previous analysis has typically focused on the first of these roles. In particular, the relationship between local amino acid sequence and local protein structure has been studied by a range of authors. The correlation between the two has been shown to vary with the window length considered, and the results of these analyses have informed directly the choice of fragment length in state-of-the-art prediction techniques. Here, we focus on the second role of fragments and aim to determine the effect of fragment length from an optimization perspective. We use theoretical analyses to reveal how the size and structure of the search space changes as a function of insertion length. Furthermore, empirical analyses are used to explore additional ways in which the size of the fragment insertion influences the search both in a simulation model and for the fragment-assembly technique, Rosetta. PMID:22095594
Bonding-restricted structure search for novel 2D materials with dispersed C2 dimers
Zhang, Cunzhi; Zhang, Shunhong; Wang, Qian
2016-01-01
Currently, the available algorithms for unbiased structure searches are primarily atom-based, where atoms are manipulated as the elementary units, and energy is used as the target function without any restrictions on the bonding of atoms. In fact, in many cases such as nanostructure-assembled materials, the structural units are nanoclusters. We report a study of a bonding-restricted structure search method based on the particle swarm optimization (PSO) for finding the stable structures of two-dimensional (2D) materials containing dispersed C2 dimers rather than individual C atoms. The C2 dimer can be considered as a prototype of nanoclusters. Taking Si-C, B-C and Ti-C systems as test cases, our method combined with density functional theory and phonon calculations uncover new ground state geometrical structures for SiC2, Si2C2, BC2, B2C2, TiC2, and Ti2C2 sheets and their low-lying energy allotropes, as well as their electronic structures. Equally important, this method can be applied to other complex systems even containing f elements and other molecular dimers such as S2, N2, B2 and Si2, where the complex orbital orientations require extensive search for finding the optimal orientations to maximize the bonding with the dimers, predicting new 2D materials beyond MXenes (a family of transition metal carbides or nitrides) and dichalcogenide monolayers. PMID:27403589
Fast structure similarity searches among protein models: efficient clustering of protein fragments
2012-01-01
Background For many predictive applications a large number of models is generated and later clustered in subsets based on structure similarity. In most clustering algorithms an all-vs-all root mean square deviation (RMSD) comparison is performed. Most of the time is typically spent on comparison of non-similar structures. For sets with more than, say, 10,000 models this procedure is very time-consuming and alternative faster algorithms, restricting comparisons only to most similar structures would be useful. Results We exploit the inverse triangle inequality on the RMSD between two structures given the RMSDs with a third structure. The lower bound on RMSD may be used, when restricting the search of similarity to a reasonably low RMSD threshold value, to speed up similarity searches significantly. Tests are performed on large sets of decoys which are widely used as test cases for predictive methods, with a speed-up of up to 100 times with respect to all-vs-all comparison depending on the set and parameters used. Sample applications are shown. Conclusions The algorithm presented here allows fast comparison of large data sets of structures with limited memory requirements. As an example of application we present clustering of more than 100000 fragments of length 5 from the top500H dataset into few hundred representative fragments. A more realistic scenario is provided by the search of similarity within the very large decoy sets used for the tests. Other applications regard filtering nearly-indentical conformation in selected CASP9 datasets and clustering molecular dynamics snapshots. Availability A linux executable and a Perl script with examples are given in the supplementary material (Additional file 1). The source code is available upon request from the authors. PMID:22642815
A search for H/ACA snoRNAs in yeast using MFE secondary structure prediction.
Edvardsson, Sverker; Gardner, Paul P; Poole, Anthony M; Hendy, Michael D; Penny, David; Moulton, Vincent
2003-05-01
Noncoding RNA genes produce functional RNA molecules rather than coding for proteins. One such family is the H/ACA snoRNAs. Unlike the related C/D snoRNAs these have resisted automated detection to date. We develop an algorithm to screen the yeast genome for novel H/ACA snoRNAs. To achieve this, we introduce some new methods for facilitating the search for noncoding RNAs in genomic sequences which are based on properties of predicted minimum free-energy (MFE) secondary structures. The algorithm has been implemented and can be generalized to enable screening of other eukaryote genomes. We find that use of primary sequence alone is insufficient for identifying novel H/ACA snoRNAs. Only the use of secondary structure filters reduces the number of candidates to a manageable size. From genomic context, we identify three strong H/ACA snoRNA candidates. These together with a further 47 candidates obtained by our analysis are being experimentally screened.
Kiryu, Hisanori; Kin, Taishin; Asai, Kiyoshi
2007-02-15
Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. Supplementary data are available at Bioinformatics online.
A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm
Chen, Jui-Le; Yang, Chu-Sing
2013-01-01
The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although the computer science technologies can be used to reduce the costs of the pharmaceutical research, the computation time of the structure-based protein-ligand docking prediction is still unsatisfied until now. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate the docking prediction algorithm. The proposed algorithm works by leveraging two high-performance operators: (1) the novel migration (information exchange) operator is designed specially for cloud-based environments to reduce the computation time; (2) the efficient operator is aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864
Drug search for leishmaniasis: a virtual screening approach by grid computing
NASA Astrophysics Data System (ADS)
Ochoa, Rodrigo; Watowich, Stanley J.; Flórez, Andrés; Mesa, Carol V.; Robledo, Sara M.; Muskus, Carlos
2016-07-01
The trypanosomatid protozoa Leishmania is endemic in 100 countries, with infections causing 2 million new cases of leishmaniasis annually. Disease symptoms can include severe skin and mucosal ulcers, fever, anemia, splenomegaly, and death. Unfortunately, therapeutics approved to treat leishmaniasis are associated with potentially severe side effects, including death. Furthermore, drug-resistant Leishmania parasites have developed in most endemic countries. To address an urgent need for new, safe and inexpensive anti-leishmanial drugs, we utilized the IBM World Community Grid to complete computer-based drug discovery screens (Drug Search for Leishmaniasis) using unique leishmanial proteins and a database of 600,000 drug-like small molecules. Protein structures from different Leishmania species were selected for molecular dynamics (MD) simulations, and a series of conformational "snapshots" were chosen from each MD trajectory to simulate the protein's flexibility. A Relaxed Complex Scheme methodology was used to screen 2000 MD conformations against the small molecule database, producing >1 billion protein-ligand structures. For each protein target, a binding spectrum was calculated to identify compounds predicted to bind with highest average affinity to all protein conformations. Significantly, four different Leishmania protein targets were predicted to strongly bind small molecules, with the strongest binding interactions predicted to occur for dihydroorotate dehydrogenase (LmDHODH; PDB:3MJY). A number of predicted tight-binding LmDHODH inhibitors were tested in vitro and potent selective inhibitors of Leishmania panamensis were identified. These promising small molecules are suitable for further development using iterative structure-based optimization and in vitro/in vivo validation assays.
Drug search for leishmaniasis: a virtual screening approach by grid computing.
Ochoa, Rodrigo; Watowich, Stanley J; Flórez, Andrés; Mesa, Carol V; Robledo, Sara M; Muskus, Carlos
2016-07-01
The trypanosomatid protozoa Leishmania is endemic in ~100 countries, with infections causing ~2 million new cases of leishmaniasis annually. Disease symptoms can include severe skin and mucosal ulcers, fever, anemia, splenomegaly, and death. Unfortunately, therapeutics approved to treat leishmaniasis are associated with potentially severe side effects, including death. Furthermore, drug-resistant Leishmania parasites have developed in most endemic countries. To address an urgent need for new, safe and inexpensive anti-leishmanial drugs, we utilized the IBM World Community Grid to complete computer-based drug discovery screens (Drug Search for Leishmaniasis) using unique leishmanial proteins and a database of 600,000 drug-like small molecules. Protein structures from different Leishmania species were selected for molecular dynamics (MD) simulations, and a series of conformational "snapshots" were chosen from each MD trajectory to simulate the protein's flexibility. A Relaxed Complex Scheme methodology was used to screen ~2000 MD conformations against the small molecule database, producing >1 billion protein-ligand structures. For each protein target, a binding spectrum was calculated to identify compounds predicted to bind with highest average affinity to all protein conformations. Significantly, four different Leishmania protein targets were predicted to strongly bind small molecules, with the strongest binding interactions predicted to occur for dihydroorotate dehydrogenase (LmDHODH; PDB:3MJY). A number of predicted tight-binding LmDHODH inhibitors were tested in vitro and potent selective inhibitors of Leishmania panamensis were identified. These promising small molecules are suitable for further development using iterative structure-based optimization and in vitro/in vivo validation assays.
Perceived breast cancer risk: heuristic reasoning and search for a dominance structure.
Katapodi, Maria C; Facione, Noreen C; Humphreys, Janice C; Dodd, Marylin J
2005-01-01
Studies suggest that people construct their risk perceptions by using inferential rules called heuristics. The purpose of this study was to identify heuristics that influence perceived breast cancer risk. We examined 11 interviews from women of diverse ethnic/cultural backgrounds who were recruited from community settings. Narratives in which women elaborated about their own breast cancer risk were analyzed with Argument and Heuristic Reasoning Analysis methodology, which is based on applied logic. The availability, simulation, representativeness, affect, and perceived control heuristics, and search for a dominance structure were commonly used for making risk assessments. Risk assessments were based on experiences with an abnormal breast symptom, experiences with affected family members and friends, beliefs about living a healthy lifestyle, and trust in health providers. Assessment of the potential threat of a breast symptom was facilitated by the search for a dominance structure. Experiences with family members and friends were incorporated into risk assessments through the availability, simulation, representativeness, and affect heuristics. Mistrust in health providers led to an inappropriate dependence on the perceived control heuristic. Identified heuristics appear to create predictable biases and suggest that perceived breast cancer risk is based on common cognitive patterns.
Ternary alloy material prediction using genetic algorithm and cluster expansion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Chong
2015-12-01
This thesis summarizes our study on the crystal structures prediction of Fe-V-Si system using genetic algorithm and cluster expansion. Our goal is to explore and look for new stable compounds. We started from the current ten known experimental phases, and calculated formation energies of those compounds using density functional theory (DFT) package, namely, VASP. The convex hull was generated based on the DFT calculations of the experimental known phases. Then we did random search on some metal rich (Fe and V) compositions and found that the lowest energy structures were body centered cube (bcc) underlying lattice, under which we didmore » our computational systematic searches using genetic algorithm and cluster expansion. Among hundreds of the searched compositions, thirteen were selected and DFT formation energies were obtained by VASP. The stability checking of those thirteen compounds was done in reference to the experimental convex hull. We found that the composition, 24-8-16, i.e., Fe 3VSi 2 is a new stable phase and it can be very inspiring to the future experiments.« less
Conceptual Web Users' Actions Prediction for Ontology-Based Browsing Recommendations
NASA Astrophysics Data System (ADS)
Robal, Tarmo; Kalja, Ahto
The Internet consists of thousands of web sites with different kinds of structures. However, users are browsing the web according to their informational expectations towards the web site searched, having an implicit conceptual model of the domain in their minds. Nevertheless, people tend to repeat themselves and have partially shared conceptual views while surfing the web, finding some areas of web sites more interesting than others. Herein, we take advantage of the latter and provide a model and a study on predicting users' actions based on the web ontology concepts and their relations.
Simkovic, Felix; Thomas, Jens M H; Keegan, Ronan M; Winn, Martyn D; Mayans, Olga; Rigden, Daniel J
2016-07-01
For many protein families, the deluge of new sequence information together with new statistical protocols now allow the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions ('decoys'), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue-residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were superseded in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing.
Simkovic, Felix; Thomas, Jens M. H.; Keegan, Ronan M.; Winn, Martyn D.; Mayans, Olga; Rigden, Daniel J.
2016-01-01
For many protein families, the deluge of new sequence information together with new statistical protocols now allow the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions (‘decoys’), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue–residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were superseded in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing. PMID:27437113
NASA Astrophysics Data System (ADS)
Rossi, Mariana; Gasparotto, Piero; Ceriotti, Michele
2016-09-01
Molecular crystals often exist in multiple competing polymorphs, showing significantly different physicochemical properties. Computational crystal structure prediction is key to interpret and guide the search for the most stable or useful form, a real challenge due to the combinatorial search space, and the complex interplay of subtle effects that work together to determine the relative stability of different structures. Here we take a comprehensive approach based on different flavors of thermodynamic integration in order to estimate all contributions to the free energies of these systems with density-functional theory, including the oft-neglected anharmonic contributions and nuclear quantum effects. We take the two main stable forms of paracetamol as a paradigmatic example. We find that anharmonic contributions, different descriptions of van der Waals interactions, and nuclear quantum effects all matter to quantitatively determine the stability of different phases. Our analysis highlights the many challenges inherent in the development of a quantitative and predictive framework to model molecular crystals. However, it also indicates which of the components of the free energy can benefit from a cancellation of errors that can redeem the predictive power of approximate models, and suggests simple steps that could be taken to improve the reliability of ab initio crystal structure prediction.
The multi-copy simultaneous search methodology: a fundamental tool for structure-based drug design.
Schubert, Christian R; Stultz, Collin M
2009-08-01
Fragment-based ligand design approaches, such as the multi-copy simultaneous search (MCSS) methodology, have proven to be useful tools in the search for novel therapeutic compounds that bind pre-specified targets of known structure. MCSS offers a variety of advantages over more traditional high-throughput screening methods, and has been applied successfully to challenging targets. The methodology is quite general and can be used to construct functionality maps for proteins, DNA, and RNA. In this review, we describe the main aspects of the MCSS method and outline the general use of the methodology as a fundamental tool to guide the design of de novo lead compounds. We focus our discussion on the evaluation of MCSS results and the incorporation of protein flexibility into the methodology. In addition, we demonstrate on several specific examples how the information arising from the MCSS functionality maps has been successfully used to predict ligand binding to protein targets and RNA.
Hu, Bingjie; Zhu, Xiaolei; Monroe, Lyman; Bures, Mark G; Kihara, Daisuke
2014-08-27
Structure-based computational methods have been widely used in exploring protein-ligand interactions, including predicting the binding ligands of a given protein based on their structural complementarity. Compared to other protein and ligand representations, the advantages of a surface representation include reduced sensitivity to subtle changes in the pocket and ligand conformation and fast search speed. Here we developed a novel method named PL-PatchSurfer (Protein-Ligand PatchSurfer). PL-PatchSurfer represents the protein binding pocket and the ligand molecular surface as a combination of segmented surface patches. Each patch is characterized by its geometrical shape and the electrostatic potential, which are represented using the 3D Zernike descriptor (3DZD). We first tested PL-PatchSurfer on binding ligand prediction and found it outperformed the pocket-similarity based ligand prediction program. We then optimized the search algorithm of PL-PatchSurfer using the PDBbind dataset. Finally, we explored the utility of applying PL-PatchSurfer to a larger and more diverse dataset and showed that PL-PatchSurfer was able to provide a high early enrichment for most of the targets. To the best of our knowledge, PL-PatchSurfer is the first surface patch-based method that treats ligand complementarity at protein binding sites. We believe that using a surface patch approach to better understand protein-ligand interactions has the potential to significantly enhance the design of new ligands for a wide array of drug-targets.
Hu, Bingjie; Zhu, Xiaolei; Monroe, Lyman; Bures, Mark G.; Kihara, Daisuke
2014-01-01
Structure-based computational methods have been widely used in exploring protein-ligand interactions, including predicting the binding ligands of a given protein based on their structural complementarity. Compared to other protein and ligand representations, the advantages of a surface representation include reduced sensitivity to subtle changes in the pocket and ligand conformation and fast search speed. Here we developed a novel method named PL-PatchSurfer (Protein-Ligand PatchSurfer). PL-PatchSurfer represents the protein binding pocket and the ligand molecular surface as a combination of segmented surface patches. Each patch is characterized by its geometrical shape and the electrostatic potential, which are represented using the 3D Zernike descriptor (3DZD). We first tested PL-PatchSurfer on binding ligand prediction and found it outperformed the pocket-similarity based ligand prediction program. We then optimized the search algorithm of PL-PatchSurfer using the PDBbind dataset. Finally, we explored the utility of applying PL-PatchSurfer to a larger and more diverse dataset and showed that PL-PatchSurfer was able to provide a high early enrichment for most of the targets. To the best of our knowledge, PL-PatchSurfer is the first surface patch-based method that treats ligand complementarity at protein binding sites. We believe that using a surface patch approach to better understand protein-ligand interactions has the potential to significantly enhance the design of new ligands for a wide array of drug-targets. PMID:25167137
Schueler-Furman, Ora; Wang, Chu; Baker, David
2005-08-01
RosettaDock uses real-space Monte Carlo minimization (MCM) on both rigid-body and side-chain degrees of freedom to identify the lowest free energy docked arrangement of 2 protein structures. An improved version of the method that uses gradient-based minimization for off-rotamer side-chain optimization and includes information from unbound structures was used to create predictions for Rounds 4 and 5 of CAPRI. First, large numbers of independent MCM trajectories were carried out and the lowest free energy docked configurations identified. Second, new trajectories were started from these lowest energy structures to thoroughly sample the surrounding conformation space, and the lowest energy configurations were submitted as predictions. For all cases in which there were no significant backbone conformational changes, a small number of very low-energy configurations were identified in the first, global search and subsequently found to be close to the center of the basin of attraction in the free energy landscape in the second, local search. Following the release of the experimental coordinates, it was found that the centers of these free energy minima were remarkably close to the native structures in not only the rigid-body orientation but also the detailed conformations of the side-chains. Out of 8 targets, the lowest energy models had interface root-mean-square deviations (RMSDs) less than 1.1 A from the correct structures for 6 targets, and interface RMSDs less than 0.4 A for 3 targets. The predictions were top submissions to CAPRI for Targets 11, 12, 14, 15, and 19. The close correspondence of the lowest free energy structures found in our searches to the experimental structures suggests that our free energy function is a reasonable representation of the physical chemistry, and that the real space search with full side-chain flexibility to some extent solves the protein-protein docking problem in the absence of significant backbone conformational changes. On the other hand, the approach fails when there are significant backbone conformational changes as the steric complementarity of the 2 proteins cannot be modeled without incorporating backbone flexibility, and this is the major goal of our current work.
Nasr, Ramzi; Vernica, Rares; Li, Chen; Baldi, Pierre
2012-01-01
In ligand-based screening, retrosynthesis, and other chemoinformatics applications, one of-ten seeks to search large databases of molecules in order to retrieve molecules that are similar to a given query. With the expanding size of molecular databases, the efficiency and scalability of data structures and algorithms for chemical searches are becoming increasingly important. Remarkably, both the chemoinformatics and information retrieval communities have converged on similar solutions whereby molecules or documents are represented by binary vectors, or fingerprints, indexing their substructures such as labeled paths for molecules and n-grams for text, with the same Jaccard-Tanimoto similarity measure. As a result, similarity search methods from one field can be adapted to the other. Here we adapt recent, state-of-the-art, inverted index methods from information retrieval to speed up similarity searches in chemoinformatics. Our results show a several-fold speed-up improvement over previous methods for both thresh-old searches and top-K searches. We also provide a mathematical analysis that allows one to predict the level of pruning achieved by the inverted index approach, and validate the quality of these predictions through simulation experiments. All results can be replicated using data freely downloadable from http://cdb.ics.uci.edu/. PMID:22462644
NASA Astrophysics Data System (ADS)
Mahmood, Zakaria N.; Mahmuddin, Massudi; Mahmood, Mohammed Nooraldeen
Encoding proteins of amino acid sequence to predict classified into their respective families and subfamilies is important research area. However for a given protein, knowing the exact action whether hormonal, enzymatic, transmembranal or nuclear receptors does not depend solely on amino acid sequence but on the way the amino acid thread folds as well. This study provides a prototype system that able to predict a protein tertiary structure. Several methods are used to develop and evaluate the system to produce better accuracy in protein 3D structure prediction. The Bees Optimization algorithm which inspired from the honey bees food foraging method, is used in the searching phase. In this study, the experiment is conducted on short sequence proteins that have been used by the previous researches using well-known tools. The proposed approach shows a promising result.
Critical Features of Fragment Libraries for Protein Structure Prediction
dos Santos, Karina Baptista
2017-01-01
The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction. PMID:28085928
Critical Features of Fragment Libraries for Protein Structure Prediction.
Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel
2017-01-01
The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.
Predicting the performance of fingerprint similarity searching.
Vogt, Martin; Bajorath, Jürgen
2011-01-01
Fingerprints are bit string representations of molecular structure that typically encode structural fragments, topological features, or pharmacophore patterns. Various fingerprint designs are utilized in virtual screening and their search performance essentially depends on three parameters: the nature of the fingerprint, the active compounds serving as reference molecules, and the composition of the screening database. It is of considerable interest and practical relevance to predict the performance of fingerprint similarity searching. A quantitative assessment of the potential that a fingerprint search might successfully retrieve active compounds, if available in the screening database, would substantially help to select the type of fingerprint most suitable for a given search problem. The method presented herein utilizes concepts from information theory to relate the fingerprint feature distributions of reference compounds to screening libraries. If these feature distributions do not sufficiently differ, active database compounds that are similar to reference molecules cannot be retrieved because they disappear in the "background." By quantifying the difference in feature distribution using the Kullback-Leibler divergence and relating the divergence to compound recovery rates obtained for different benchmark classes, fingerprint search performance can be quantitatively predicted.
Balachandran, Prasanna V; Kowalski, Benjamin; Sehirlioglu, Alp; Lookman, Turab
2018-04-26
Experimental search for high-temperature ferroelectric perovskites is a challenging task due to the vast chemical space and lack of predictive guidelines. Here, we demonstrate a two-step machine learning approach to guide experiments in search of xBi[Formula: see text]O 3 -(1 - x)PbTiO 3 -based perovskites with high ferroelectric Curie temperature. These involve classification learning to screen for compositions in the perovskite structures, and regression coupled to active learning to identify promising perovskites for synthesis and feedback. The problem is challenging because the search space is vast, spanning ~61,500 compositions and only 167 are experimentally studied. Furthermore, not every composition can be synthesized in the perovskite phase. In this work, we predict x, y, Me', and Me″ such that the resulting compositions have both high Curie temperature and form in the perovskite structure. Outcomes from both successful and failed experiments then iteratively refine the machine learning models via an active learning loop. Our approach finds six perovskites out of ten compositions synthesized, including three previously unexplored {Me'Me″} pairs, with 0.2Bi(Fe 0.12 Co 0.88 )O 3 -0.8PbTiO 3 showing the highest measured Curie temperature of 898 K among them.
Precision searches in dijets at the HL-LHC and HE-LHC
NASA Astrophysics Data System (ADS)
Chekanov, S. V.; Childers, J. T.; Proudfoot, J.; Wang, R.; Frizzell, D.
2018-05-01
This paper explores the physics reach of the High-Luminosity Large Hadron Collider (HL-LHC) for searches of new particles decaying to two jets. We discuss inclusive searches in dijets and b-jets, as well as searches in semi-inclusive events by requiring an additional lepton that increases sensitivity to different aspects of the underlying processes. We discuss the expected exclusion limits for generic models predicting new massive particles that result in resonant structures in the dijet mass. Prospects of the Higher-Energy LHC (HE-LHC) collider are also discussed. The study is based on the Pythia8 Monte Carlo generator using representative event statistics for the HL-LHC and HE-LHC running conditions. The event samples were created using supercomputers at NERSC.
Soft Computing Methods for Disulfide Connectivity Prediction.
Márquez-Chamorro, Alfonso E; Aguilar-Ruiz, Jesús S
2015-01-01
The problem of protein structure prediction (PSP) is one of the main challenges in structural bioinformatics. To tackle this problem, PSP can be divided into several subproblems. One of these subproblems is the prediction of disulfide bonds. The disulfide connectivity prediction problem consists in identifying which nonadjacent cysteines would be cross-linked from all possible candidates. Determining the disulfide bond connectivity between the cysteines of a protein is desirable as a previous step of the 3D PSP, as the protein conformational search space is highly reduced. The most representative soft computing approaches for the disulfide bonds connectivity prediction problem of the last decade are summarized in this paper. Certain aspects, such as the different methodologies based on soft computing approaches (artificial neural network or support vector machine) or features of the algorithms, are used for the classification of these methods.
Enhanced sensitivity of CpG island search and primer design based on predicted CpG island position.
Park, Hyun-Chul; Ahn, Eu-Ree; Jung, Ju Yeon; Park, Ji-Hye; Lee, Jee Won; Lim, Si-Keun; Kim, Won
2018-05-01
DNA methylation has important biological roles, such as gene expression regulation, as well as practical applications in forensics, such as in body fluid identification and age estimation. DNA methylation often occurs in the CpG site, and methylation within the CpG islands affects various cellular functions and is related to tissue-specific identification. Several programs have been developed to identify CpG islands; however, the size, location, and number of predicted CpG islands are not identical due to different search algorithms. In addition, they only provide structural information for predicted CpG islands without experimental information, such as primer design. We developed an analysis pipeline package, CpGPNP, to integrate CpG island prediction and primer design. CpGPNP predicts CpG islands more accurately and sensitively than other programs, and designs primers easily based on the predicted CpG island locations. The primer design function included standard, bisulfite, and methylation-specific PCR to identify the methylation of particular CpG sites. In this study, we performed CpG island prediction on all chromosomes and compared CpG island search performance of CpGPNP with other CpG island prediction programs. In addition, we compared the position of primers designed for a specific region within the predicted CpG island using other bisulfite PCR primer programs. The primers designed by CpGPNP were used to experimentally verify the amplification of the target region of markers for body fluid identification and age estimation. CpGPNP is freely available at http://forensicdna.kr/cpgpnp/. Copyright © 2018 Elsevier B.V. All rights reserved.
The ``Missing Compounds'' affair in functionality-driven material discovery
NASA Astrophysics Data System (ADS)
Zunger, Alex
2014-03-01
In the paradigm of ``data-driven discovery,'' underlying one of the leading streams of the Material Genome Initiative (MGI), one attempts to compute high-throughput style as many of the properties of as many of the N (about 10**5- 10**6) compounds listed in databases of previously known compounds. One then inspects the ensuing Big Data, searching for useful trends. The alternative and complimentary paradigm of ``functionality-directed search and optimization'' used here, searches instead for the n much smaller than N configurations and compositions that have the desired value of the target functionality. Examples include the use of genetic and other search methods that optimize the structure or identity of atoms on lattice sites, using atomistic electronic structure (such as first-principles) approaches in search of a given electronic property. This addresses a few of the bottlenecks that have faced the alternative, data-driven/high throughput/Big Data philosophy: (i) When the configuration space is theoretically of infinite size, building a complete data base as in data-driven discovery is impossible, yet searching for the optimum functionality, is still a well-posed problem. (ii) The configuration space that we explore might include artificially grown, kinetically stabilized systems (such as 2D layer stacks; superlattices; colloidal nanostructures; Fullerenes) that are not listed in compound databases (used by data-driven approaches), (iii) a large fraction of chemically plausible compounds have not been experimentally synthesized, so in the data-driven approach these are often skipped. In our approach we search explicitly for such ``Missing Compounds''. It is likely that many interesting material properties will be found in cases (i)-(iii) that elude high throughput searches based on databases encapsulating existing knowledge. I will illustrate (a) Functionality-driven discovery of topological insulators and valley-split quantum-computer semiconductors, as well as (b) Use of ``first principles thermodynamics'' to discern which of the previously ``missing compounds'' should, in fact exist and in which structure. Synthesis efforts by Poeppelmeier group at NU realized 20 never-before-made half-Heusler compounds out of the 20 predicted ones, in our predicted space groups. This type of theory-led experimental search of designed materials with target functionalities may shorten the current process of discovery of interesting functional materials. Supported by DOE ,Office of Science, Energy Frontier Research Center for Inverse Design
Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka
2018-05-08
Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times the number of PPI predictions than previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on docking calculations with biochemical pathways and enables users to easily and quickly assess PPI feasibilities by archiving PPI predictions. MEGADOCK-Web also promotes the discovery of new PPIs and protein functions and is freely available for use at http://www.bi.cs.titech.ac.jp/megadock-web/ .
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe
2016-01-01
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation —with Protein Blocks—, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the ‘Hard’ category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe
2016-06-20
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.
Application of kernel functions for accurate similarity search in large chemical databases.
Wang, Xiaohong; Huan, Jun; Smalter, Aaron; Lushington, Gerald H
2010-04-29
Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening among others. It is widely believed that structure based methods provide an efficient way to do the query. Recently various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions can not be applied to large chemical compound database due to the high computational complexity and the difficulties in indexing similarity search for large databases. To bridge graph kernel function and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed in our team, to measure similarity of graph represented chemicals. In our method, we utilize a hash table to support new graph kernel function definition, efficient storage and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure is scalable to large chemical databases with smaller indexing size, and faster query processing time as compared to state-of-the-art indexing methods such as Daylight fingerprints, C-tree and GraphGrep. Efficient similarity query processing method for large chemical databases is challenging since we need to balance running time efficiency and similarity search accuracy. Our previous similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. Experimental study validates the utility of G-hash in chemical databases.
Dynamic search and working memory in social recall.
Hills, Thomas T; Pachur, Thorsten
2012-01-01
What are the mechanisms underlying search in social memory (e.g., remembering the people one knows)? Do the search mechanisms involve dynamic local-to-global transitions similar to semantic search, and are these transitions governed by the general control of attention, associated with working memory span? To find out, we asked participants to recall individuals from their personal social networks and measured each participant's working memory capacity. Additionally, participants provided social-category and contact-frequency information about the recalled individuals as well as information about the social proximity among the recalled individuals. On the basis of these data, we tested various computational models of memory search regarding their ability to account for the patterns in which participants recalled from social memory. Although recall patterns showed clustering based on social categories, models assuming dynamic transitions between representations cued by social proximity and frequency information predicted participants' recall patterns best-no additional explanatory power was gained from social-category information. Moreover, individual differences in the time between transitions were positively correlated with differences in working memory capacity. These results highlight the role of social proximity in structuring social memory and elucidate the role of working memory for maintaining search criteria during search within that structure.
Automatic prediction of protein domains from sequence information using a hybrid learning system.
Nagarajan, Niranjan; Yona, Golan
2004-06-12
We describe a novel method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence and are combined into a single predictor using a neural network. The output is further smoothed and post-processed using a probabilistic model to predict the most likely transition positions between domains. The method was assessed using the domain definitions in SCOP and CATH for proteins of known structure and was compared with several other existing methods. Our method performs well both in terms of accuracy and sensitivity. It improves significantly over the best methods available, even some of the semi-manual ones, while being fully automatic. Our method can also be used to suggest and verify domain partitions based on structural data. A few examples of predicted domain definitions and alternative partitions, as suggested by our method, are also discussed. An online domain-prediction server is available at http://biozon.org/tools/domains/
Structure Refinement of Protein Low Resolution Models Using the GNEIMO Constrained Dynamics Method
Park, In-Hee; Gangupomu, Vamshi; Wagner, Jeffrey; Jain, Abhinandan; Vaidehi, Nagara-jan
2012-01-01
The challenge in protein structure prediction using homology modeling is the lack of reliable methods to refine the low resolution homology models. Unconstrained all-atom molecular dynamics (MD) does not serve well for structure refinement due to its limited conformational search. We have developed and tested the constrained MD method, based on the Generalized Newton-Euler Inverse Mass Operator (GNEIMO) algorithm for protein structure refinement. In this method, the high-frequency degrees of freedom are replaced with hard holonomic constraints and a protein is modeled as a collection of rigid body clusters connected by flexible torsional hinges. This allows larger integration time steps and enhances the conformational search space. In this work, we have demonstrated the use of a constraint free GNEIMO method for protein structure refinement that starts from low-resolution decoy sets derived from homology methods. In the eight proteins with three decoys for each, we observed an improvement of ~2 Å in the RMSD to the known experimental structures of these proteins. The GNEIMO method also showed enrichment in the population density of native-like conformations. In addition, we demonstrated structural refinement using a “Freeze and Thaw” clustering scheme with the GNEIMO framework as a viable tool for enhancing localized conformational search. We have derived a robust protocol based on the GNEIMO replica exchange method for protein structure refinement that can be readily extended to other proteins and possibly applicable for high throughput protein structure refinement. PMID:22260550
Sun, Rongrong; Wang, Yuanyuan
2008-11-01
Predicting the spontaneous termination of the atrial fibrillation (AF) leads to not only better understanding of mechanisms of the arrhythmia but also the improved treatment of the sustained AF. A novel method is proposed to characterize the AF based on structure and the quantification of the recurrence plot (RP) to predict the termination of the AF. The RP of the electrocardiogram (ECG) signal is firstly obtained and eleven features are extracted to characterize its three basic patterns. Then the sequential forward search (SFS) algorithm and Davies-Bouldin criterion are utilized to select the feature subset which can predict the AF termination effectively. Finally, the multilayer perceptron (MLP) neural network is applied to predict the AF termination. An AF database which includes one training set and two testing sets (A and B) of Holter ECG recordings is studied. Experiment results show that 97% of testing set A and 95% of testing set B are correctly classified. It demonstrates that this algorithm has the ability to predict the spontaneous termination of the AF effectively.
Smith, R F; Wiese, B A; Wojzynski, M K; Davison, D B; Worley, K C
1996-05-01
The BCM Search Launcher is an integrated set of World Wide Web (WWW) pages that organize molecular biology-related search and analysis services available on the WWW by function, and provide a single point of entry for related searches. The Protein Sequence Search Page, for example, provides a single sequence entry form for submitting sequences to WWW servers that offer remote access to a variety of different protein sequence search tools, including BLAST, FASTA, Smith-Waterman, BEAUTY, PROSITE, and BLOCKS searches. Other Launch pages provide access to (1) nucleic acid sequence searches, (2) multiple and pair-wise sequence alignments, (3) gene feature searches, (4) protein secondary structure prediction, and (5) miscellaneous sequence utilities (e.g., six-frame translation). The BCM Search Launcher also provides a mechanism to extend the utility of other WWW services by adding supplementary hypertext links to results returned by remote servers. For example, links to the NCBI's Entrez data base and to the Sequence Retrieval System (SRS) are added to search results returned by the NCBI's WWW BLAST server. These links provide easy access to auxiliary information, such as Medline abstracts, that can be extremely helpful when analyzing BLAST data base hits. For new or infrequent users of sequence data base search tools, we have preset the default search parameters to provide the most informative first-pass sequence analysis possible. We have also developed a batch client interface for Unix and Macintosh computers that allows multiple input sequences to be searched automatically as a background task, with the results returned as individual HTML documents directly to the user's system. The BCM Search Launcher and batch client are available on the WWW at URL http:@gc.bcm.tmc.edu:8088/search-launcher.html.
NASA Astrophysics Data System (ADS)
Majzoub, E. H.; Ozoliņš, V.
2008-03-01
We have developed a procedure for crystal structure generation and prediction for ionic compounds consisting of a collection of cations and rigid complex anions. Our approach is based on global optimization of an energy functional consisting of the electrostatic and soft-sphere repulsive energies using Metropolis Monte Carlo (MMC) simulated annealing in conjunction with smoothing of the potential energy landscape via the distance scaling method. The resulting structures, or prototype electrostatic ground states (PEGS), are subsequently relaxed using first-principles density-functional theory (DFT) calculations to obtain accurate structural parameters and thermodynamic properties. This method is shown to produce the ground state structures of NaAlH4 and Mg(AlH4)2 , as well as the mixed cation alanate K2LiAlH6 . For LiAlH4 , the PEGS search produces a structure with a static DFT total energy equal to that of the experimentally observed structure; the latter is stabilized by vibrational contributions to the free energy. For mixed-valence hexa-alanates, XY AlH6 , where X=(Li,Na,K) , and Y=(Mg,Ca) , the PEGS method predicts six unsuspected structure types, which are not found in the existing structure databases. The PEGS search yields energies that are, on the average, better than the best database structures with the same number of atoms per unit cell, demonstrating the predictive power and usefulness of the PEGS structures. In addition to the recently synthesized LiMgAlH6 compound, we predict that LiCaAlH6 , NaCaAlH6 , and KCaAlH6 are also thermodynamically stable with respect to phase separation into other alanates and metal hydrides. In contrast, NaMgAlH6 and KMgAlH6 are slightly unstable (by less than 3kJ/mol ) relative to the phase separation into NaAlH4 , KAlH4 , and MgH2 . We suggest that solid-state ion-exchange reactions between X3AlH6 (X=Li,Na,K) and YCl2 (Y=Mg,Ca) could be used to synthesize the predicted mixed-valence hexa-alanates.
2013-01-01
Background Elucidating the native structure of a protein molecule from its sequence of amino acids, a problem known as de novo structure prediction, is a long standing challenge in computational structural biology. Difficulties in silico arise due to the high dimensionality of the protein conformational space and the ruggedness of the associated energy surface. The issue of multiple minima is a particularly troublesome hallmark of energy surfaces probed with current energy functions. In contrast to the true energy surface, these surfaces are weakly-funneled and rich in comparably deep minima populated by non-native structures. For this reason, many algorithms seek to be inclusive and obtain a broad view of the low-energy regions through an ensemble of low-energy (decoy) conformations. Conformational diversity in this ensemble is key to increasing the likelihood that the native structure has been captured. Methods We propose an evolutionary search approach to address the multiple-minima problem in decoy sampling for de novo structure prediction. Two population-based evolutionary search algorithms are presented that follow the basic approach of treating conformations as individuals in an evolving population. Coarse graining and molecular fragment replacement are used to efficiently obtain protein-like child conformations from parents. Potential energy is used both to bias parent selection and determine which subset of parents and children will be retained in the evolving population. The effect on the decoy ensemble of sampling minima directly is measured by additionally mapping a conformation to its nearest local minimum before considering it for retainment. The resulting memetic algorithm thus evolves not just a population of conformations but a population of local minima. Results and conclusions Results show that both algorithms are effective in terms of sampling conformations in proximity of the known native structure. The additional minimization is shown to be key to enhancing sampling capability and obtaining a diverse ensemble of decoy conformations, circumventing premature convergence to sub-optimal regions in the conformational space, and approaching the native structure with proximity that is comparable to state-of-the-art decoy sampling methods. The results are shown to be robust and valid when using two representative state-of-the-art coarse-grained energy functions. PMID:24565020
Discriminative structural approaches for enzyme active-site prediction.
Kato, Tsuyoshi; Nagano, Nozomi
2011-02-15
Predicting enzyme active-sites in proteins is an important issue not only for protein sciences but also for a variety of practical applications such as drug design. Because enzyme reaction mechanisms are based on the local structures of enzyme active-sites, various template-based methods that compare local structures in proteins have been developed to date. In comparing such local sites, a simple measurement, RMSD, has been used so far. This paper introduces new machine learning algorithms that refine the similarity/deviation for comparison of local structures. The similarity/deviation is applied to two types of applications, single template analysis and multiple template analysis. In the single template analysis, a single template is used as a query to search proteins for active sites, whereas a protein structure is examined as a query to discover the possible active-sites using a set of templates in the multiple template analysis. This paper experimentally illustrates that the machine learning algorithms effectively improve the similarity/deviation measurements for both the analyses.
Protein homology model refinement by large-scale energy optimization.
Park, Hahnbeom; Ovchinnikov, Sergey; Kim, David E; DiMaio, Frank; Baker, David
2018-03-20
Proteins fold to their lowest free-energy structures, and hence the most straightforward way to increase the accuracy of a partially incorrect protein structure model is to search for the lowest-energy nearby structure. This direct approach has met with little success for two reasons: first, energy function inaccuracies can lead to false energy minima, resulting in model degradation rather than improvement; and second, even with an accurate energy function, the search problem is formidable because the energy only drops considerably in the immediate vicinity of the global minimum, and there are a very large number of degrees of freedom. Here we describe a large-scale energy optimization-based refinement method that incorporates advances in both search and energy function accuracy that can substantially improve the accuracy of low-resolution homology models. The method refined low-resolution homology models into correct folds for 50 of 84 diverse protein families and generated improved models in recent blind structure prediction experiments. Analyses of the basis for these improvements reveal contributions from both the improvements in conformational sampling techniques and the energy function.
Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots.
Hajdin, Christine E; Bellaousov, Stanislav; Huggins, Wayne; Leonard, Christopher W; Mathews, David H; Weeks, Kevin M
2013-04-02
A pseudoknot forms in an RNA when nucleotides in a loop pair with a region outside the helices that close the loop. Pseudoknots occur relatively rarely in RNA but are highly overrepresented in functionally critical motifs in large catalytic RNAs, in riboswitches, and in regulatory elements of viruses. Pseudoknots are usually excluded from RNA structure prediction algorithms. When included, these pairings are difficult to model accurately, especially in large RNAs, because allowing this structure dramatically increases the number of possible incorrect folds and because it is difficult to search the fold space for an optimal structure. We have developed a concise secondary structure modeling approach that combines SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) experimental chemical probing information and a simple, but robust, energy model for the entropic cost of single pseudoknot formation. Structures are predicted with iterative refinement, using a dynamic programming algorithm. This melded experimental and thermodynamic energy function predicted the secondary structures and the pseudoknots for a set of 21 challenging RNAs of known structure ranging in size from 34 to 530 nt. On average, 93% of known base pairs were predicted, and all pseudoknots in well-folded RNAs were identified.
LMSD: LIPID MAPS structure database
Sud, Manish; Fahy, Eoin; Cotter, Dawn; Brown, Alex; Dennis, Edward A.; Glass, Christopher K.; Merrill, Alfred H.; Murphy, Robert C.; Raetz, Christian R. H.; Russell, David W.; Subramaniam, Shankar
2007-01-01
The LIPID MAPS Structure Database (LMSD) is a relational database encompassing structures and annotations of biologically relevant lipids. Structures of lipids in the database come from four sources: (i) LIPID MAPS Consortium's core laboratories and partners; (ii) lipids identified by LIPID MAPS experiments; (iii) computationally generated structures for appropriate lipid classes; (iv) biologically relevant lipids manually curated from LIPID BANK, LIPIDAT and other public sources. All the lipid structures in LMSD are drawn in a consistent fashion. In addition to a classification-based retrieval of lipids, users can search LMSD using either text-based or structure-based search options. The text-based search implementation supports data retrieval by any combination of these data fields: LIPID MAPS ID, systematic or common name, mass, formula, category, main class, and subclass data fields. The structure-based search, in conjunction with optional data fields, provides the capability to perform a substructure search or exact match for the structure drawn by the user. Search results, in addition to structure and annotations, also include relevant links to external databases. The LMSD is publicly available at PMID:17098933
Rapid search for tertiary fragments reveals protein sequence–structure relationships
Zhou, Jianfu; Grigoryan, Gevorg
2015-01-01
Finding backbone substructures from the Protein Data Bank that match an arbitrary query structural motif, composed of multiple disjoint segments, is a problem of growing relevance in structure prediction and protein design. Although numerous protein structure search approaches have been proposed, methods that address this specific task without additional restrictions and on practical time scales are generally lacking. Here, we propose a solution, dubbed MASTER, that is both rapid, enabling searches over the Protein Data Bank in a matter of seconds, and provably correct, finding all matches below a user-specified root-mean-square deviation cutoff. We show that despite the potentially exponential time complexity of the problem, running times in practice are modest even for queries with many segments. The ability to explore naturally plausible structural and sequence variations around a given motif has the potential to synthesize its design principles in an automated manner; so we go on to illustrate the utility of MASTER to protein structural biology. We demonstrate its capacity to rapidly establish structure–sequence relationships, uncover the native designability landscapes of tertiary structural motifs, identify structural signatures of binding, and automatically rewire protein topologies. Given the broad utility of protein tertiary fragment searches, we hope that providing MASTER in an open-source format will enable novel advances in understanding, predicting, and designing protein structure. PMID:25420575
JNSViewer—A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures
Dong, Min; Graham, Mitchell; Yadav, Nehul
2017-01-01
Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html. PMID:28582416
Fu, Yong-Bi; Yang, Mo-Hua; Zeng, Fangqin; Biligetu, Bill
2017-01-01
Molecular plant breeding with the aid of molecular markers has played an important role in modern plant breeding over the last two decades. Many marker-based predictions for quantitative traits have been made to enhance parental selection, but the trait prediction accuracy remains generally low, even with the aid of dense, genome-wide SNP markers. To search for more accurate trait-specific prediction with informative SNP markers, we conducted a literature review on the prediction issues in molecular plant breeding and on the applicability of an RNA-Seq technique for developing function-associated specific trait (FAST) SNP markers. To understand whether and how FAST SNP markers could enhance trait prediction, we also performed a theoretical reasoning on the effectiveness of these markers in a trait-specific prediction, and verified the reasoning through computer simulation. To the end, the search yielded an alternative to regular genomic selection with FAST SNP markers that could be explored to achieve more accurate trait-specific prediction. Continuous search for better alternatives is encouraged to enhance marker-based predictions for an individual quantitative trait in molecular plant breeding. PMID:28729875
Prediction of novel stable Fe-V-Si ternary phase
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nguyen, Manh Cuong; Chen, Chong; Zhao, Xin
Genetic algorithm searches based on a cluster expansion model are performed to search for stable phases of Fe-V-Si ternary. Here, we identify a new thermodynamically, dynamically and mechanically stable ternary phase of Fe 5V 2Si with 2 formula units in a tetragonal unit cell. The formation energy of this new ternary phase is -36.9 meV/atom below the current ternary convex hull. The magnetic moment of Fe in the new structure varies from -0.30-2.52 μ B depending strongly on the number of Fe nearest neighbors. The total magnetic moment is 10.44 μ B/unit cell for new Fe 5V 2Si structure andmore » the system is ordinarily metallic.« less
Prediction of novel stable Fe-V-Si ternary phase
Nguyen, Manh Cuong; Chen, Chong; Zhao, Xin; ...
2018-10-28
Genetic algorithm searches based on a cluster expansion model are performed to search for stable phases of Fe-V-Si ternary. Here, we identify a new thermodynamically, dynamically and mechanically stable ternary phase of Fe 5V 2Si with 2 formula units in a tetragonal unit cell. The formation energy of this new ternary phase is -36.9 meV/atom below the current ternary convex hull. The magnetic moment of Fe in the new structure varies from -0.30-2.52 μ B depending strongly on the number of Fe nearest neighbors. The total magnetic moment is 10.44 μ B/unit cell for new Fe 5V 2Si structure andmore » the system is ordinarily metallic.« less
Improved packing of protein side chains with parallel ant colonies.
Quan, Lijun; Lü, Qiang; Li, Haiou; Xia, Xiaoyan; Wu, Hongjie
2014-01-01
The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many of existing solutions are modelled as a computational optimisation problem. As well as the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamer by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of X1 and 77.11% of X12 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chain and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamers strategy. All results confirmed that our parallel approach is competitive to state-of-the-art solutions for packing side chains. This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a frame-work for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms.
NASA Astrophysics Data System (ADS)
Zhang, Wenshuai; Zeng, Xiaoyan; Zhang, Li; Peng, Haiyan; Jiao, Yongjun; Zeng, Jun; Treutlein, Herbert R.
2013-06-01
In this work, we have developed a new approach to predict the epitopes of antigens that are recognized by a specific antibody. Our method is based on the "multiple copy simultaneous search" (MCSS) approach which identifies optimal locations of small chemical functional groups on the surfaces of the antibody, and identifying sequence patterns of peptides that can bind to the surface of the antibody. The identified sequence patterns are then used to search the amino-acid sequence of the antigen protein. The approach was validated by reproducing the binding epitope of HIV gp120 envelop glycoprotein for the human neutralizing antibody as revealed in the available crystal structure. Our method was then applied to predict the epitopes of two glycoproteins of a newly discovered bunyavirus recognized by an antibody named MAb 4-5. These predicted epitopes can be verified by experimental methods. We also discuss the involvement of different amino acids in the antigen-antibody recognition based on the distributions of MCSS minima of different functional groups.
Computer predictions on Rh-based double perovskites with unusual electronic and magnetic properties
NASA Astrophysics Data System (ADS)
Halder, Anita; Nafday, Dhani; Sanyal, Prabuddha; Saha-Dasgupta, Tanusri
2018-03-01
In search for new magnetic materials, we make computer prediction of structural, electronic and magnetic properties of yet-to-be synthesized Rh-based double perovskite compounds, Sr(Ca)2BRhO6 (B=Cr, Mn, Fe). We use combination of evolutionary algorithm, density functional theory, and statistical-mechanical tool for this purpose. We find that the unusual valence of Rh5+ may be stabilized in these compounds through formation of oxygen ligand hole. Interestingly, while the Cr-Rh and Mn-Rh compounds are predicted to be ferromagnetic half-metals, the Fe-Rh compounds are found to be rare examples of antiferromagnetic and metallic transition-metal oxide with three-dimensional electronic structure. The computed magnetic transition temperatures of the predicted compounds, obtained from finite temperature Monte Carlo study of the first principles-derived model Hamiltonian, are found to be reasonably high. The prediction of favorable growth condition of the compounds, reported in our study, obtained through extensive thermodynamic analysis should be useful for future synthesize of this interesting class of materials with intriguing properties.
Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin
2007-12-01
Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracy-toplasmatic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most of other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
Senter, Evan; Sheikh, Saad; Dotu, Ivan; Ponty, Yann; Clote, Peter
2012-01-01
Using complex roots of unity and the Fast Fourier Transform, we design a new thermodynamics-based algorithm, FFTbor, that computes the Boltzmann probability that secondary structures differ by base pairs from an arbitrary initial structure of a given RNA sequence. The algorithm, which runs in quartic time and quadratic space , is used to determine the correlation between kinetic folding speed and the ruggedness of the energy landscape, and to predict the location of riboswitch expression platform candidates. A web server is available at http://bioinformatics.bc.edu/clotelab/FFTbor/. PMID:23284639
Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou
2012-01-01
Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to obtain. A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed during the searching threads are running in parallel, while each thread is searching the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving PSP problem. This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise.
Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou
2012-01-01
Background Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to obtain. Results A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed during the searching threads are running in parallel, while each thread is searching the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving PSP problem. Conclusions This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise. PMID:23028708
A python-based docking program utilizing a receptor bound ligand shape: PythDock.
Chung, Jae Yoon; Cho, Seung Joo; Hah, Jung-Mi
2011-09-01
PythDock is a heuristic docking program that uses Python programming language with a simple scoring function and a population based search engine. The scoring function considers electrostatic and dispersion/repulsion terms. The search engine utilizes a particle swarm optimization algorithm. A grid potential map is generated using the shape information of a bound ligand within the active site. Therefore, the searching area is more relevant to the ligand binding. To evaluate the docking performance of PythDock, two well-known docking programs (AutoDock and DOCK) were also used with the same data. The accuracy of docked results were measured by the difference of the ligand structure between x-ray structure, and docked pose, i.e., average root mean squared deviation values of the bound ligand were compared for fourteen protein-ligand complexes. Since the number of ligands' rotational flexibility is an important factor affecting the accuracy of a docking, the data set was chosen to have various degrees of flexibility. Although PythDock has a scoring function simpler than those of other programs (AutoDock and DOCK), our results showed that PythDock predicted more accurate poses than both AutoDock4.2 and DOCK6.2. This indicates that PythDock could be a useful tool to study ligand-receptor interactions and could also be beneficial in structure based drug design.
Theoretical predictions of a bucky-diamond SiC cluster.
Yu, Ming; Jayanthi, C S; Wu, S Y
2012-06-15
A study of structural relaxations of Si(n)C(m) clusters corresponding to different compositions, different relative arrangements of Si/C atoms, and different types of initial structure, reveals that the Si(n)C(m) bucky-diamond structure can be obtained for an initial network structure constructed from a truncated bulk 3C-SiC for a magic composition corresponding to n = 68 and m = 79. This study was performed using a semi-empirical Hamiltonian (SCED-LCAO) since it allowed an extensive search of different types of initial structures. However, the bucky-diamond structure predicted by this method was also confirmed by a more accurate density functional theory (DFT) based method. The bucky-diamond structure exhibited by a SiC-based system represents an interesting paradigm where a Si atom can form three-coordinated as well as four-coordinated networks with carbon atoms and vice versa and with both types of network co-existing in the same structure. Specifically, the bucky-diamond structure of the Si(68)C(79) cluster consists of a 35-atom diamond-like inner core (four-atom coordinations) suspended inside a 112-atom fullerene-like shell (three-atom coordinations).
Tian, Sheng; Sun, Huiyong; Pan, Peichen; Li, Dan; Zhen, Xuechu; Li, Youyong; Hou, Tingjun
2014-10-27
In this study, to accommodate receptor flexibility, based on multiple receptor conformations, a novel ensemble docking protocol was developed by using the naïve Bayesian classification technique, and it was evaluated in terms of the prediction accuracy of docking-based virtual screening (VS) of three important targets in the kinase family: ALK, CDK2, and VEGFR2. First, for each target, the representative crystal structures were selected by structural clustering, and the capability of molecular docking based on each representative structure to discriminate inhibitors from non-inhibitors was examined. Then, for each target, 50 ns molecular dynamics (MD) simulations were carried out to generate an ensemble of the conformations, and multiple representative structures/snapshots were extracted from each MD trajectory by structural clustering. On average, the representative crystal structures outperform the representative structures extracted from MD simulations in terms of the capabilities to separate inhibitors from non-inhibitors. Finally, by using the naïve Bayesian classification technique, an integrated VS strategy was developed to combine the prediction results of molecular docking based on different representative conformations chosen from crystal structures and MD trajectories. It was encouraging to observe that the integrated VS strategy yields better performance than the docking-based VS based on any single rigid conformation. This novel protocol may provide an improvement over existing strategies to search for more diverse and promising active compounds for a target of interest.
Ul-Haq, Zaheer; Effendi, Juweria Shahrukh; Ashraf, Sajda; Bkhaitan, Majdi M
2017-06-01
In the current study, quantitative three-dimensional structure-activity-relationship (3D-QSAR) method was performed to design a model for new chemical entities by utilizing pyrazolopyrimidines. Their inhibiting activity on receptor IL-2 Itk correlates descriptors based on topology and hydrophobicity. The best model developed by ligand-based (atom-based) approach has correlation-coefficient of r 2 : 0.987 and cross-validated squared correlation-coefficient of q 2 : 0.541 with an external prediction capability of r 2 : 0.944. Whereas the best selected model developed by structured-based (receptor-based) approach has correlation-coefficient of r 2 : 0.987, cross-validated squared correlation-coefficient of q 2 : 0.637 with an external predictive ability of r 2 : 0.941. The statistical parameters prove that structure-based gave a better model to design new chemical scaffolds. The results achieved indicated that hydrophobicity at R 1 location play a vital role in the inhibitory activity and introduction of appropriately bulky and strongly hydrophobic-groups at position 3 of the terminal phenyl-group which is highly significant to enhance the activity. Six new pyrazolopyrimidine derivatives were designed. Docking simulation study was carried out and their inhibitory activity was predicted by the best structure based model with predictive activity of ranging from 8.43 to 8.85 log unit. The interacting residues PHE435, ASP500, LYS391, GLU436, MET438, CYS442, ILE369, VAL377 of PDB 4HCT were studied with respect to type of bonding with the new compounds. This study was aimed to search out more potent inhibitors of IL-2 Itk. Copyright © 2017 Elsevier Inc. All rights reserved.
Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine
2010-08-01
The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.
Synthesis of sodium polyhydrides at high pressures
NASA Astrophysics Data System (ADS)
Struzhkin, Viktor V.; Kim, Duck Young; Stavrou, Elissaios; Muramatsu, Takaki; Mao, Ho-Kwang; Pickard, Chris J.; Needs, Richard J.; Prakapenka, Vitali B.; Goncharov, Alexander F.
2016-07-01
The only known compound of sodium and hydrogen is archetypal ionic NaH. Application of high pressure is known to promote states with higher atomic coordination, but extensive searches for polyhydrides with unusual stoichiometry have had only limited success in spite of several theoretical predictions. Here we report the first observation of the formation of polyhydrides of Na (NaH3 and NaH7) above 40 GPa and 2,000 K. We combine synchrotron X-ray diffraction and Raman spectroscopy in a laser-heated diamond anvil cell and theoretical random structure searching, which both agree on the stable structures and compositions. Our results support the formation of multicenter bonding in a material with unusual stoichiometry. These results are applicable to the design of new energetic solids and high-temperature superconductors based on hydrogen-rich materials.
NASA Astrophysics Data System (ADS)
Trimarchi, Giancarlo; Zhang, Xiuwen; DeVries Vermeer, Michael J.; Cantwell, Jacqueline; Poeppelmeier, Kenneth R.; Zunger, Alex
2015-10-01
Theoretical sorting of stable and synthesizable "missing compounds" from those that are unstable is a crucial step in the discovery of previously unknown functional materials. This active research area often involves high-throughput (HT) examination of the total energy of a given compound in a list of candidate formal structure types (FSTs), searching for those with the lowest energy within that list. While it is well appreciated that local relaxation methods based on a fixed list of structure types can lead to inaccurate geometries, this approach is widely used in HT studies because it produces answers faster than global optimization methods (that vary lattice vectors and atomic positions without local restrictions). We find, however, a different failure mode of the HT protocol: specific crystallographic classes of formal structure types each correspond to a series of chemically distinct "daughter structure types" (DSTs) that have the same space group but possess totally different local bonding configurations, including coordination types. Failure to include such DSTs in the fixed list of examined candidate structures used in contemporary high-throughput approaches can lead to qualitative misidentification of the stable bonding pattern, not just quantitative inaccuracies. In this work, we (i) clarify the understanding of the general DST-FST relationship, thus improving current discovery HT approaches, (ii) illustrate this failure mode for RbCuS and RbCuSe (the latter being a yet unreported compound and is predicted here) by developing a synthesis method and accelerated crystal-structure determination, and (iii) apply the genetic-algorithm-based global space-group optimization (GSGO) approach which is not vulnerable to the failure mode of HT searches of fixed lists, demonstrating a correct identification of the stable DST. The broad impact of items (i)-(iii) lies in the demonstrated predictive ability of a more comprehensive search strategy than what is currently used—use HT calculations as the preliminary broad screening followed by unbiased GSGO of the final candidates.
Perspective: Role of structure prediction in materials discovery and design
NASA Astrophysics Data System (ADS)
Needs, Richard J.; Pickard, Chris J.
2016-05-01
Materials informatics owes much to bioinformatics and the Materials Genome Initiative has been inspired by the Human Genome Project. But there is more to bioinformatics than genomes, and the same is true for materials informatics. Here we describe the rapidly expanding role of searching for structures of materials using first-principles electronic-structure methods. Structure searching has played an important part in unraveling structures of dense hydrogen and in identifying the record-high-temperature superconducting component in hydrogen sulfide at high pressures. We suggest that first-principles structure searching has already demonstrated its ability to determine structures of a wide range of materials and that it will play a central and increasing part in materials discovery and design.
Protein structural similarity search by Ramachandran codes
Lo, Wei-Cheng; Huang, Po-Jung; Chang, Chih-Hung; Lyu, Ping-Chiang
2007-01-01
Background Protein structural data has increased exponentially, such that fast and accurate tools are necessary to access structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to Combinatorial Extension (CE) and works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented into a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are supposed applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era. PMID:17716377
TOUCHSTONE II: a new approach to ab initio protein structure prediction.
Zhang, Yang; Kolinski, Andrzej; Skolnick, Jeffrey
2003-08-01
We have developed a new combined approach for ab initio protein structure prediction. The protein conformation is described as a lattice chain connecting C(alpha) atoms, with attached C(beta) atoms and side-chain centers of mass. The model force field includes various short-range and long-range knowledge-based potentials derived from a statistical analysis of the regularities of protein structures. The combination of these energy terms is optimized through the maximization of correlation for 30 x 60,000 decoys between the root mean square deviation (RMSD) to native and energies, as well as the energy gap between native and the decoy ensemble. To accelerate the conformational search, a newly developed parallel hyperbolic sampling algorithm with a composite movement set is used in the Monte Carlo simulation processes. We exploit this strategy to successfully fold 41/100 small proteins (36 approximately 120 residues) with predicted structures having a RMSD from native below 6.5 A in the top five cluster centroids. To fold larger-size proteins as well as to improve the folding yield of small proteins, we incorporate into the basic force field side-chain contact predictions from our threading program PROSPECTOR where homologous proteins were excluded from the data base. With these threading-based restraints, the program can fold 83/125 test proteins (36 approximately 174 residues) with structures having a RMSD to native below 6.5 A in the top five cluster centroids. This shows the significant improvement of folding by using predicted tertiary restraints, especially when the accuracy of side-chain contact prediction is >20%. For native fold selection, we introduce quantities dependent on the cluster density and the combination of energy and free energy, which show a higher discriminative power to select the native structure than the previously used cluster energy or cluster size, and which can be used in native structure identification in blind simulations. These procedures are readily automated and are being implemented on a genomic scale.
Local-search based prediction of medical image registration error
NASA Astrophysics Data System (ADS)
Saygili, Görkem
2018-03-01
Medical image registration is a crucial task in many different medical imaging applications. Hence, considerable amount of work has been published recently that aim to predict the error in a registration without any human effort. If provided, these error predictions can be used as a feedback to the registration algorithm to further improve its performance. Recent methods generally start with extracting image-based and deformation-based features, then apply feature pooling and finally train a Random Forest (RF) regressor to predict the real registration error. Image-based features can be calculated after applying a single registration but provide limited accuracy whereas deformation-based features such as variation of deformation vector field may require up to 20 registrations which is a considerably high time-consuming task. This paper proposes to use extracted features from a local search algorithm as image-based features to estimate the error of a registration. The proposed method comprises a local search algorithm to find corresponding voxels between registered image pairs and based on the amount of shifts and stereo confidence measures, it predicts the amount of registration error in millimetres densely using a RF regressor. Compared to other algorithms in the literature, the proposed algorithm does not require multiple registrations, can be efficiently implemented on a Graphical Processing Unit (GPU) and can still provide highly accurate error predictions in existence of large registration error. Experimental results with real registrations on a public dataset indicate a substantially high accuracy achieved by using features from the local search algorithm.
An Evolution-Based Approach to De Novo Protein Design and Case Study on Mycobacterium tuberculosis
Brender, Jeffrey R.; Czajka, Jeff; Marsh, David; Gray, Felicia; Cierpicki, Tomasz; Zhang, Yang
2013-01-01
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality. PMID:24204234
The Pricing of Information--A Search-Based Approach to Pricing an Online Search Service.
ERIC Educational Resources Information Center
Boyle, Harry F.
1982-01-01
Describes innovative pricing structure consisting of low connect time fee, print fees, and search fees, offered by Chemical Abstracts Service (CAS) ONLINE--an online searching system used to locate chemical substances. Pricing options considered by CAS, the search-based pricing approach, and users' reactions to pricing structures are noted. (EJS)
Quantum Monte Carlo study of the phase diagram of solid molecular hydrogen at extreme pressures
Drummond, N. D.; Monserrat, Bartomeu; Lloyd-Williams, Jonathan H.; Ríos, P. López; Pickard, Chris J.; Needs, R. J.
2015-01-01
Establishing the phase diagram of hydrogen is a major challenge for experimental and theoretical physics. Experiment alone cannot establish the atomic structure of solid hydrogen at high pressure, because hydrogen scatters X-rays only weakly. Instead, our understanding of the atomic structure is largely based on density functional theory (DFT). By comparing Raman spectra for low-energy structures found in DFT searches with experimental spectra, candidate atomic structures have been identified for each experimentally observed phase. Unfortunately, DFT predicts a metallic structure to be energetically favoured at a broad range of pressures up to 400 GPa, where it is known experimentally that hydrogen is non-metallic. Here we show that more advanced theoretical methods (diffusion quantum Monte Carlo calculations) find the metallic structure to be uncompetitive, and predict a phase diagram in reasonable agreement with experiment. This greatly strengthens the claim that the candidate atomic structures accurately model the experimentally observed phases. PMID:26215251
Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine
2011-03-10
Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de.
Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine
2011-01-01
Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de. PMID:21423752
Bietz, Stefan; Inhester, Therese; Lauck, Florian; Sommer, Kai; von Behren, Mathias M; Fährrolfes, Rainer; Flachsenberg, Florian; Meyder, Agnes; Nittinger, Eva; Otto, Thomas; Hilbig, Matthias; Schomburg, Karen T; Volkamer, Andrea; Rarey, Matthias
2017-11-10
Nowadays, computational approaches are an integral part of life science research. Problems related to interpretation of experimental results, data analysis, or visualization tasks highly benefit from the achievements of the digital era. Simulation methods facilitate predictions of physicochemical properties and can assist in understanding macromolecular phenomena. Here, we will give an overview of the methods developed in our group that aim at supporting researchers from all life science areas. Based on state-of-the-art approaches from structural bioinformatics and cheminformatics, we provide software covering a wide range of research questions. Our all-in-one web service platform ProteinsPlus (http://proteins.plus) offers solutions for pocket and druggability prediction, hydrogen placement, structure quality assessment, ensemble generation, protein-protein interaction classification, and 2D-interaction visualization. Additionally, we provide a software package that contains tools targeting cheminformatics problems like file format conversion, molecule data set processing, SMARTS editing, fragment space enumeration, and ligand-based virtual screening. Furthermore, it also includes structural bioinformatics solutions for inverse screening, binding site alignment, and searching interaction patterns across structure libraries. The software package is available at http://software.zbh.uni-hamburg.de. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
SA-Search: a web tool for protein structure mining based on a Structural Alphabet
Guyon, Frédéric; Camproux, Anne-Claude; Hochez, Joëlle; Tufféry, Pierre
2004-01-01
SA-Search is a web tool that can be used to mine for protein structures and extract structural similarities. It is based on a hidden Markov model derived Structural Alphabet (SA) that allows the compression of three-dimensional (3D) protein conformations into a one-dimensional (1D) representation using a limited number of prototype conformations. Using such a representation, classical methods developed for amino acid sequences can be employed. Currently, SA-Search permits the performance of fast 3D similarity searches such as the extraction of exact words using a suffix tree approach, and the search for fuzzy words viewed as a simple 1D sequence alignment problem. SA-Search is available at http://bioserv.rpbs.jussieu.fr/cgi-bin/SA-Search. PMID:15215446
SA-Search: a web tool for protein structure mining based on a Structural Alphabet.
Guyon, Frédéric; Camproux, Anne-Claude; Hochez, Joëlle; Tufféry, Pierre
2004-07-01
SA-Search is a web tool that can be used to mine for protein structures and extract structural similarities. It is based on a hidden Markov model derived Structural Alphabet (SA) that allows the compression of three-dimensional (3D) protein conformations into a one-dimensional (1D) representation using a limited number of prototype conformations. Using such a representation, classical methods developed for amino acid sequences can be employed. Currently, SA-Search permits the performance of fast 3D similarity searches such as the extraction of exact words using a suffix tree approach, and the search for fuzzy words viewed as a simple 1D sequence alignment problem. SA-Search is available at http://bioserv.rpbs.jussieu.fr/cgi-bin/SA-Search.
Crystal Structure and Superconductivity of PH 3 at High Pressures
Liu, Hanyu; Li, Yinwei; Gao, Guoying; ...
2016-01-20
Here, we performed systematic structure search on solid PH 3 at high pressures using particle swarm optimization method. Furthermore, at 100-200 GPa, the search led to two structures consisting of P-P bonds that different from these predicted for H 2S. Phonon and electron-phonon calculations indicate both structures are dynamically stable and superconductive. Particularly, the estimated critical temperature for the monoclinic (C2/m) phase of 83 K at 200 GPa is in excellent agreement with a recent experimental report.
RRCRank: a fusion method using rank strategy for residue-residue contact prediction.
Jing, Xiaoyang; Dong, Qiwen; Lu, Ruqian
2017-09-02
In structural biology area, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that the predicted residue-residue contacts could effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, related researchers have developed various methods to predict residue-residue contacts, especially, significant performance has been achieved by using fusion methods in recent years. In this work, a novel fusion method based on rank strategy has been proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated mutations methods and ensemble machine-learning classifiers, and then the proposed method uses the learning-to-rank algorithm to predict contact probability of each residue pair. First, we perform two benchmark tests for the proposed fusion method (RRCRank) on CASP11 dataset and CASP12 dataset respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium and short range contacts. Second, in order to verify the superiority of ranking strategy, we predict contacts by using the traditional regression and classification strategies based on the same features as ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for three contact types, in particular for long range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that the RRCRank could achieve comparable prediction precisions and is better than three methods in most assessment metrics. The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment.
Recognition of functional sites in protein structures.
Shulman-Peleg, Alexandra; Nussinov, Ruth; Wolfson, Haim J
2004-06-04
Recognition of regions on the surface of one protein, that are similar to a binding site of another is crucial for the prediction of molecular interactions and for functional classifications. We first describe a novel method, SiteEngine, that assumes no sequence or fold similarities and is able to recognize proteins that have similar binding sites and may perform similar functions. We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways: first, we search a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites. Our method is robust and efficient enough to allow computationally demanding applications such as the first and the third. From the biological standpoint, the first application may identify secondary binding sites of drugs that may lead to side-effects. The third application finds new potential sites on the protein that may provide targets for drug design. Each of the three applications may aid in assigning a function and in classification of binding patterns. We highlight the advantages and disadvantages of each type of search, provide examples of large-scale searches of the entire Protein Data Base and make functional predictions.
Towards the design of novel cuprate-based superconductors
NASA Astrophysics Data System (ADS)
Yee, Chuck-Hou
The rapid maturation of materials databases combined with recent development of theories seeking to quantitatively link chemical properties to superconductivity in the cuprates provide the context to design novel superconductors. In this talk, we describe a framework designed to search for new superconductors, which combines chemical rules-of-thumb, insights of transition temperatures from dynamical mean-field theory, first-principles electronic structure tools, materials databases and structure prediction via evolutionary algorithms. We apply the framework to design a family of copper oxysulfides and evaluate the prospects of superconductivity.
GASS-WEB: a web server for identifying enzyme active sites based on genetic algorithms.
Moraes, João P A; Pappa, Gisele L; Pires, Douglas E V; Izidoro, Sandro C
2017-07-03
Enzyme active sites are important and conserved functional regions of proteins whose identification can be an invaluable step toward protein function prediction. Most of the existing methods for this task are based on active site similarity and present limitations including performing only exact matches on template residues, template size restraints, despite not being capable of finding inter-domain active sites. To fill this gap, we proposed GASS-WEB, a user-friendly web server that uses GASS (Genetic Active Site Search), a method based on an evolutionary algorithm to search for similar active sites in proteins. GASS-WEB can be used under two different scenarios: (i) given a protein of interest, to match a set of specific active site templates; or (ii) given an active site template, looking for it in a database of protein structures. The method has shown to be very effective on a range of experiments and was able to correctly identify >90% of the catalogued active sites from the Catalytic Site Atlas. It also managed to achieve a Matthew correlation coefficient of 0.63 using the Critical Assessment of protein Structure Prediction (CASP 10) dataset. In our analysis, GASS was ranking fourth among 18 methods. GASS-WEB is freely available at http://gass.unifei.edu.br/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Brasil, Christiane Regina Soares; Delbem, Alexandre Claudio Botazzo; da Silva, Fernando Luís Barroso
2013-07-30
This article focuses on the development of an approach for ab initio protein structure prediction (PSP) without using any earlier knowledge from similar protein structures, as fragment-based statistics or inference of secondary structures. Such an approach is called purely ab initio prediction. The article shows that well-designed multiobjective evolutionary algorithms can predict relevant protein structures in a purely ab initio way. One challenge for purely ab initio PSP is the prediction of structures with β-sheets. To work with such proteins, this research has also developed procedures to efficiently estimate hydrogen bond and solvation contribution energies. Considering van der Waals, electrostatic, hydrogen bond, and solvation contribution energies, the PSP is a problem with four energetic terms to be minimized. Each interaction energy term can be considered an objective of an optimization method. Combinatorial problems with four objectives have been considered too complex for the available multiobjective optimization (MOO) methods. The proposed approach, called "Multiobjective evolutionary algorithms with many tables" (MEAMT), can efficiently deal with four objectives through the combination thereof, performing a more adequate sampling of the objective space. Therefore, this method can better map the promising regions in this space, predicting structures in a purely ab initio way. In other words, MEAMT is an efficient optimization method for MOO, which explores simultaneously the search space as well as the objective space. MEAMT can predict structures with one or two domains with RMSDs comparable to values obtained by recently developed ab initio methods (GAPFCG , I-PAES, and Quark) that use different levels of earlier knowledge. Copyright © 2013 Wiley Periodicals, Inc.
Fujibuchi, Wataru; Anderson, John S. J.; Landsman, David
2001-01-01
Consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences have historically been subject to large numbers of false positives. We sought to decrease false positives by incorporating expression profile data into a consensus pattern-based search method. We have systematically analyzed the expression phenotypes of over 6000 yeast genes, across 121 expression profile experiments, and correlated them with the distribution of 14 known regulatory elements over sequences upstream of the genes. Our method is based on a metric we term probabilistic element assessment (PEA), which is a ranking of potential sites based on sequence similarity in the upstream regions of genes with similar expression phenotypes. For eight of the 14 known elements that we examined, our method had a much higher selectivity than a naïve consensus pattern search. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data. PMID:11574681
High performance transcription factor-DNA docking with GPU computing
2012-01-01
Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computational demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575
Myint, Kyaw Z.; Xie, Xiang-Qun
2015-01-01
This chapter focuses on the fingerprint-based artificial neural networks QSAR (FANN-QSAR) approach to predict biological activities of structurally diverse compounds. Three types of fingerprints, namely ECFP6, FP2, and MACCS, were used as inputs to train the FANN-QSAR models. The results were benchmarked against known 2D and 3D QSAR methods, and the derived models were used to predict cannabinoid (CB) ligand binding activities as a case study. In addition, the FANN-QSAR model was used as a virtual screening tool to search a large NCI compound database for lead cannabinoid compounds. We discovered several compounds with good CB2 binding affinities ranging from 6.70 nM to 3.75 μM. The studies proved that the FANN-QSAR method is a useful approach to predict bioactivities or properties of ligands and to find novel lead compounds for drug discovery research. PMID:25502380
2014-01-01
Background Alternative splicing is an important process in higher eukaryotes that allows obtaining several transcripts from one gene. A specific case of alternative splicing is mutually exclusive splicing, in which exactly one exon out of a cluster of neighbouring exons is spliced into the mature transcript. Recently, a new algorithm for the prediction of these exons has been developed based on the preconditions that the exons of the cluster have similar lengths, sequence homology, and conserved splice sites, and that they are translated in the same reading frame. Description In this contribution we introduce Kassiopeia, a database and web application for the generation, storage, and presentation of genome-wide analyses of mutually exclusive exomes. Currently, Kassiopeia provides access to the mutually exclusive exomes of twelve Drosophila species, the thale cress Arabidopsis thaliana, the flatworm Caenorhabditis elegans, and human. Mutually exclusive spliced exons (MXEs) were predicted based on gene reconstructions from Scipio. Based on the standard prediction values, with which 83.5% of the annotated MXEs of Drosophila melanogaster were reconstructed, the exomes contain surprisingly more MXEs than previously supposed and identified. The user can search Kassiopeia using BLAST or browse the genes of each species optionally adjusting the parameters used for the prediction to reveal more divergent or only very similar exon candidates. Conclusions We developed a pipeline to predict MXEs in the genomes of several model organisms and a web interface, Kassiopeia, for their visualization. For each gene Kassiopeia provides a comprehensive gene structure scheme, the sequences and predicted secondary structures of the MXEs, and, if available, further evidence for MXE candidates from cDNA/EST data, predictions of MXEs in homologous genes of closely related species, and RNA secondary structure predictions. Kassiopeia can be accessed at http://www.motorprotein.de/kassiopeia. PMID:24507667
Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan
2016-10-07
RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of RNA-binding property of sequences of gene products. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises of 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. In case of association of the protein with a family of known structures, output features like, multiple structure-based sequence alignment (MSSA) of the query with all others members of that family is provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential information pertaining to an RBP, like overall function annotations, are provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam .
Computational Discovery of New Materials Under Pressure
NASA Astrophysics Data System (ADS)
Zurek, Eva
The pressure variable opens the door towards the synthesis of materials with unique properties, ie. superconductivity, hydrogen storage media, high-energy density and superhard materials, to name a few. Indeed, recently superconductivity has been observed below 203 K and 103 K in samples of compressed sulfur dihydride and phosphine, respectively. Under pressure elements that would not normally combine may form stable compounds, or may mix in novel proportions. As a result using our chemical intuition developed at 1 atm to theoretically predict stable phases is bound to fail. In order to enable our search for superconducting hydrogen-rich systems under pressure, we have developed XtalOpt, an open-source evolutionary algorithm for crystal structure prediction. New advances in XtalOpt that enable the prediction of unit cells with greater complexity will be described. XtalOpt has been employed to find the most stable structures of hydrides with unique stoichiometries under pressure. The electronic structure and bonding of the predicted phases has been analyzed by detailed first-principles calculations based on density functional theory. The results of our computational experiments are helping us to build chemical and physical intuition for compressed solids.
Synthesis of sodium polyhydrides at high pressures
Struzhkin, Viktor V.; Kim, Duck Young; Stavrou, Elissaios; ...
2016-07-28
Archetypal ionic NaH is the only known compound of sodium and hydrogen. Application of high pressure is known to promote states with higher atomic coordination, but extensive searches for polyhydrides with unusual stoichiometry have had only limited success in spite of several theoretical predictions. Here we report the first observation of the formation of polyhydrides of Na (NaH 3 and NaH 7) above 40 GPa and 2,000 K. Moreover, we combine synchrotron X-ray diffraction and Raman spectroscopy in a laser-heated diamond anvil cell and theoretical random structure searching, which both agree on the stable structures and compositions. Our results supportmore » the formation of multicenter bonding in a material with unusual stoichiometry. These results are applicable to the design of new energetic solids and high-temperature superconductors based on hydrogen-rich materials.« less
Ponting, C P; Mott, R; Bork, P; Copley, R R
2001-12-01
Sequence database searching methods such as BLAST, are invaluable for predicting molecular function on the basis of sequence similarities among single regions of proteins. Searches of whole databases however, are not optimized to detect multiple homologous regions within a single polypeptide. Here we have used the prospero algorithm to perform self-comparisons of all predicted Drosophila melanogaster gene products. Predicted repeats, and their homologs from all species, were analyzed further to detect hitherto unappreciated evolutionary relationships. Results included the identification of novel tandem repeats in the human X-linked retinitis pigmentosa type-2 gene product, repeated segments in cystinosin, associated with a defect in cystine transport, and 'nested' homologous domains in dysferlin, whose gene is mutated in limb girdle muscular dystrophy. Novel signaling domain families were found that may regulate the microtubule-based cytoskeleton and ubiquitin-mediated proteolysis, respectively. Two families of glycosyl hydrolases were shown to contain internal repetitions that hint at their evolution via a piecemeal, modular approach. In addition, three examples of fruit fly genes were detected with tandem exons that appear to have arisen via internal duplication. These findings demonstrate how completely sequenced genomes can be exploited to further understand the relationships between molecular structure, function, and evolution.
Violence risk prediction. Clinical and actuarial measures and the role of the Psychopathy Checklist.
Dolan, M; Doyle, M
2000-10-01
Violence risk prediction is a priority issue for clinicians working with mentally disordered offenders. To review the current status of violence risk prediction research. Literature search (Medline). Key words: violence, risk prediction, mental disorder. Systematic/structured risk assessment approaches may enhance the accuracy of clinical prediction of violent outcomes. Data on the predictive validity of available clinical risk assessment tools are based largely on American and North American studies and further validation is required in British samples. The Psychopathy Checklist appears to be a key predictor of violent recidivism in a variety of settings. Violence risk prediction is an inexact science and as such will continue to provoke debate. Clinicians clearly need to be able to demonstrate the rationale behind their decisions on violence risk and much can be learned from recent developments in research on violence risk prediction.
Improved packing of protein side chains with parallel ant colonies
2014-01-01
Introduction The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many of existing solutions are modelled as a computational optimisation problem. As well as the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. Methods We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamer by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. Results We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of X1 and 77.11% of X12 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chain and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamers strategy. All results confirmed that our parallel approach is competitive to state-of-the-art solutions for packing side chains. Conclusions This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a frame-work for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms. PMID:25474164
NASA Astrophysics Data System (ADS)
Yin, Lucy; Andrews, Jennifer; Heaton, Thomas
2018-05-01
Earthquake parameter estimations using nearest neighbor searching among a large database of observations can lead to reliable prediction results. However, in the real-time application of Earthquake Early Warning (EEW) systems, the accurate prediction using a large database is penalized by a significant delay in the processing time. We propose to use a multidimensional binary search tree (KD tree) data structure to organize large seismic databases to reduce the processing time in nearest neighbor search for predictions. We evaluated the performance of KD tree on the Gutenberg Algorithm, a database-searching algorithm for EEW. We constructed an offline test to predict peak ground motions using a database with feature sets of waveform filter-bank characteristics, and compare the results with the observed seismic parameters. We concluded that large database provides more accurate predictions of the ground motion information, such as peak ground acceleration, velocity, and displacement (PGA, PGV, PGD), than source parameters, such as hypocenter distance. Application of the KD tree search to organize the database reduced the average searching process by 85% time cost of the exhaustive method, allowing the method to be feasible for real-time implementation. The algorithm is straightforward and the results will reduce the overall time of warning delivery for EEW.
HATAKEYAMA, YOSHINORI; SHIBUYA, NORIHIRO; NISHIYAMA, TAKASHI; NAKASHIMA, NOBUHIKO
2004-01-01
The intergenic region (IGR) located upstream of the capsid protein gene in dicistroviruses contains an internal ribosome entry site (IRES). Translation initiation mediated by the IRES does not require initiator methionine tRNA. Comparison of the IGRs among dicistroviruses suggested that Taura syndrome virus (TSV) and acute bee paralysis virus have an extra side stem loop in the predicted IRES. We examined whether the side stem is responsible for translation activity mediated by the IGR using constructs with compensatory mutations. In vitro translation analysis showed that TSV has an IGR-IRES that is structurally distinct from those previously described. Because IGR-IRES elements determine the translation initiation site by virtue of their own tertiary structure formation, the discovery of this initiation mechanism suggests the possibility that eukaryotic mRNAs might have more extensive coding regions than previously predicted. To test this hypothesis, we searched full-length cDNA databases and whole genome sequences of eukaryotes using the pattern matching program, Scan For Matches, with parameters that can extract sequences containing secondary structure elements resembling those of IGR-IRES. Our search yielded several sequences, but their predicted secondary structures were suggested to be unstable in comparison to those of dicistroviruses. These results suggest that RNAs structurally similar to dicistroviruses are not common. If some eukaryotic mRNAs are translated independently of an initiator methionine tRNA, their structures are likely to be significantly distinct from those of dicistroviruses. PMID:15100433
First-principles theory of cation and intercalation ordering in Li xCoO 2
NASA Astrophysics Data System (ADS)
Wolverton, C.; Zunger, Alex
Several types of cation- and vacancy-ordering are of interest in the Li xCoO 2 battery cathode material since they can have a profound effect on the battery voltage. We present a first-principles theoretical approach which can be used to calculate both cation- and vacancy-ordering patterns at both zero and finite temperatures. This theory also provides quantum-mechanical predictions (i.e., without the use of any experimental input) of battery voltages of both ordered and disordered Li xCoO 2/Li cells from the energetics of the Li intercalation reactions. Our calculations allow us to search the entire configurational space to predict the lowest-energy ground-state structures, search for large voltage cathodes, explore metastable low-energy states, and extend our calculations to finite temperatures, thereby searching for order-disorder transitions and states of partial disorder. We present the first prediction of the stable spinel structure LiCo 2O 4 for the 50% delithiated Li 0.5CoO 2.
Monk, Christopher T; Barbier, Matthieu; Romanczuk, Pawel; Watson, James R; Alós, Josep; Nakayama, Shinnosuke; Rubenstein, Daniel I; Levin, Simon A; Arlinghaus, Robert
2018-06-01
Understanding how humans and other animals behave in response to changes in their environments is vital for predicting population dynamics and the trajectory of coupled social-ecological systems. Here, we present a novel framework for identifying emergent social behaviours in foragers (including humans engaged in fishing or hunting) in predator-prey contexts based on the exploration difficulty and exploitation potential of a renewable natural resource. A qualitative framework is introduced that predicts when foragers should behave territorially, search collectively, act independently or switch among these states. To validate it, we derived quantitative predictions from two models of different structure: a generic mathematical model, and a lattice-based evolutionary model emphasising exploitation and exclusion costs. These models independently identified that the exploration difficulty and exploitation potential of the natural resource controls the social behaviour of resource exploiters. Our theoretical predictions were finally compared to a diverse set of empirical cases focusing on fisheries and aquatic organisms across a range of taxa, substantiating the framework's predictions. Understanding social behaviour for given social-ecological characteristics has important implications, particularly for the design of governance structures and regulations to move exploited systems, such as fisheries, towards sustainability. Our framework provides concrete steps in this direction. © 2018 John Wiley & Sons Ltd/CNRS.
Structure and electronic properties of azadirachtin.
de Castro, Elton A S; de Oliveira, Daniel A B; Farias, Sergio A S; Gargano, Ricardo; Martins, João B L
2014-02-01
We performed a combined DFT and Monte Carlo (13)C NMR chemical-shift study of azadirachtin A, a triterpenoid that acts as a natural insect antifeedant. A conformational search using a Monte Carlo technique based on the RM1 semiempirical method was carried out in order to establish its preferred structure. The B3LYP/6-311++G(d,p), wB97XD/6-311++G(d,p), M06/6-311++G(d,p), M06-2X/6-311++G(d,p), and CAM-B3LYP/6-311++G(d,p) levels of theory were used to predict NMR chemical shifts. A Monte Carlo population-weighted average spectrum was produced based on the predicted Boltzmann contributions. In general, good agreement between experimental and theoretical data was obtained using both methods, and the (13)C NMR chemical shifts were predicted highly accurately. The geometry was optimized at the semiempirical level and used to calculate the NMR chemical shifts at the DFT level, and these shifts showed only minor deviations from those obtained following structural optimization at the DFT level, and incurred a much lower computational cost. The theoretical ultraviolet spectrum showed a maximum absorption peak that was mainly contributed by the tiglate group.
Physics-Based Hazard Assessment for Critical Structures Near Large Earthquake Sources
NASA Astrophysics Data System (ADS)
Hutchings, L.; Mert, A.; Fahjan, Y.; Novikova, T.; Golara, A.; Miah, M.; Fergany, E.; Foxall, W.
2017-09-01
We argue that for critical structures near large earthquake sources: (1) the ergodic assumption, recent history, and simplified descriptions of the hazard are not appropriate to rely on for earthquake ground motion prediction and can lead to a mis-estimation of the hazard and risk to structures; (2) a physics-based approach can address these issues; (3) a physics-based source model must be provided to generate realistic phasing effects from finite rupture and model near-source ground motion correctly; (4) wave propagations and site response should be site specific; (5) a much wider search of possible sources of ground motion can be achieved computationally with a physics-based approach; (6) unless one utilizes a physics-based approach, the hazard and risk to structures has unknown uncertainties; (7) uncertainties can be reduced with a physics-based approach, but not with an ergodic approach; (8) computational power and computer codes have advanced to the point that risk to structures can be calculated directly from source and site-specific ground motions. Spanning the variability of potential ground motion in a predictive situation is especially difficult for near-source areas, but that is the distance at which the hazard is the greatest. The basis of a "physical-based" approach is ground-motion syntheses derived from physics and an understanding of the earthquake process. This is an overview paper and results from previous studies are used to make the case for these conclusions. Our premise is that 50 years of strong motion records is insufficient to capture all possible ranges of site and propagation path conditions, rupture processes, and spatial geometric relationships between source and site. Predicting future earthquake scenarios is necessary; models that have little or no physical basis but have been tested and adjusted to fit available observations can only "predict" what happened in the past, which should be considered description as opposed to prediction. We have developed a methodology for synthesizing physics-based broadband ground motion that incorporates the effects of realistic earthquake rupture along specific faults and the actual geology between the source and site.
2013-01-01
Background Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics. Description Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes. We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca2+-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics. Conclusions We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of evolutionary, developmental, metabolic, and environmental perspectives. PMID:23889801
Goel, Purva; Bapat, Sanket; Vyas, Renu; Tambe, Amruta; Tambe, Sanjeev S
2015-11-13
The development of quantitative structure-retention relationships (QSRR) aims at constructing an appropriate linear/nonlinear model for the prediction of the retention behavior (such as Kovats retention index) of a solute on a chromatographic column. Commonly, multi-linear regression and artificial neural networks are used in the QSRR development in the gas chromatography (GC). In this study, an artificial intelligence based data-driven modeling formalism, namely genetic programming (GP), has been introduced for the development of quantitative structure based models predicting Kovats retention indices (KRI). The novelty of the GP formalism is that given an example dataset, it searches and optimizes both the form (structure) and the parameters of an appropriate linear/nonlinear data-fitting model. Thus, it is not necessary to pre-specify the form of the data-fitting model in the GP-based modeling. These models are also less complex, simple to understand, and easy to deploy. The effectiveness of GP in constructing QSRRs has been demonstrated by developing models predicting KRIs of light hydrocarbons (case study-I) and adamantane derivatives (case study-II). In each case study, two-, three- and four-descriptor models have been developed using the KRI data available in the literature. The results of these studies clearly indicate that the GP-based models possess an excellent KRI prediction accuracy and generalization capability. Specifically, the best performing four-descriptor models in both the case studies have yielded high (>0.9) values of the coefficient of determination (R(2)) and low values of root mean squared error (RMSE) and mean absolute percent error (MAPE) for training, test and validation set data. The characteristic feature of this study is that it introduces a practical and an effective GP-based method for developing QSRRs in gas chromatography that can be gainfully utilized for developing other types of data-driven models in chromatography science. Copyright © 2015 Elsevier B.V. All rights reserved.
Speech coding at low to medium bit rates
NASA Astrophysics Data System (ADS)
Leblanc, Wilfred Paul
1992-09-01
Improved search techniques coupled with improved codebook design methodologies are proposed to improve the performance of conventional code-excited linear predictive coders for speech. Improved methods for quantizing the short term filter are developed by employing a tree search algorithm and joint codebook design to multistage vector quantization. Joint codebook design procedures are developed to design locally optimal multistage codebooks. Weighting during centroid computation is introduced to improve the outlier performance of the multistage vector quantizer. Multistage vector quantization is shown to be both robust against input characteristics and in the presence of channel errors. Spectral distortions of about 1 dB are obtained at rates of 22-28 bits/frame. Structured codebook design procedures for excitation in code-excited linear predictive coders are compared to general codebook design procedures. Little is lost using significant structure in the excitation codebooks while greatly reducing the search complexity. Sparse multistage configurations are proposed for reducing computational complexity and memory size. Improved search procedures are applied to code-excited linear prediction which attempt joint optimization of the short term filter, the adaptive codebook, and the excitation. Improvements in signal to noise ratio of 1-2 dB are realized in practice.
Nguyen, Q Nhu N; Schwochert, Joshua; Tantillo, Dean J; Lokey, R Scott
2018-05-10
Solving conformations of cyclic peptides can provide insight into structure-activity and structure-property relationships, which can help in the design of compounds with improved bioactivity and/or ADME characteristics. The most common approaches for determining the structures of cyclic peptides are based on NMR-derived distance restraints obtained from NOESY or ROESY cross-peak intensities, and 3J-based dihedral restraints using the Karplus relationship. Unfortunately, these observables are often too weak, sparse, or degenerate to provide unequivocal, high-confidence solution structures, prompting us to investigate an alternative approach that relies only on 1H and 13C chemical shifts as experimental observables. This method, which we call conformational analysis from NMR and density-functional prediction of low-energy ensembles (CANDLE), uses molecular dynamics (MD) simulations to generate conformer families and density functional theory (DFT) calculations to predict their 1H and 13C chemical shifts. Iterative conformer searches and DFT energy calculations on a cyclic peptide-peptoid hybrid yielded Boltzmann ensembles whose predicted chemical shifts matched the experimental values better than any single conformer. For these compounds, CANDLE outperformed the classic NOE- and 3J-coupling-based approach by disambiguating similar β-turn types and also enabled the structural elucidation of the minor conformer. Through the use of chemical shifts, in conjunction with DFT and MD calculations, CANDLE can help illuminate conformational ensembles of cyclic peptides in solution.
Contrast model for three-dimensional vehicles in natural lighting and search performance analysis
NASA Astrophysics Data System (ADS)
Witus, Gary; Gerhart, Grant R.; Ellis, R. Darin
2001-09-01
Ground vehicles in natural lighting tend to have significant and systematic variation in luminance through the presented area. This arises, in large part, from the vehicle surfaces having different orientations and shadowing relative to the source of illumination and the position of the observer. These systematic differences create the appearance of a structured 3D object. The 3D appearance is an important factor in search, figure-ground segregation, and object recognition. We present a contrast metric to predict search and detection performance that accounts for the 3D structure. The approach first computes the contrast of the front (or rear), side, and top surfaces. The vehicle contrast metric is the area-weighted sum of the absolute values of the contrasts of the component surfaces. The 3D structure contrast metric, together with target height, account for more than 80% of the variance in probability of detection and 75% of the variance in search time. When false alarm effects are discounted, they account for 89% of the variance in probability of detection and 95% of the variance in search time. The predictive power of the signature metric, when calibrated to half the data and evaluated against the other half, is 90% of the explanatory power.
NASA Astrophysics Data System (ADS)
Ivanova, Julia
2014-05-01
The complexity of any task solving, including tasks in the Earth Sciences, depends on the completeness of the information that is available. The prediction of the mineralization zone localization is a task with incomplete information. The tasks of prediction are complicated because of search data difficult formalize, and the absent of single information structures of the representation of the search data. These facts complicate the process of structuring, processing and analysis of information. Geodata that need to process are presented in various formats: raster two-dimensional and three-dimensional fields, vector layers of polygons and lines, point markable layers, the spectral and discrete, quantized and continuous, analog and digital forms, as well as chemical formalization. In this form representative data cannot be combining into superclasses. At the same time the information content of geodata that are applied individually is very small. While a number of low informative features, which can be obtained in the process of research of mineralization zones are usually redundant. As a result the quality of knowledge that can be obtained from the search data decreases, as well as the technological cycle of information processing increases. Additionally, that leads to exploitation of datasets, and production of large shared datasets [1]. To solve efficiently the tasks of predicting, it is necessary to use union heterogeneous search features, accumulated factual data and modern science-based mathematical apparatus of processing and analysis of the information. As well young branches of human knowledge help to solve this task: remote sensing, geoinformatics, Earth and Space Science Informatics [2], apparatus of catastrophe theory and nonlinear dynamics, game theory. The purpose of the suggested approach is to increase informational content, and to reduce of geodata redundancy to improve the accuracy of the prediction of mineralization zones. The developed algorithm of prediction of the localization of mineralization zone consists of the some steps: 1. The collection of information about the studying territory of upcoming work from various sources, i.e. building of database (DB). The DB includes variety geodata. 2. The formalization, the concatenation and the union of geodata. Study of features correlation characteristics. Generation of new formal and functional search features. 3. The formation of a number of hypotheses based on initial data. The refinement of search features. 4. Preliminary mathematical modeling of prospective mineralized zones. The study of obtained results, the formation of additional features list. 5. The collection of additional features by field methods for verifying of hypotheses. 6. Processing and analyzing of obtaining data, the specification of preliminary mathematical model. 7. The examination of hypotheses using the obtained results. The study of prediction errors. 8. Building of multidimensional risk matrices of detection and bifurcation diagrams of mineralization [3]. 9. The final mathematical modeling of perspective mineralized zones. Thus, the proposed approach allows to increase the information content of geodata significantly, to reduce redundancy of geodate, and to increase the accuracy of predicting zones of gold mineralization. Currently the approach, suggested by the author, applies for prediction of the localization of gold mineralization at the territory of the Polar Urals. References: 1.W. J. Som de Cerff, M. Petitdidier, A. Gemünd, L. Horstink, H. Schwichtenberg, Earth Science Test Suites to Evaluate Grid Tools and Middleware-Examples for Grid Data Access Tools, Earth Science Informatics, Vol. 2, 117-131, 2009. DOI 10.1007/s12145-009-0022-y. 2. P. Mazzetti , S. Nativi, J. Caron, RESTfulI implementation of Geospatial, Services for Earth and Space Science Applications, International Journal of Digital Earth, Vol. 2, Supplement 1, 40-61, 2009. DOI: 10.1080/175389409028661532. 3. Arnold V.I., Catastrophe Theory, 4th ed. Moscow, Editorial-URSS (2004), ISBN 5-354-00674-0 (in Russian).
Development of an Evolutionary Algorithm for the ab Initio Discovery of Two-Dimensional Materials
NASA Astrophysics Data System (ADS)
Revard, Benjamin Charles
Crystal structure prediction is an important first step on the path toward computational materials design. Increasingly robust methods have become available in recent years for computing many materials properties, but because properties are largely a function of crystal structure, the structure must be known before these methods can be brought to bear. In addition, structure prediction is particularly useful for identifying low-energy structures of subperiodic materials, such as two-dimensional (2D) materials, which may adopt unexpected structures that differ from those of the corresponding bulk phases. Evolutionary algorithms, which are heuristics for global optimization inspired by biological evolution, have proven to be a fruitful approach for tackling the problem of crystal structure prediction. This thesis describes the development of an improved evolutionary algorithm for structure prediction and several applications of the algorithm to predict the structures of novel low-energy 2D materials. The first part of this thesis contains an overview of evolutionary algorithms for crystal structure prediction and presents our implementation, including details of extending the algorithm to search for clusters, wires, and 2D materials, improvements to efficiency when running in parallel, improved composition space sampling, and the ability to search for partial phase diagrams. We then present several applications of the evolutionary algorithm to 2D systems, including InP, the C-Si and Sn-S phase diagrams, and several group-IV dioxides. This thesis makes use of the Cornell graduate school's "papers" option. Chapters 1 and 3 correspond to the first-author publications of Refs. [131] and [132], respectively, and chapter 2 will soon be submitted as a first-author publication. The material in chapter 4 is taken from Ref. [144], in which I share joint first-authorship. In this case I have included only my own contributions.
Kaindl, H; Kainz, G; Radda, K
2001-01-01
Most of the work on search in artificial intelligence (AI) deals with one search direction only-mostly forward search-although it is known that a structural asymmetry of the search graph causes differences in the efficiency of searching in the forward or the backward direction, respectively. In the case of symmetrical graph structure, however, current theory would not predict such differences in efficiency. In several classes of job sequencing problems, we observed a phenomenon of asymmetry in search that relates to the distribution of the are costs in the search graph. This phenomenon can be utilized for improving the search efficiency by a new algorithm that automatically selects the search direction. We demonstrate fur a class of job sequencing problems that, through the utilization of this phenomenon, much more difficult problems can be solved-according to our best knowledge-than by the best published approach, and on the same problems, the running time is much reduced. As a consequence, we propose to check given problems for asymmetrical distribution of are costs that may cause asymmetry in search.
GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.
Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de
2006-03-31
Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.
INFO-RNA--a fast approach to inverse RNA folding.
Busch, Anke; Backofen, Rolf
2006-08-01
The structure of RNA molecules is often crucial for their function. Therefore, secondary structure prediction has gained much interest. Here, we consider the inverse RNA folding problem, which means designing RNA sequences that fold into a given structure. We introduce a new algorithm for the inverse folding problem (INFO-RNA) that consists of two parts; a dynamic programming method for good initial sequences and a following improved stochastic local search that uses an effective neighbor selection method. During the initialization, we design a sequence that among all sequences adopts the given structure with the lowest possible energy. For the selection of neighbors during the search, we use a kind of look-ahead of one selection step applying an additional energy-based criterion. Afterwards, the pre-ordered neighbors are tested using the actual optimization criterion of minimizing the structure distance between the target structure and the mfe structure of the considered neighbor. We compared our algorithm to RNAinverse and RNA-SSD for artificial and biological test sets. Using INFO-RNA, we performed better than RNAinverse and in most cases, we gained better results than RNA-SSD, the probably best inverse RNA folding tool on the market. www.bioinf.uni-freiburg.de?Subpages/software.html.
NASA Astrophysics Data System (ADS)
Faucci, Maria Teresa; Melani, Fabrizio; Mura, Paola
2002-06-01
Molecular modeling was used to investigate factors influencing complex formation between cyclodextrins and guest molecules and predict their stability through a theoretical model based on the search for a correlation between experimental stability constants ( Ks) and some theoretical parameters describing complexation (docking energy, host-guest contact surfaces, intermolecular interaction fields) calculated from complex structures at a minimum conformational energy, obtained through stochastic methods based on molecular dynamic simulations. Naproxen, ibuprofen, ketoprofen and ibuproxam were used as model drug molecules. Multiple Regression Analysis allowed identification of the significant factors for the complex stability. A mathematical model ( r=0.897) related log Ks with complex docking energy and lipophilic molecular fields of cyclodextrin and drug.
Application of the AMPLE cluster-and-truncate approach to NMR structures for molecular replacement
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bibby, Jaclyn; Keegan, Ronan M.; Mayans, Olga
2013-11-01
Processing of NMR structures for molecular replacement by AMPLE works well. AMPLE is a program developed for clustering and truncating ab initio protein structure predictions into search models for molecular replacement. Here, it is shown that its core cluster-and-truncate methods also work well for processing NMR ensembles into search models. Rosetta remodelling helps to extend success to NMR structures bearing low sequence identity or high structural divergence from the target protein. Potential future routes to improved performance are considered and practical, general guidelines on using AMPLE are provided.
RNA 3D Structural Motifs: Definition, Identification, Annotation, and Database Searching
NASA Astrophysics Data System (ADS)
Nasalean, Lorena; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B.
Structured RNA molecules resemble proteins in the hierarchical organization of their global structures, folding and broad range of functions. Structured RNAs are composed of recurrent modular motifs that play specific functional roles. Some motifs direct the folding of the RNA or stabilize the folded structure through tertiary interactions. Others bind ligands or proteins or catalyze chemical reactions. Therefore, it is desirable, starting from the RNA sequence, to be able to predict the locations of recurrent motifs in RNA molecules. Conversely, the potential occurrence of one or more known 3D RNA motifs may indicate that a genomic sequence codes for a structured RNA molecule. To identify known RNA structural motifs in new RNA sequences, precise structure-based definitions are needed that specify the core nucleotides of each motif and their conserved interactions. By comparing instances of each recurrent motif and applying base pair isosteriCity relations, one can identify neutral mutations that preserve its structure and function in the contexts in which it occurs.
Searching molecular structure databases with tandem mass spectra using CSI:FingerID
Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian
2015-01-01
Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin. PMID:26392543
Exploration of tetrahedral structures in silicate cathodes using a motif-network scheme
Zhao, Xin; Wu, Shunqing; Lv, Xiaobao; Nguyen, Manh Cuong; Wang, Cai-Zhuang; Lin, Zijing; Zhu, Zi-Zhong; Ho, Kai-Ming
2015-01-01
Using a motif-network search scheme, we studied the tetrahedral structures of the dilithium/disodium transition metal orthosilicates A2MSiO4 with A = Li or Na and M = Mn, Fe or Co. In addition to finding all previously reported structures, we discovered many other different tetrahedral-network-based crystal structures which are highly degenerate in energy. These structures can be classified into structures with 1D, 2D and 3D M-Si-O frameworks. A clear trend of the structural preference in different systems was revealed and possible indicators that affect the structure stabilities were introduced. For the case of Na systems which have been much less investigated in the literature relative to the Li systems, we predicted their ground state structures and found evidence for the existence of new structural motifs. PMID:26497381
Ivanciuc, Ovidiu
2013-06-01
Chemical and molecular graphs have fundamental applications in chemoinformatics, quantitative structureproperty relationships (QSPR), quantitative structure-activity relationships (QSAR), virtual screening of chemical libraries, and computational drug design. Chemoinformatics applications of graphs include chemical structure representation and coding, database search and retrieval, and physicochemical property prediction. QSPR, QSAR and virtual screening are based on the structure-property principle, which states that the physicochemical and biological properties of chemical compounds can be predicted from their chemical structure. Such structure-property correlations are usually developed from topological indices and fingerprints computed from the molecular graph and from molecular descriptors computed from the three-dimensional chemical structure. We present here a selection of the most important graph descriptors and topological indices, including molecular matrices, graph spectra, spectral moments, graph polynomials, and vertex topological indices. These graph descriptors are used to define several topological indices based on molecular connectivity, graph distance, reciprocal distance, distance-degree, distance-valency, spectra, polynomials, and information theory concepts. The molecular descriptors and topological indices can be developed with a more general approach, based on molecular graph operators, which define a family of graph indices related by a common formula. Graph descriptors and topological indices for molecules containing heteroatoms and multiple bonds are computed with weighting schemes based on atomic properties, such as the atomic number, covalent radius, or electronegativity. The correlation in QSPR and QSAR models can be improved by optimizing some parameters in the formula of topological indices, as demonstrated for structural descriptors based on atomic connectivity and graph distance.
A classification of ecological boundaries
Strayer, D.L.; Power, M.E.; Fagan, W.F.; Pickett, S.T.A.; Belnap, J.
2003-01-01
Ecologists use the term boundary to refer to a wide range of real and conceptual structures. Because imprecise terminology may impede the search for general patterns and theories about ecological boundaries, we present a classification of the attributes of ecological boundaries to aid in communication and theory development. Ecological boundaries may differ in their origin and maintenance, their spatial structure, their function, and their temporal dynamics. A classification system based on these attributes should help ecologists determine whether boundaries are truly comparable. This system can be applied when comparing empirical studies, comparing theories, and testing theoretical predictions against empirical results.
GWFASTA: server for FASTA search in eukaryotic and microbial genomes.
Issac, Biju; Raghava, G P S
2002-09-01
Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotatedgenomes. Infact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequences alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful toolfor biologists.
Empirical scoring functions for advanced protein-ligand docking with PLANTS.
Korb, Oliver; Stützle, Thomas; Exner, Thomas E
2009-01-01
In this paper we present two empirical scoring functions, PLANTS(CHEMPLP) and PLANTS(PLP), designed for our docking algorithm PLANTS (Protein-Ligand ANT System), which is based on ant colony optimization (ACO). They are related, regarding their functional form, to parts of already published scoring functions and force fields. The parametrization procedure described here was able to identify several parameter settings showing an excellent performance for the task of pose prediction on two test sets comprising 298 complexes in total. Up to 87% of the complexes of the Astex diverse set and 77% of the CCDC/Astex clean listnc (noncovalently bound complexes of the clean list) could be reproduced with root-mean-square deviations of less than 2 A with respect to the experimentally determined structures. A comparison with the state-of-the-art docking tool GOLD clearly shows that this is, especially for the druglike Astex diverse set, an improvement in pose prediction performance. Additionally, optimized parameter settings for the search algorithm were identified, which can be used to balance pose prediction reliability and search speed.
Structure-based CoMFA as a predictive model - CYP2C9 inhibitors as a test case.
Yasuo, Kazuya; Yamaotsu, Noriyuki; Gouda, Hiroaki; Tsujishita, Hideki; Hirono, Shuichi
2009-04-01
In this study, we tried to establish a general scheme to create a model that could predict the affinity of small compounds to their target proteins. This scheme consists of a search for ligand-binding sites on a protein, a generation of bound conformations (poses) of ligands in each of the sites by docking, identifications of the correct poses of each ligand by consensus scoring and MM-PBSA analysis, and a construction of a CoMFA model with the obtained poses to predict the affinity of the ligands. By using a crystal structure of CYP 2C9 and the twenty known CYP inhibitors as a test case, we obtained a CoMFA model with a good statistics, which suggested that the classification of the binding sites as well as the predicted bound poses of the ligands should be reasonable enough. The scheme described here would give a method to predict the affinity of small compounds with a reasonable accuracy, which is expected to heighten the value of computational chemistry in the drug design process.
Spatial Query for Planetary Data
NASA Technical Reports Server (NTRS)
Shams, Khawaja S.; Crockett, Thomas M.; Powell, Mark W.; Joswig, Joseph C.; Fox, Jason M.
2011-01-01
Science investigators need to quickly and effectively assess past observations of specific locations on a planetary surface. This innovation involves a location-based search technology that was adapted and applied to planetary science data to support a spatial query capability for mission operations software. High-performance location-based searching requires the use of spatial data structures for database organization. Spatial data structures are designed to organize datasets based on their coordinates in a way that is optimized for location-based retrieval. The particular spatial data structure that was adapted for planetary data search is the R+ tree.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Frolov, T.; Setyawan, W.; Kurtz, R. J.
Evolutionary grand-canonical search predicts novel grain boundary structures and multiple grain boundary phases in elemental body-centered cubic (bcc) metals represented by tungsten, tantalum and molybdenum.
Crystal Structure Prediction and its Application in Earth and Materials Sciences
NASA Astrophysics Data System (ADS)
Zhu, Qiang
First of all, we describe how to predict crystal structure by evolutionary approach, and extend this method to study the packing of organic molecules, by our specially designed constrained evolutionary algorithm. The main feature of this new approach is that each unit or molecule is treated as a whole body, which drastically reduces the search space and improves the efficiency. The improved method is possibly to be applied in the fields of (1) high pressure phase of simple molecules (H2O, NH3, CH4, etc); (2) pharmaceutical molecules (glycine, aspirin, etc); (3) complex inorganic crystals containing cluster or molecular unit, (Mg(BH4)2, Ca(BH4)2, etc). One application of the constrained evolutionary algorithm is given by the study of (Mg(BH4)2, which is a promising materials for hydrogen storage. Our prediction does not only reproduce the previous work on Mg(BH4)2 at ambient condition, but also yields two new tetragonal structures at high pressure, with space groups P4 and I41/acd are predicted to be lower in enthalpy, by 15.4 kJ/mol and 21.2 kJ/mol, respectively, than the earlier proposed P42nm phase. We have simulated X-ray diffraction spectra, lattice dynamics, and equations of state of these phases. The density, volume contraction, bulk modulus, and the simulated XRD patterns of P4 and I41/acd structures are in excellent agreement with the experimental results. Two kinds of oxides (Xe-O and Mg-O) have been studied under megabar pressures. For XeO, we predict the existence of thermodynamically stable Xe-O compounds at high pressures (XeO, XeO2 and XeO3 become stable at pressures of 83, 102 and 114 GPa, respectively). For Mg-O, our calculations find that two extraordinary compounds MgO2 and Mg3O 2 become thermodynamically stable at 116 GPa and 500 GPa, respectively. Our calculations indicate large charge transfer in these oxides for both systems, suggesting that large electronegativity difference and pressure are the key factors favouring their formations. We also discuss if these oxides might exist at earth and planetary conditions. If the target properties are set as the global fitness functions while structure relaxations are energy/enthalpy minimization, such hybrid optimization technique could effectively explore the landscape of properties for the given systems. Here we illustrate this function by the case of searching for superdense carbon allotropes. We find three structures (hP3, tI12, and tP12) that have significantly greater density. Furthermore, we find a collection of other superdense structures based on different ways of packing carbon tetrahedral. Superdense carbon allotropes are predicted to have remarkably high refractive indices and strong dispersion of light. Apart from evolutionary approach, there also exist some other methods for structural prediction. One can also combine the features from different methods. We develop a novel method for crystal structure prediction, based on metadynamics and evolutionary algorithms. This technique can be used to produce efficiently both the ground state and metastable states easily reachable from a reasonable initial structure. We use the cell shape as collective variable and evolutionary variation operators developed in the context of the USPEX method to equilibrate the system as a function of the collective variables. We illustrate how this approach helps one to find stable and metastable states for Al2SiO5, SiO2, MgSiO3. Apart from predicting crystal structures, the new method can also provide insight into mechanisms of phase transitions. This method is especially powerful in sampling the metastable structures from a given configuration. Experiments on cold compression indicated the existence of a new superhard carbon allotrope. Numerous metastable candidate structures featuring different topologies have been proposed for this allotrope. We use evolutionary metadynamics to systematically search for possible candidates which could be accessible from graphite. (Abstract shortened by UMI.)
2017-01-01
Cytochrome P450 aromatase (CYP19A1) plays a key role in the development of estrogen dependent breast cancer, and aromatase inhibitors have been at the front line of treatment for the past three decades. The development of potent, selective and safer inhibitors is ongoing with in silico screening methods playing a more prominent role in the search for promising lead compounds in bioactivity-relevant chemical space. Here we present a set of comprehensive binding affinity prediction models for CYP19A1 using our automated Linear Interaction Energy (LIE) based workflow on a set of 132 putative and structurally diverse aromatase inhibitors obtained from a typical industrial screening study. We extended the workflow with machine learning methods to automatically cluster training and test compounds in order to maximize the number of explained compounds in one or more predictive LIE models. The method uses protein–ligand interaction profiles obtained from Molecular Dynamics (MD) trajectories to help model search and define the applicability domain of the resolved models. Our method was successful in accounting for 86% of the data set in 3 robust models that show high correlation between calculated and observed values for ligand-binding free energies (RMSE < 2.5 kJ mol–1), with good cross-validation statistics. PMID:28776988
Features in visual search combine linearly
Pramod, R. T.; Arun, S. P.
2014-01-01
Single features such as line orientation and length are known to guide visual search, but relatively little is known about how multiple features combine in search. To address this question, we investigated how search for targets differing in multiple features (intensity, length, orientation) from the distracters is related to searches for targets differing in each of the individual features. We tested race models (based on reaction times) and co-activation models (based on reciprocal of reaction times) for their ability to predict multiple feature searches. Multiple feature searches were best accounted for by a co-activation model in which feature information combined linearly (r = 0.95). This result agrees with the classic finding that these features are separable i.e., subjective dissimilarity ratings sum linearly. We then replicated the classical finding that the length and width of a rectangle are integral features—in other words, they combine nonlinearly in visual search. However, to our surprise, upon including aspect ratio as an additional feature, length and width combined linearly and this model outperformed all other models. Thus, length and width of a rectangle became separable when considered together with aspect ratio. This finding predicts that searches involving shapes with identical aspect ratio should be more difficult than searches where shapes differ in aspect ratio. We confirmed this prediction on a variety of shapes. We conclude that features in visual search co-activate linearly and demonstrate for the first time that aspect ratio is a novel feature that guides visual search. PMID:24715328
Protein structure database search and evolutionary classification.
Yang, Jinn-Moon; Tung, Chi-Hua
2006-01-01
As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].
Text Mining for Protein Docking
Badal, Varsha D.; Kundrotas, Petras J.; Vakser, Ilya A.
2015-01-01
The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied the text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~ 25% complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. PMID:26650466
Predicting protein structures with a multiplayer online game.
Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran; Players, Foldit
2010-08-05
People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.
Tree decomposition based fast search of RNA structures including pseudoknots in genomes.
Song, Yinglei; Liu, Chunmei; Malmberg, Russell; Pan, Fangfang; Cai, Liming
2005-01-01
Searching genomes for RNA secondary structure with computational methods has become an important approach to the annotation of non-coding RNAs. However, due to the lack of efficient algorithms for accurate RNA structure-sequence alignment, computer programs capable of fast and effectively searching genomes for RNA secondary structures have not been available. In this paper, a novel RNA structure profiling model is introduced based on the notion of a conformational graph to specify the consensus structure of an RNA family. Tree decomposition yields a small tree width t for such conformation graphs (e.g., t = 2 for stem loops and only a slight increase for pseudo-knots). Within this modelling framework, the optimal alignment of a sequence to the structure model corresponds to finding a maximum valued isomorphic subgraph and consequently can be accomplished through dynamic programming on the tree decomposition of the conformational graph in time O(k(t)N(2)), where k is a small parameter; and N is the size of the projiled RNA structure. Experiments show that the application of the alignment algorithm to search in genomes yields the same search accuracy as methods based on a Covariance model with a significant reduction in computation time. In particular; very accurate searches of tmRNAs in bacteria genomes and of telomerase RNAs in yeast genomes can be accomplished in days, as opposed to months required by other methods. The tree decomposition based searching tool is free upon request and can be downloaded at our site h t t p ://w.uga.edu/RNA-informatics/software/index.php.
Beauvais, W; Fournié, G; Jones, B A; Cameron, A; Njeumi, F; Lubroth, J; Pfeiffer, D U
2013-11-01
Now that we are in the rinderpest post-eradication era, attention is focused on the risk of re-introduction. A semi-quantitative risk assessment identified accidental use of rinderpest virus in laboratories as the most likely cause of re-introduction. However there is little data available on the rates of laboratory biosafety breakdowns in general. In addition, any predictions based on past events are subject to various uncertainties. The aims of this study were therefore to investigate the potential usefulness of historical data for predicting the future risk of rinderpest release via laboratory biosafety breakdowns, and to investigate the impacts of the various uncertainties on these predictions. Data were collected using a worldwide online survey of laboratories, a structured search of ProMED reports and discussion with experts. A stochastic model was constructed to predict the number of laboratory biosafety breakdowns involving rinderpest that will occur over the next 10 years, based on: (1) the historical rate of biosafety breakdowns; and (2) the change in the number of laboratories that will have rinderpest virus in the next 10 years compared to historically. The search identified five breakdowns, all of which occurred during 1970-2000 and all of which were identified via discussions with experts. Assuming that our search for historical events had a sensitivity of over 60% and there has been at least a 40% reduction in the underlying risk (attributable to decreased laboratory activity post eradication) the most likely number of biosafety events worldwide was estimated to be zero over a 10 year period. However, the risk of at least one biosafety breakdown remains greater than 1 in 10,000 unless the sensitivity was at least 99% or the number of laboratories has decreased by at least 99% (based on 2000-2010 during which there were no biosafety breakdowns). Copyright © 2013 Elsevier B.V. All rights reserved.
Retrieval Interference in Syntactic Processing: The Case of Reflexive Binding in English.
Patil, Umesh; Vasishth, Shravan; Lewis, Richard L
2016-01-01
It has been proposed that in online sentence comprehension the dependency between a reflexive pronoun such as himself/herself and its antecedent is resolved using exclusively syntactic constraints. Under this strictly syntactic search account, Principle A of the binding theory-which requires that the antecedent c-command the reflexive within the same clause that the reflexive occurs in-constrains the parser's search for an antecedent. The parser thus ignores candidate antecedents that might match agreement features of the reflexive (e.g., gender) but are ineligible as potential antecedents because they are in structurally illicit positions. An alternative possibility accords no special status to structural constraints: in addition to using Principle A, the parser also uses non-structural cues such as gender to access the antecedent. According to cue-based retrieval theories of memory (e.g., Lewis and Vasishth, 2005), the use of non-structural cues should result in increased retrieval times and occasional errors when candidates partially match the cues, even if the candidates are in structurally illicit positions. In this paper, we first show how the retrieval processes that underlie the reflexive binding are naturally realized in the Lewis and Vasishth (2005) model. We present the predictions of the model under the assumption that both structural and non-structural cues are used during retrieval, and provide a critical analysis of previous empirical studies that failed to find evidence for the use of non-structural cues, suggesting that these failures may be Type II errors. We use this analysis and the results of further modeling to motivate a new empirical design that we use in an eye tracking study. The results of this study confirm the key predictions of the model concerning the use of non-structural cues, and are inconsistent with the strictly syntactic search account. These results present a challenge for theories advocating the infallibility of the human parser in the case of reflexive resolution, and provide support for the inclusion of agreement features such as gender in the set of retrieval cues.
Retrieval Interference in Syntactic Processing: The Case of Reflexive Binding in English
Patil, Umesh; Vasishth, Shravan; Lewis, Richard L.
2016-01-01
It has been proposed that in online sentence comprehension the dependency between a reflexive pronoun such as himself/herself and its antecedent is resolved using exclusively syntactic constraints. Under this strictly syntactic search account, Principle A of the binding theory—which requires that the antecedent c-command the reflexive within the same clause that the reflexive occurs in—constrains the parser's search for an antecedent. The parser thus ignores candidate antecedents that might match agreement features of the reflexive (e.g., gender) but are ineligible as potential antecedents because they are in structurally illicit positions. An alternative possibility accords no special status to structural constraints: in addition to using Principle A, the parser also uses non-structural cues such as gender to access the antecedent. According to cue-based retrieval theories of memory (e.g., Lewis and Vasishth, 2005), the use of non-structural cues should result in increased retrieval times and occasional errors when candidates partially match the cues, even if the candidates are in structurally illicit positions. In this paper, we first show how the retrieval processes that underlie the reflexive binding are naturally realized in the Lewis and Vasishth (2005) model. We present the predictions of the model under the assumption that both structural and non-structural cues are used during retrieval, and provide a critical analysis of previous empirical studies that failed to find evidence for the use of non-structural cues, suggesting that these failures may be Type II errors. We use this analysis and the results of further modeling to motivate a new empirical design that we use in an eye tracking study. The results of this study confirm the key predictions of the model concerning the use of non-structural cues, and are inconsistent with the strictly syntactic search account. These results present a challenge for theories advocating the infallibility of the human parser in the case of reflexive resolution, and provide support for the inclusion of agreement features such as gender in the set of retrieval cues. PMID:27303315
GeneBuilder: interactive in silico prediction of gene structure.
Milanesi, L; D'Angelo, D; Rogozin, I B
1999-01-01
Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performances are: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of the WebGene a the URL: http://www.itba.mi. cnr.it/webgene and TRADAT (TRAncription Database and Analysis Tools) launcher URL: http://www.itba.mi.cnr.it/tradat.
Chakraborty, Sandipan; Ramachandran, Balaji; Basu, Soumalee
2014-10-01
Mimicking receptor flexibility during receptor-ligand binding is a challenging task in computational drug design since it is associated with a large increase in the conformational search space. In the present study, we have devised an in silico design strategy incorporating receptor flexibility in virtual screening to identify potential lead compounds as inhibitors for flexible proteins. We have considered BACE1 (β-secretase), a key target protease from a therapeutic perspective for Alzheimer's disease, as the highly flexible receptor. The protein undergoes significant conformational transitions from open to closed form upon ligand binding, which makes it a difficult target for inhibitor design. We have designed a hybrid structure-activity model containing both ligand based descriptors and energetic descriptors obtained from molecular docking based on a dataset of structurally diverse BACE1 inhibitors. An ensemble of receptor conformations have been used in the docking study, further improving the prediction ability of the model. The designed model that shows significant prediction ability judged by several statistical parameters has been used to screen an in house developed 3-D structural library of 731 phytochemicals. 24 highly potent, novel BACE1 inhibitors with predicted activity (Ki) ≤ 50 nM have been identified. Detailed analysis reveals pharmacophoric features of these novel inhibitors required to inhibit BACE1.
Naik, P K; Singh, T; Singh, H
2009-07-01
Quantitative structure-activity relationship (QSAR) analyses were performed independently on data sets belonging to two groups of insecticides, namely the organophosphates and carbamates. Several types of descriptors including topological, spatial, thermodynamic, information content, lead likeness and E-state indices were used to derive quantitative relationships between insecticide activities and structural properties of chemicals. A systematic search approach based on missing value, zero value, simple correlation and multi-collinearity tests as well as the use of a genetic algorithm allowed the optimal selection of the descriptors used to generate the models. The QSAR models developed for both organophosphate and carbamate groups revealed good predictability with r(2) values of 0.949 and 0.838 as well as [image omitted] values of 0.890 and 0.765, respectively. In addition, a linear correlation was observed between the predicted and experimental LD(50) values for the test set data with r(2) of 0.871 and 0.788 for both the organophosphate and carbamate groups, indicating that the prediction accuracy of the QSAR models was acceptable. The models were also tested successfully from external validation criteria. QSAR models developed in this study should help further design of novel potent insecticides.
How evolutionary crystal structure prediction works--and why.
Oganov, Artem R; Lyakhov, Andriy O; Valle, Mario
2011-03-15
Once the crystal structure of a chemical substance is known, many properties can be predicted reliably and routinely. Therefore if researchers could predict the crystal structure of a material before it is synthesized, they could significantly accelerate the discovery of new materials. In addition, the ability to predict crystal structures at arbitrary conditions of pressure and temperature is invaluable for the study of matter at extreme conditions, where experiments are difficult. Crystal structure prediction (CSP), the problem of finding the most stable arrangement of atoms given only the chemical composition, has long remained a major unsolved scientific problem. Two problems are entangled here: search, the efficient exploration of the multidimensional energy landscape, and ranking, the correct calculation of relative energies. For organic crystals, which contain a few molecules in the unit cell, search can be quite simple as long as a researcher does not need to include many possible isomers or conformations of the molecules; therefore ranking becomes the main challenge. For inorganic crystals, quantum mechanical methods often provide correct relative energies, making search the most critical problem. Recent developments provide useful practical methods for solving the search problem to a considerable extent. One can use simulated annealing, metadynamics, random sampling, basin hopping, minima hopping, and data mining. Genetic algorithms have been applied to crystals since 1995, but with limited success, which necessitated the development of a very different evolutionary algorithm. This Account reviews CSP using one of the major techniques, the hybrid evolutionary algorithm USPEX (Universal Structure Predictor: Evolutionary Xtallography). Using recent developments in the theory of energy landscapes, we unravel the reasons evolutionary techniques work for CSP and point out their limitations. We demonstrate that the energy landscapes of chemical systems have an overall shape and explore their intrinsic dimensionalities. Because of the inverse relationships between order and energy and between the dimensionality and diversity of an ensemble of crystal structures, the chances that a random search will find the ground state decrease exponentially with increasing system size. A well-designed evolutionary algorithm allows for much greater computational efficiency. We illustrate the power of evolutionary CSP through applications that examine matter at high pressure, where new, unexpected phenomena take place. Evolutionary CSP has allowed researchers to make unexpected discoveries such as a transparent phase of sodium, a partially ionic form of boron, complex superconducting forms of calcium, a novel superhard allotrope of carbon, polymeric modifications of nitrogen, and a new class of compounds, perhydrides. These methods have also led to the discovery of novel hydride superconductors including the "impossible" LiH(n) (n=2, 6, 8) compounds, and CaLi(2). We discuss extensions of the method to molecular crystals, systems of variable composition, and the targeted optimization of specific physical properties. © 2011 American Chemical Society
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klintenberg, M.; Haraldsen, Jason T.; Balatsky, Alexander V.
In this paper, we report a data-mining investigation for the search of topological insulators by examining individual electronic structures for over 60,000 materials. Using a data-mining algorithm, we survey changes in band inversion with and without spin-orbit coupling by screening the calculated electronic band structure for a small gap and a change concavity at high-symmetry points. Overall, we were able to identify a number of topological candidates with varying structures and composition. Lastly, our overall goal is expand the realm of predictive theory into the determination of new and exotic complex materials through the data mining of electronic structure.
Klintenberg, M.; Haraldsen, Jason T.; Balatsky, Alexander V.
2014-06-19
In this paper, we report a data-mining investigation for the search of topological insulators by examining individual electronic structures for over 60,000 materials. Using a data-mining algorithm, we survey changes in band inversion with and without spin-orbit coupling by screening the calculated electronic band structure for a small gap and a change concavity at high-symmetry points. Overall, we were able to identify a number of topological candidates with varying structures and composition. Lastly, our overall goal is expand the realm of predictive theory into the determination of new and exotic complex materials through the data mining of electronic structure.
Optimizing physical energy functions for protein folding.
Fujitsuka, Yoshimi; Takada, Shoji; Luthey-Schulten, Zaida A; Wolynes, Peter G
2004-01-01
We optimize a physical energy function for proteins with the use of the available structural database and perform three benchmark tests of the performance: (1) recognition of native structures in the background of predefined decoy sets of Levitt, (2) de novo structure prediction using fragment assembly sampling, and (3) molecular dynamics simulations. The energy parameter optimization is based on the energy landscape theory and uses a Monte Carlo search to find a set of parameters that seeks the largest ratio deltaE(s)/DeltaE for all proteins in a training set simultaneously. Here, deltaE(s) is the stability gap between the native and the average in the denatured states and DeltaE is the energy fluctuation among these states. Some of the energy parameters optimized are found to show significant correlation with experimentally observed quantities: (1) In the recognition test, the optimized function assigns the lowest energy to either the native or a near-native structure among many decoy structures for all the proteins studied. (2) Structure prediction with the fragment assembly sampling gives structure models with root mean square deviation less than 6 A in one of the top five cluster centers for five of six proteins studied. (3) Structure prediction using molecular dynamics simulation gives poorer performance, implying the importance of having a more precise description of local structures. The physical energy function solely inferred from a structural database neither utilizes sequence information from the family of the target nor the outcome of the secondary structure prediction but can produce the correct native fold for many small proteins. Copyright 2003 Wiley-Liss, Inc.
G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases.
Wang, Xiaohong; Smalter, Aaron; Huan, Jun; Lushington, Gerald H
2009-01-01
Structured data including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML documents, among others.Most of the current graph indexing methods focus on subgraph query processing, i.e. determining the set of database graphs that contains the query graph and hence do not directly support similarity search. In data mining and machine learning, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models for supervised learning, graph kernel functions have (i) high computational complexity and (ii) non-trivial difficulty to be indexed in a graph database.Our objective is to bridge graph kernel function and similarity search in graph databases by proposing (i) a novel kernel-based similarity measurement and (ii) an efficient indexing structure for graph data management. Our method of similarity measurement builds upon local features extracted from each node and their neighboring nodes in graphs. A hash table is utilized to support efficient storage and fast search of the extracted local features. Using the hash table, a graph kernel function is defined to capture the intrinsic similarity of graphs and for fast similarity query processing. We have implemented our method, which we have named G-hash, and have demonstrated its utility on large chemical graph databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Most importantly, the new similarity measurement and the index structure is scalable to large database with smaller indexing size, faster indexing construction time, and faster query processing time as compared to state-of-the-art indexing methods such as C-tree, gIndex, and GraphGrep.
NASA Astrophysics Data System (ADS)
Yusof, Nik Yusnoraini; Bakar, Farah Diba Abu; Mahadi, Nor Muhammad; Raih, Mohd Firdaus; Murad, Abdul Munir Abdul
2015-09-01
A cDNA encoding Fe(II) 2-oxoglutarate (2OG) dependent dioxygenases was isolated from psychrophilic yeast, Glaciozyma antarctica PI12. We have successfully amplified 1,029 bp cDNA sequence that encodes 342 amino acid with predicted molecular weight 38 kDa. The prediction protein was analysed using various bioinformatics tools to explore the properties of the protein. Based on a BLAST search analysis, the Fe2OX amino acid sequence showed 61% identity to the sequence of oxoglutarate/iron-dependent oxygenase from Rhodosporidium toruloides NP11. SignalP prediction showed that the Fe2OX protein contains no putative signal peptide, which suggests that this enzyme most probably localised intracellularly.The structure of Fe2OX was predicted by homology modelling using MODELLER9v11. The model with the lowest objective function was selected from hundred models generated using MODELLER9v11. Analysis of the structure revealed the longer loop at Fe2OX from G.antarctica that might be responsible for the flexibility of the structure, which contributes to its adaptation to low temperatures. Fe2OX hold a highly conserved Fe(II) binding HXD/E…H triad motif. The binding site for 2-oxoglutarate was found conserved for Arg280 among reported studies, however the Phe268 was found to be different in Fe2OX.
Zhang, Tong-Liang; Ding, Yong-Sheng; Chou, Kuo-Chen
2008-01-07
Compared with the conventional amino acid (AA) composition, the pseudo-amino acid (PseAA) composition as originally introduced for protein subcellular location prediction can incorporate much more information of a protein sequence, so as to remarkably enhance the power of using a discrete model to predict various attributes of a protein. In this study, based on the concept of PseAA composition, the approximate entropy and hydrophobicity pattern of a protein sequence are used to characterize the PseAA components. Also, the immune genetic algorithm (IGA) is applied to search the optimal weight factors in generating the PseAA composition. Thus, for a given protein sequence sample, a 27-D (dimensional) PseAA composition is generated as its descriptor. The fuzzy K nearest neighbors (FKNN) classifier is adopted as the prediction engine. The results thus obtained in predicting protein structural classification are quite encouraging, indicating that the current approach may also be used to improve the prediction quality of other protein attributes, or at least can play a complimentary role to the existing methods in the relevant areas. Our algorithm is written in Matlab that is available by contacting the corresponding author.
A PRIM approach to predictive-signature development for patient stratification
Chen, Gong; Zhong, Hua; Belousov, Anton; Devanarayan, Viswanath
2015-01-01
Patients often respond differently to a treatment because of individual heterogeneity. Failures of clinical trials can be substantially reduced if, prior to an investigational treatment, patients are stratified into responders and nonresponders based on biological or demographic characteristics. These characteristics are captured by a predictive signature. In this paper, we propose a procedure to search for predictive signatures based on the approach of patient rule induction method. Specifically, we discuss selection of a proper objective function for the search, present its algorithm, and describe a resampling scheme that can enhance search performance. Through simulations, we characterize conditions under which the procedure works well. To demonstrate practical uses of the procedure, we apply it to two real-world data sets. We also compare the results with those obtained from a recent regression-based approach, Adaptive Index Models, and discuss their respective advantages. In this study, we focus on oncology applications with survival responses. PMID:25345685
Exploration of tetrahedral structures in silicate cathodes using a motif-network scheme
Zhao, Xin; Wu, Shunqing; Lv, Xiaobao; ...
2015-10-26
Using a motif-network search scheme, we studied the tetrahedral structures of the dilithium/disodium transition metal orthosilicates A 2MSiO 4 with A = Li or Na and M = Mn, Fe or Co. In addition to finding all previously reported structures, we discovered many other different tetrahedral-network-based crystal structures which are highly degenerate in energy. In addition, these structures can be classified into structures with 1D, 2D and 3D M-Si-O frameworks. A clear trend of the structural preference in different systems was revealed and possible indicators that affect the structure stabilities were introduced. For the case of Na systems which havemore » been much less investigated in the literature relative to the Li systems, we predicted their ground state structures and found evidence for the existence of new structural motifs.« less
LigandRNA: computational predictor of RNA–ligand interactions
Philips, Anna; Milanowska, Kaja; Łach, Grzegorz; Bujnicki, Janusz M.
2013-01-01
RNA molecules have recently become attractive as potential drug targets due to the increased awareness of their importance in key biological processes. The increase of the number of experimentally determined RNA 3D structures enabled structure-based searches for small molecules that can specifically bind to defined sites in RNA molecules, thereby blocking or otherwise modulating their function. However, as of yet, computational methods for structure-based docking of small molecule ligands to RNA molecules are not as well established as analogous methods for protein-ligand docking. This motivated us to create LigandRNA, a scoring function for the prediction of RNA–small molecule interactions. Our method employs a grid-based algorithm and a knowledge-based potential derived from ligand-binding sites in the experimentally solved RNA–ligand complexes. As an input, LigandRNA takes an RNA receptor file and a file with ligand poses. As an output, it returns a ranking of the poses according to their score. The predictive power of LigandRNA favorably compares to five other publicly available methods. We found that the combination of LigandRNA and Dock6 into a “meta-predictor” leads to further improvement in the identification of near-native ligand poses. The LigandRNA program is available free of charge as a web server at http://ligandrna.genesilico.pl. PMID:24145824
Discovery and study of novel protein tyrosine phosphatase 1B inhibitors
NASA Astrophysics Data System (ADS)
Zhang, Qian; Chen, Xi; Feng, Changgen
2017-10-01
Protein tyrosine phosphatase 1B (PTP1B) is considered to be a target for therapy of type II diabetes and obesity. So it is of great significance to take advantage of a computer aided drug design protocol involving the structured-based virtual screening with docking simulations for fast searching small molecule PTP1B inhibitors. Based on optimized complex structure of PTP1B bound with specific inhibitor of IX1, structured-based virtual screening against a library of natural products containing 35308 molecules, which was constructed based on Traditional Chinese Medicine database@ Taiwan (TCM database@ Taiwan), was conducted to determine the occurrence of PTP1B inhibitors using the Lubbock module and CDOCKER module from Discovery Studio 3.1 software package. The results were further filtered by predictive ADME simulation and predictive toxic simulation. As a result, 2 good drug-like molecules, namely para-benzoquinone compound 1 and Clavepictine analogue 2 were identified ultimately with the dock score of original inhibitor (IX1) and the receptor as a threshold. Binding model analyses revealed that these two candidate compounds have good interactions with PTP1B. The PTP1B inhibitory activity of compound 2 hasn't been reported before. The optimized compound 2 has higher scores and deserves further study.
Huang, Liqiang
2015-05-01
Basic visual features (e.g., color, orientation) are assumed to be processed in the same general way across different visual tasks. Here, a significant deviation from this assumption was predicted on the basis of the analysis of stimulus spatial structure, as characterized by the Boolean-map notion. If a task requires memorizing the orientations of a set of bars, then the map consisting of those bars can be readily used to hold the overall structure in memory and will thus be especially useful. If the task requires visual search for a target, then the map, which contains only an overall structure, will be of little use. Supporting these predictions, the present study demonstrated that in comparison to stimulus colors, bar orientations were processed more efficiently in change-detection tasks but less efficiently in visual search tasks (Cohen's d = 4.24). In addition to offering support for the role of the Boolean map in conscious access, the present work also throws doubts on the generality of processing visual features. © The Author(s) 2015.
Ab initio solution of macromolecular crystal structures without direct methods.
McCoy, Airlie J; Oeffner, Robert D; Wrobel, Antoni G; Ojala, Juha R M; Tryggvason, Karl; Lohkamp, Bernhard; Read, Randy J
2017-04-04
The majority of macromolecular crystal structures are determined using the method of molecular replacement, in which known related structures are rotated and translated to provide an initial atomic model for the new structure. A theoretical understanding of the signal-to-noise ratio in likelihood-based molecular replacement searches has been developed to account for the influence of model quality and completeness, as well as the resolution of the diffraction data. Here we show that, contrary to current belief, molecular replacement need not be restricted to the use of models comprising a substantial fraction of the unknown structure. Instead, likelihood-based methods allow a continuum of applications depending predictably on the quality of the model and the resolution of the data. Unexpectedly, our understanding of the signal-to-noise ratio in molecular replacement leads to the finding that, with data to sufficiently high resolution, fragments as small as single atoms of elements usually found in proteins can yield ab initio solutions of macromolecular structures, including some that elude traditional direct methods.
Optimization of protein-protein docking for predicting Fc-protein interactions.
Agostino, Mark; Mancera, Ricardo L; Ramsland, Paul A; Fernández-Recio, Juan
2016-11-01
The antibody crystallizable fragment (Fc) is recognized by effector proteins as part of the immune system. Pathogens produce proteins that bind Fc in order to subvert or evade the immune response. The structural characterization of the determinants of Fc-protein association is essential to improve our understanding of the immune system at the molecular level and to develop new therapeutic agents. Furthermore, Fc-binding peptides and proteins are frequently used to purify therapeutic antibodies. Although several structures of Fc-protein complexes are available, numerous others have not yet been determined. Protein-protein docking could be used to investigate Fc-protein complexes; however, improved approaches are necessary to efficiently model such cases. In this study, a docking-based structural bioinformatics approach is developed for predicting the structures of Fc-protein complexes. Based on the available set of X-ray structures of Fc-protein complexes, three regions of the Fc, loosely corresponding to three turns within the structure, were defined as containing the essential features for protein recognition and used as restraints to filter the initial docking search. Rescoring the filtered poses with an optimal scoring strategy provided a success rate of approximately 80% of the test cases examined within the top ranked 20 poses, compared to approximately 20% by the initial unrestrained docking. The developed docking protocol provides a significant improvement over the initial unrestrained docking and will be valuable for predicting the structures of currently undetermined Fc-protein complexes, as well as in the design of peptides and proteins that target Fc. Copyright © 2016 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Shnoro, R. S.; Eicken, H.; Francis, J. A.; Scambos, T. A.; Schuur, E. A.; Straneo, F.; Wiggins, H. V.
2013-12-01
SEARCH is an interdisciplinary, interagency program that works with academic and government agency scientists and stakeholders to plan, conduct, and synthesize studies of Arctic change. Over the past three years, SEARCH has developed a new vision and mission, a set of prioritized cross-disciplinary 5-year goals, an integrated set of activities, and an organizational structure. The vision of SEARCH is to provide scientific understanding of arctic environmental change to help society understand and respond to a rapidly changing Arctic. SEARCH's 5-year science goals include: 1. Improve understanding, advance prediction, and explore consequences of changing Arctic sea ice. 2. Document and understand how degradation of near-surface permafrost will affect Arctic and global systems. 3. Improve predictions of future land-ice loss and impacts on sea level. 4. Analyze societal and policy implications of Arctic environmental change. Action Teams organized around each of the 5-year goals will serve as standing groups responsible for implementing specific goal activities. Members will be drawn from academia, different agencies and stakeholders, with a range of disciplinary backgrounds and perspectives. 'Arctic Futures 2050' scenarios tasks will describe plausible future states of the arctic system based on recent trajectories and projected changes. These scenarios will combine a range of data including climate model output, paleo-data, results from data synthesis and systems modeling, as well as expert scientific and traditional knowledge. Current activities include: - Arctic Observing Network (AON) - coordinating a system of atmospheric, land- and ocean-based environmental monitoring capabilities that will significantly advance our observations of arctic environmental conditions. - Arctic Sea Ice Outlook - an international effort that provides monthly summer reports synthesizing community estimates of the expected sea ice minimum. A newly-launched Sea Ice Prediction Network will create a network of scientists and stakeholders to generate, assess and communicate Arctic seasonal sea ice forecasts. - Collaboration with the Interagency Arctic Research Policy Committee (IARPC) to implement mutual science goals. SEARCH is sponsored by 8 U.S. agencies, including: the National Science Foundation, the National Oceanic and Atmospheric Administration, the National Aeronautics and Space Administration, the Department of Defense, the Department of Energy, the Department of the Interior, the Smithsonian Institution, and the U.S. Department of Agriculture. The U.S. Arctic Research Commission participates as an observer. For more information: http://www.arcus.org/search.
Hardwick, J Christopher R; MacKenzie, Fiona M
2003-01-10
To identify websites providing information about early pregnancy loss and compare this information with published guidelines from the Royal College of Obstetricians and Gynaecologists (RCOG). The value of 'Silberg' and 'Health on the net (HON)' website scoring systems in predicting the information provided via websites identified was assessed. A cross-sectional survey. Nineteen websites identified via two search engines (http://www.lycos.co.uk and http://www.msn.co.uk). Websites were searched for specific information in a structured manner and then scored by two independent observers against the website scoring systems and against a scoring system derived from guidelines published by the RCOG. Website scores against the scoring systems and against RCOG guidelines. Information concerning miscarriage contained within these websites was poor and scored accordingly against the RCOG guidelines (median score, 4.5/8). The website scoring systems did not predict the RCOG scores for a website (HON score R(S)=0.193 (95% confidence interval from -0.286 to 0.595), Silberg score, R(S)=0.035 (95% confidence interval from -0.426 to 0.482)). Few relevant websites were identified despite searching a large number via two search engines. The websites found did not answer our specific questions and consequently scored poorly against the RCOG guidelines. RCOG scores did not correlate with either scoring system. Web-based information for women attending with early pregnancy complications needs to be easily accessed and comprehensive. Written information given to women when seen with early pregnancy complications should include details of available comprehensive websites. Professional organisations, colleges or Government agencies should provide this type of information.
Virtual Interactomics of Proteins from Biochemical Standpoint
Kubrycht, Jaroslav; Sigler, Karel; Souček, Pavel
2012-01-01
Virtual interactomics represents a rapidly developing scientific area on the boundary line of bioinformatics and interactomics. Protein-related virtual interactomics then comprises instrumental tools for prediction, simulation, and networking of the majority of interactions important for structural and individual reproduction, differentiation, recognition, signaling, regulation, and metabolic pathways of cells and organisms. Here, we describe the main areas of virtual protein interactomics, that is, structurally based comparative analysis and prediction of functionally important interacting sites, mimotope-assisted and combined epitope prediction, molecular (protein) docking studies, and investigation of protein interaction networks. Detailed information about some interesting methodological approaches and online accessible programs or databases is displayed in our tables. Considerable part of the text deals with the searches for common conserved or functionally convergent protein regions and subgraphs of conserved interaction networks, new outstanding trends and clinically interesting results. In agreement with the presented data and relationships, virtual interactomic tools improve our scientific knowledge, help us to formulate working hypotheses, and they frequently also mediate variously important in silico simulations. PMID:22928109
Crystal Structure Predictions Using Adaptive Genetic Algorithm and Motif Search methods
NASA Astrophysics Data System (ADS)
Ho, K. M.; Wang, C. Z.; Zhao, X.; Wu, S.; Lyu, X.; Zhu, Z.; Nguyen, M. C.; Umemoto, K.; Wentzcovitch, R. M. M.
2017-12-01
Material informatics is a new initiative which has attracted a lot of attention in recent scientific research. The basic strategy is to construct comprehensive data sets and use machine learning to solve a wide variety of problems in material design and discovery. In pursuit of this goal, a key element is the quality and completeness of the databases used. Recent advance in the development of crystal structure prediction algorithms has made it a complementary and more efficient approach to explore the structure/phase space in materials using computers. In this talk, we discuss the importance of the structural motifs and motif-networks in crystal structure predictions. Correspondingly, powerful methods are developed to improve the sampling of the low-energy structure landscape.
Genetic Epidemiology of Glucose-6-Dehydrogenase Deficiency in the Arab World.
Doss, C George Priya; Alasmar, Dima R; Bux, Reem I; Sneha, P; Bakhsh, Fadheela Dad; Al-Azwani, Iman; Bekay, Rajaa El; Zayed, Hatem
2016-11-17
A systematic search was implemented using four literature databases (PubMed, Embase, Science Direct and Web of Science) to capture all the causative mutations of Glucose-6-phosphate dehydrogenase (G6PD) deficiency (G6PDD) in the 22 Arab countries. Our search yielded 43 studies that captured 33 mutations (23 missense, one silent, two deletions, and seven intronic mutations), in 3,430 Arab patients with G6PDD. The 23 missense mutations were then subjected to phenotypic classification using in silico prediction tools, which were compared to the WHO pathogenicity scale as a reference. These in silico tools were tested for their predicting efficiency using rigorous statistical analyses. Of the 23 missense mutations, p.S188F, p.I48T, p.N126D, and p.V68M, were identified as the most common mutations among Arab populations, but were not unique to the Arab world, interestingly, our search strategy found four other mutations (p.N135T, p.S179N, p.R246L, and p.Q307P) that are unique to Arabs. These mutations were exposed to structural analysis and molecular dynamics simulation analysis (MDSA), which predicting these mutant forms as potentially affect the enzyme function. The combination of the MDSA, structural analysis, and in silico predictions and statistical tools we used will provide a platform for future prediction accuracy for the pathogenicity of genetic mutations.
Concomitant prediction of function and fold at the domain level with GO-based profiles.
Lopez, Daniel; Pazos, Florencio
2013-01-01
Predicting the function of newly sequenced proteins is crucial due to the pace at which these raw sequences are being obtained. Almost all resources for predicting protein function assign functional terms to whole chains, and do not distinguish which particular domain is responsible for the allocated function. This is not a limitation of the methodologies themselves but it is due to the fact that in the databases of functional annotations these methods use for transferring functional terms to new proteins, these annotations are done on a whole-chain basis. Nevertheless, domains are the basic evolutionary and often functional units of proteins. In many cases, the domains of a protein chain have distinct molecular functions, independent from each other. For that reason resources with functional annotations at the domain level, as well as methodologies for predicting function for individual domains adapted to these resources are required.We present a methodology for predicting the molecular function of individual domains, based on a previously developed database of functional annotations at the domain level. The approach, which we show outperforms a standard method based on sequence searches in assigning function, concomitantly predicts the structural fold of the domains and can give hints on the functionally important residues associated to the predicted function.
Davis, Eric; Devlin, Sean; Cooper, Candice; Nhaissi, Melissa; Paulson, Jennifer; Wells, Deborah; Scaradavou, Andromachi; Giralt, Sergio; Papadopoulos, Esperanza; Kernan, Nancy A; Byam, Courtney; Barker, Juliet N
2018-05-01
A strategy to rapidly determine if a matched unrelated donor (URD) can be secured for allograft recipients is needed. We sought to validate the accuracy of (1) HapLogic match predictions and (2) a resultant novel Search Prognosis (SP) patient categorization that could predict 8/8 HLA-matched URD(s) likelihood at search initiation. Patient prognosis categories at search initiation were correlated with URD confirmatory typing results. HapLogic-based SP categorizations accurately predicted the likelihood of an 8/8 HLA-match in 830 patients (1530 donors tested). Sixty percent of patients had 8/8 URD(s) identified. Patient SP categories (217 very good, 104 good, 178 fair, 33 poor, 153 very poor, 145 futile) were associated with a marked progressive decrease in 8/8 URD identification and transplantation. Very good to good categories were highly predictive of identifying and receiving an 8/8 URD regardless of ancestry. Europeans in fair/poor categories were more likely to identify and receive an 8/8 URD compared with non-Europeans. In all ancestries very poor and futile categories predicted no 8/8 URDs. HapLogic permits URD search results to be predicted once patient HLA typing and ancestry is obtained, dramatically improving search efficiency. Poor, very poor, andfutile searches can be immediately recognized, thereby facilitating prompt pursuit of alternative donors. Copyright © 2017 The American Society for Blood and Marrow Transplantation. Published by Elsevier Inc. All rights reserved.
Omelchenko, Marina V; Galperin, Michael Y; Wolf, Yuri I; Koonin, Eugene V
2010-04-30
Evolutionarily unrelated proteins that catalyze the same biochemical reactions are often referred to as analogous - as opposed to homologous - enzymes. The existence of numerous alternative, non-homologous enzyme isoforms presents an interesting evolutionary problem; it also complicates genome-based reconstruction of the metabolic pathways in a variety of organisms. In 1998, a systematic search for analogous enzymes resulted in the identification of 105 Enzyme Commission (EC) numbers that included two or more proteins without detectable sequence similarity to each other, including 34 EC nodes where proteins were known (or predicted) to have distinct structural folds, indicating independent evolutionary origins. In the past 12 years, many putative non-homologous isofunctional enzymes were identified in newly sequenced genomes. In addition, efforts in structural genomics resulted in a vastly improved structural coverage of proteomes, providing for definitive assessment of (non)homologous relationships between proteins. We report the results of a comprehensive search for non-homologous isofunctional enzymes (NISE) that yielded 185 EC nodes with two or more experimentally characterized - or predicted - structurally unrelated proteins. Of these NISE sets, only 74 were from the original 1998 list. Structural assignments of the NISE show over-representation of proteins with the TIM barrel fold and the nucleotide-binding Rossmann fold. From the functional perspective, the set of NISE is enriched in hydrolases, particularly carbohydrate hydrolases, and in enzymes involved in defense against oxidative stress. These results indicate that at least some of the non-homologous isofunctional enzymes were recruited relatively recently from enzyme families that are active against related substrates and are sufficiently flexible to accommodate changes in substrate specificity.
From QSAR to QSIIR: Searching for Enhanced Computational Toxicology Models
Zhu, Hao
2017-01-01
Quantitative Structure Activity Relationship (QSAR) is the most frequently used modeling approach to explore the dependency of biological, toxicological, or other types of activities/properties of chemicals on their molecular features. In the past two decades, QSAR modeling has been used extensively in drug discovery process. However, the predictive models resulted from QSAR studies have limited use for chemical risk assessment, especially for animal and human toxicity evaluations, due to the low predictivity of new compounds. To develop enhanced toxicity models with independently validated external prediction power, novel modeling protocols were pursued by computational toxicologists based on rapidly increasing toxicity testing data in recent years. This chapter reviews the recent effort in our laboratory to incorporate the biological testing results as descriptors in the toxicity modeling process. This effort extended the concept of QSAR to Quantitative Structure In vitro-In vivo Relationship (QSIIR). The QSIIR study examples provided in this chapter indicate that the QSIIR models that based on the hybrid (biological and chemical) descriptors are indeed superior to the conventional QSAR models that only based on chemical descriptors for several animal toxicity endpoints. We believe that the applications introduced in this review will be of interest and value to researchers working in the field of computational drug discovery and environmental chemical risk assessment. PMID:23086837
Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V
2012-01-01
Background Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. Objective To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. Methods We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed’s Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. Results The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%–25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%–30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. Conclusions The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the 2 first PubMed pages. These results can help clinicians apply effective strategies to answer their questions at the point of care. PMID:22693047
Agoritsas, Thomas; Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V
2012-06-12
Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed's Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%-25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%-30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the 2 first PubMed pages. These results can help clinicians apply effective strategies to answer their questions at the point of care.
Protein structure prediction with local adjust tabu search algorithm
2014-01-01
Background Protein folding structure prediction is one of the most challenging problems in the bioinformatics domain. Because of the complexity of the realistic protein structure, the simplified structure model and the computational method should be adopted in the research. The AB off-lattice model is one of the simplification models, which only considers two classes of amino acids, hydrophobic (A) residues and hydrophilic (B) residues. Results The main work of this paper is to discuss how to optimize the lowest energy configurations in 2D off-lattice model and 3D off-lattice model by using Fibonacci sequences and real protein sequences. In order to avoid falling into local minimum and faster convergence to the global minimum, we introduce a novel method (SATS) to the protein structure problem, which combines simulated annealing algorithm and tabu search algorithm. Various strategies, such as the new encoding strategy, the adaptive neighborhood generation strategy and the local adjustment strategy, are adopted successfully for high-speed searching the optimal conformation corresponds to the lowest energy of the protein sequences. Experimental results show that some of the results obtained by the improved SATS are better than those reported in previous literatures, and we can sure that the lowest energy folding state for short Fibonacci sequences have been found. Conclusions Although the off-lattice models is not very realistic, they can reflect some important characteristics of the realistic protein. It can be found that 3D off-lattice model is more like native folding structure of the realistic protein than 2D off-lattice model. In addition, compared with some previous researches, the proposed hybrid algorithm can more effectively and more quickly search the spatial folding structure of a protein chain. PMID:25474708
Grain boundary phases in bcc metals
Frolov, T.; Setyawan, W.; Kurtz, R. J.; ...
2018-01-01
Evolutionary grand-canonical search predicts novel grain boundary structures and multiple grain boundary phases in elemental body-centered cubic (bcc) metals represented by tungsten, tantalum and molybdenum.
Modelling eye movements in a categorical search task
Zelinsky, Gregory J.; Adeli, Hossein; Peng, Yifan; Samaras, Dimitris
2013-01-01
We introduce a model of eye movements during categorical search, the task of finding and recognizing categorically defined targets. It extends a previous model of eye movements during search (target acquisition model, TAM) by using distances from an support vector machine classification boundary to create probability maps indicating pixel-by-pixel evidence for the target category in search images. Other additions include functionality enabling target-absent searches, and a fixation-based blurring of the search images now based on a mapping between visual and collicular space. We tested this model on images from a previously conducted variable set-size (6/13/20) present/absent search experiment where participants searched for categorically defined teddy bear targets among random category distractors. The model not only captured target-present/absent set-size effects, but also accurately predicted for all conditions the numbers of fixations made prior to search judgements. It also predicted the percentages of first eye movements during search landing on targets, a conservative measure of search guidance. Effects of set size on false negative and false positive errors were also captured, but error rates in general were overestimated. We conclude that visual features discriminating a target category from non-targets can be learned and used to guide eye movements during categorical search. PMID:24018720
Janssen, Stefan; Schudoma, Christian; Steger, Gerhard; Giegerich, Robert
2011-11-03
Many bioinformatics tools for RNA secondary structure analysis are based on a thermodynamic model of RNA folding. They predict a single, "optimal" structure by free energy minimization, they enumerate near-optimal structures, they compute base pair probabilities and dot plots, representative structures of different abstract shapes, or Boltzmann probabilities of structures and shapes. Although all programs refer to the same physical model, they implement it with considerable variation for different tasks, and little is known about the effects of heuristic assumptions and model simplifications used by the programs on the outcome of the analysis. We extract four different models of the thermodynamic folding space which underlie the programs RNAFOLD, RNASHAPES, and RNASUBOPT. Their differences lie within the details of the energy model and the granularity of the folding space. We implement probabilistic shape analysis for all models, and introduce the shape probability shift as a robust measure of model similarity. Using four data sets derived from experimentally solved structures, we provide a quantitative evaluation of the model differences. We find that search space granularity affects the computed shape probabilities less than the over- or underapproximation of free energy by a simplified energy model. Still, the approximations perform similar enough to implementations of the full model to justify their continued use in settings where computational constraints call for simpler algorithms. On the side, we observe that the rarely used level 2 shapes, which predict the complete arrangement of helices, multiloops, internal loops and bulges, include the "true" shape in a rather small number of predicted high probability shapes. This calls for an investigation of new strategies to extract high probability members from the (very large) level 2 shape space of an RNA sequence. We provide implementations of all four models, written in a declarative style that makes them easy to be modified. Based on our study, future work on thermodynamic RNA folding may make a choice of model based on our empirical data. It can take our implementations as a starting point for further program development.
Structure Prediction and Analysis of DNA Transposon and LINE Retrotransposon Proteins*
Abrusán, György; Zhang, Yang; Szilágyi, András
2013-01-01
Despite the considerable amount of research on transposable elements, no large-scale structural analyses of the TE proteome have been performed so far. We predicted the structures of hundreds of proteins from a representative set of DNA and LINE transposable elements and used the obtained structural data to provide the first general structural characterization of TE proteins and to estimate the frequency of TE domestication and horizontal transfer events. We show that 1) ORF1 and Gag proteins of retrotransposons contain high amounts of structural disorder; thus, despite their very low conservation, the presence of disordered regions and probably their chaperone function is conserved. 2) The distribution of SCOP classes in DNA transposons and LINEs indicates that the proteins of DNA transposons are more ancient, containing folds that already existed when the first cellular organisms appeared. 3) DNA transposon proteins have lower contact order than randomly selected reference proteins, indicating rapid folding, most likely to avoid protein aggregation. 4) Structure-based searches for TE homologs indicate that the overall frequency of TE domestication events is low, whereas we found a relatively high number of cases where horizontal transfer, frequently involving parasites, is the most likely explanation for the observed homology. PMID:23530042
PySeqLab: an open source Python package for sequence labeling and segmentation.
Allam, Ahmed; Krauthammer, Michael
2017-11-01
Text and genomic data are composed of sequential tokens, such as words and nucleotides that give rise to higher order syntactic constructs. In this work, we aim at providing a comprehensive Python library implementing conditional random fields (CRFs), a class of probabilistic graphical models, for robust prediction of these constructs from sequential data. Python Sequence Labeling (PySeqLab) is an open source package for performing supervised learning in structured prediction tasks. It implements CRFs models, that is discriminative models from (i) first-order to higher-order linear-chain CRFs, and from (ii) first-order to higher-order semi-Markov CRFs (semi-CRFs). Moreover, it provides multiple learning algorithms for estimating model parameters such as (i) stochastic gradient descent (SGD) and its multiple variations, (ii) structured perceptron with multiple averaging schemes supporting exact and inexact search using 'violation-fixing' framework, (iii) search-based probabilistic online learning algorithm (SAPO) and (iv) an interface for Broyden-Fletcher-Goldfarb-Shanno (BFGS) and the limited-memory BFGS algorithms. Viterbi and Viterbi A* are used for inference and decoding of sequences. Using PySeqLab, we built models (classifiers) and evaluated their performance in three different domains: (i) biomedical Natural language processing (NLP), (ii) predictive DNA sequence analysis and (iii) Human activity recognition (HAR). State-of-the-art performance comparable to machine-learning based systems was achieved in the three domains without feature engineering or the use of knowledge sources. PySeqLab is available through https://bitbucket.org/A_2/pyseqlab with tutorials and documentation. ahmed.allam@yale.edu or michael.krauthammer@yale.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Genetic Algorithm Based Framework for Automation of Stochastic Modeling of Multi-Season Streamflows
NASA Astrophysics Data System (ADS)
Srivastav, R. K.; Srinivasan, K.; Sudheer, K.
2009-05-01
Synthetic streamflow data generation involves the synthesis of likely streamflow patterns that are statistically indistinguishable from the observed streamflow data. The various kinds of stochastic models adopted for multi-season streamflow generation in hydrology are: i) parametric models which hypothesize the form of the periodic dependence structure and the distributional form a priori (examples are PAR, PARMA); disaggregation models that aim to preserve the correlation structure at the periodic level and the aggregated annual level; ii) Nonparametric models (examples are bootstrap/kernel based methods), which characterize the laws of chance, describing the stream flow process, without recourse to prior assumptions as to the form or structure of these laws; (k-nearest neighbor (k-NN), matched block bootstrap (MABB)); non-parametric disaggregation model. iii) Hybrid models which blend both parametric and non-parametric models advantageously to model the streamflows effectively. Despite many of these developments that have taken place in the field of stochastic modeling of streamflows over the last four decades, accurate prediction of the storage and the critical drought characteristics has been posing a persistent challenge to the stochastic modeler. This is partly because, usually, the stochastic streamflow model parameters are estimated by minimizing a statistically based objective function (such as maximum likelihood (MLE) or least squares (LS) estimation) and subsequently the efficacy of the models is being validated based on the accuracy of prediction of the estimates of the water-use characteristics, which requires large number of trial simulations and inspection of many plots and tables. Still accurate prediction of the storage and the critical drought characteristics may not be ensured. In this study a multi-objective optimization framework is proposed to find the optimal hybrid model (blend of a simple parametric model, PAR(1) model and matched block bootstrap (MABB) ) based on the explicit objective functions of minimizing the relative bias and relative root mean square error in estimating the storage capacity of the reservoir. The optimal parameter set of the hybrid model is obtained based on the search over a multi- dimensional parameter space (involving simultaneous exploration of the parametric (PAR(1)) as well as the non-parametric (MABB) components). This is achieved using the efficient evolutionary search based optimization tool namely, non-dominated sorting genetic algorithm - II (NSGA-II). This approach helps in reducing the drudgery involved in the process of manual selection of the hybrid model, in addition to predicting the basic summary statistics dependence structure, marginal distribution and water-use characteristics accurately. The proposed optimization framework is used to model the multi-season streamflows of River Beaver and River Weber of USA. In case of both the rivers, the proposed GA-based hybrid model yields a much better prediction of the storage capacity (where simultaneous exploration of both parametric and non-parametric components is done) when compared with the MLE-based hybrid models (where the hybrid model selection is done in two stages, thus probably resulting in a sub-optimal model). This framework can be further extended to include different linear/non-linear hybrid stochastic models at other temporal and spatial scales as well.
Galehdari, Hamid; Saki, Najmaldin; Mohammadi-Asl, Javad; Rahim, Fakher
2013-01-01
Crigler-Najjar syndrome (CNS) type I and type II are usually inherited as autosomal recessive conditions that result from mutations in the UGT1A1 gene. The main objective of the present review is to summarize results of all available evidence on the accuracy of SNP-based pathogenicity detection tools compared to published clinical result for the prediction of in nsSNPs that leads to disease using prediction performance method. A comprehensive search was performed to find all mutations related to CNS. Database searches included dbSNP, SNPdbe, HGMD, Swissvar, ensemble, and OMIM. All the mutation related to CNS was extracted. The pathogenicity prediction was done using SNP-based pathogenicity detection tools include SIFT, PHD-SNP, PolyPhen2, fathmm, Provean, and Mutpred. Overall, 59 different SNPs related to missense mutations in the UGT1A1 gene, were reviewed. Comparing the diagnostic OR, PolyPhen2 and Mutpred have the highest detection 4.983 (95% CI: 1.24 - 20.02) in both, following by SIFT (diagnostic OR: 3.25, 95% CI: 1.07 - 9.83). The highest MCC of SNP-based pathogenicity detection tools, was belong to SIFT (34.19%) followed by Provean, PolyPhen2, and Mutpred (29.99%, 29.89%, and 29.89%, respectively). Hence the highest SNP-based pathogenicity detection tools ACC, was fit to SIFT (62.71%) followed by PolyPhen2, and Mutpred (61.02%, in both). Our results suggest that some of the well-established SNP-based pathogenicity detection tools can appropriately reflect the role of a disease-associated SNP in both local and global structures.
Du, Zhicheng; Xu, Lin; Zhang, Wangjian; Zhang, Dingmei; Yu, Shicheng; Hao, Yuantao
2017-10-06
Hand, foot, and mouth disease (HFMD) has caused a substantial burden in China, especially in Guangdong Province. Based on the enhanced surveillance system, we aimed to explore whether the addition of temperate and search engine query data improves the risk prediction of HFMD. Ecological study. Information on the confirmed cases of HFMD, climate parameters and search engine query logs was collected. A total of 1.36 million HFMD cases were identified from the surveillance system during 2011-2014. Analyses were conducted at aggregate level and no confidential information was involved. A seasonal autoregressive integrated moving average (ARIMA) model with external variables (ARIMAX) was used to predict the HFMD incidence from 2011 to 2014, taking into account temperature and search engine query data (Baidu Index, BDI). Statistics of goodness-of-fit and precision of prediction were used to compare models (1) based on surveillance data only, and with the addition of (2) temperature, (3) BDI, and (4) both temperature and BDI. A high correlation between HFMD incidence and BDI ( r =0.794, p<0.001) or temperature ( r =0.657, p<0.001) was observed using both time series plot and correlation matrix. A linear effect of BDI (without lag) and non-linear effect of temperature (1 week lag) on HFMD incidence were found in a distributed lag non-linear model. Compared with the model based on surveillance data only, the ARIMAX model including BDI reached the best goodness-of-fit with an Akaike information criterion (AIC) value of -345.332, whereas the model including both BDI and temperature had the most accurate prediction in terms of the mean absolute percentage error (MAPE) of 101.745%. An ARIMAX model incorporating search engine query data significantly improved the prediction of HFMD. Further studies are warranted to examine whether including search engine query data also improves the prediction of other infectious diseases in other settings. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Du, Zhicheng; Xu, Lin; Zhang, Wangjian; Zhang, Dingmei; Yu, Shicheng; Hao, Yuantao
2017-01-01
Objectives Hand, foot, and mouth disease (HFMD) has caused a substantial burden in China, especially in Guangdong Province. Based on the enhanced surveillance system, we aimed to explore whether the addition of temperate and search engine query data improves the risk prediction of HFMD. Design Ecological study. Setting and participants Information on the confirmed cases of HFMD, climate parameters and search engine query logs was collected. A total of 1.36 million HFMD cases were identified from the surveillance system during 2011–2014. Analyses were conducted at aggregate level and no confidential information was involved. Outcome measures A seasonal autoregressive integrated moving average (ARIMA) model with external variables (ARIMAX) was used to predict the HFMD incidence from 2011 to 2014, taking into account temperature and search engine query data (Baidu Index, BDI). Statistics of goodness-of-fit and precision of prediction were used to compare models (1) based on surveillance data only, and with the addition of (2) temperature, (3) BDI, and (4) both temperature and BDI. Results A high correlation between HFMD incidence and BDI (r=0.794, p<0.001) or temperature (r=0.657, p<0.001) was observed using both time series plot and correlation matrix. A linear effect of BDI (without lag) and non-linear effect of temperature (1 week lag) on HFMD incidence were found in a distributed lag non-linear model. Compared with the model based on surveillance data only, the ARIMAX model including BDI reached the best goodness-of-fit with an Akaike information criterion (AIC) value of −345.332, whereas the model including both BDI and temperature had the most accurate prediction in terms of the mean absolute percentage error (MAPE) of 101.745%. Conclusions An ARIMAX model incorporating search engine query data significantly improved the prediction of HFMD. Further studies are warranted to examine whether including search engine query data also improves the prediction of other infectious diseases in other settings. PMID:28988169
Liu, Xian; Xu, Yuan; Li, Shanshan; Wang, Yulan; Peng, Jianlong; Luo, Cheng; Luo, Xiaomin; Zheng, Mingyue; Chen, Kaixian; Jiang, Hualiang
2014-01-01
Ligand-based in silico target fishing can be used to identify the potential interacting target of bioactive ligands, which is useful for understanding the polypharmacology and safety profile of existing drugs. The underlying principle of the approach is that known bioactive ligands can be used as reference to predict the targets for a new compound. We tested a pipeline enabling large-scale target fishing and drug repositioning, based on simple fingerprint similarity rankings with data fusion. A large library containing 533 drug relevant targets with 179,807 active ligands was compiled, where each target was defined by its ligand set. For a given query molecule, its target profile is generated by similarity searching against the ligand sets assigned to each target, for which individual searches utilizing multiple reference structures are then fused into a single ranking list representing the potential target interaction profile of the query compound. The proposed approach was validated by 10-fold cross validation and two external tests using data from DrugBank and Therapeutic Target Database (TTD). The use of the approach was further demonstrated with some examples concerning the drug repositioning and drug side-effects prediction. The promising results suggest that the proposed method is useful for not only finding promiscuous drugs for their new usages, but also predicting some important toxic liabilities. With the rapid increasing volume and diversity of data concerning drug related targets and their ligands, the simple ligand-based target fishing approach would play an important role in assisting future drug design and discovery.
Monteiro, Pedro Tiago; Pais, Pedro; Costa, Catarina; Manna, Sauvagya; Sá-Correia, Isabel; Teixeira, Miguel Cacho
2017-01-04
We present the PATHOgenic YEAst Search for Transcriptional Regulators And Consensus Tracking (PathoYeastract - http://pathoyeastract.org) database, a tool for the analysis and prediction of transcription regulatory associations at the gene and genomic levels in the pathogenic yeasts Candida albicans and C. glabrata Upon data retrieval from hundreds of publications, followed by curation, the database currently includes 28 000 unique documented regulatory associations between transcription factors (TF) and target genes and 107 DNA binding sites, considering 134 TFs in both species. Following the structure used for the YEASTRACT database, PathoYeastract makes available bioinformatics tools that enable the user to exploit the existing information to predict the TFs involved in the regulation of a gene or genome-wide transcriptional response, while ranking those TFs in order of their relative importance. Each search can be filtered based on the selection of specific environmental conditions, experimental evidence or positive/negative regulatory effect. Promoter analysis tools and interactive visualization tools for the representation of TF regulatory networks are also provided. The PathoYeastract database further provides simple tools for the prediction of gene and genomic regulation based on orthologous regulatory associations described for other yeast species, a comparative genomics setup for the study of cross-species evolution of regulatory networks. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Advances in Toxico-Cheminformatics: Supporting a New ...
EPA’s National Center for Computational Toxicology is building capabilities to support a new paradigm for toxicity screening and prediction through the harnessing of legacy toxicity data, creation of data linkages, and generation of new high-throughput screening (HTS) data. The DSSTox project is working to improve public access to quality structure-annotated chemical toxicity information in less summarized forms than traditionally employed in SAR modeling, and in ways that facilitate both data-mining and read-across. Both DSSTox Structure-Files and the dedicated on-line DSSTox Structure-Browser are enabling seamless structure-based searching and linkages to and from previously isolated, chemically indexed public toxicity data resources (e.g., NTP, EPA IRIS, CPDB). Most recently, structure-enabled search capabilities have been extended to chemical exposure-related microarray experiments in the public EBI Array Express database, additionally linking this resource to the NIEHS CEBS toxicogenomics database. The public DSSTox chemical and bioassay inventory has been recently integrated into PubChem, allowing a user to take full advantage of PubChem structure-activity and bioassay clustering features. The DSSTox project is providing cheminformatics support for EPA’s ToxCastTM project, as well as supporting collaborations with the National Toxicology Program (NTP) HTS and the NIH Chemical Genomics Center (NCGC). Phase I of the ToxCastTM project is generating HT
Theoretical prediction of a novel inorganic fullerene-like family of silicon-carbon materials
NASA Astrophysics Data System (ADS)
Wang, Ruoxi; Zhang, Dongju; Liu, Chengbu
2005-08-01
In an effort to search for new inorganic fullerene-like structures, we designed a series of novel silicon-carbon cages, (SiC) n ( n = 6-36), based on the uniformly hybrid Si-C four- and six-membered-rings, and researched their geometrical and electronic structures, as well as their relative stabilities using the density function theory. Among these cages, the structures for n = 12, 16, and 36 were found to been energetically more favorable. The calculated disproportionation energy and binding energy per SiC unit show that the (SiC) 12 cage is the most stable one among these designed structures. The present calculations not only indicate that silicon-carbon fullerenes are promised to be synthesized in future, but also provide a new way for stabilizing silicon cages by uniformly doping carbon atoms into silicon structures.
Evolution, Energy Landscapes and the Paradoxes of Protein Folding
Wolynes, Peter G.
2014-01-01
Protein folding has been viewed as a difficult problem of molecular self-organization. The search problem involved in folding however has been simplified through the evolution of folding energy landscapes that are funneled. The funnel hypothesis can be quantified using energy landscape theory based on the minimal frustration principle. Strong quantitative predictions that follow from energy landscape theory have been widely confirmed both through laboratory folding experiments and from detailed simulations. Energy landscape ideas also have allowed successful protein structure prediction algorithms to be developed. The selection constraint of having funneled folding landscapes has left its imprint on the sequences of existing protein structural families. Quantitative analysis of co-evolution patterns allows us to infer the statistical characteristics of the folding landscape. These turn out to be consistent with what has been obtained from laboratory physicochemical folding experiments signalling a beautiful confluence of genomics and chemical physics. PMID:25530262
Crystal Structure and Superconductivity of PH 3 at High Pressures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Hanyu; Li, Yinwei; Gao, Guoying
2016-02-04
We have performed a systematic structure search on solid PH3 at high pressures using the particle swarm optimization method. At 100–200 GPa, the search led to two structures which along with others have P–P bonds. These structures are structurally and chemically distinct from those predicted for the high-pressure superconducting H2S phase, which has a different topology (i.e., does not contain S–S bonds). Phonon and electron–phonon coupling calculations indicate that both structures are dynamically stable and superconducting. The pressure dependence and critical temperature for the monoclinic (C2/m) phase of 83 K at 200 GPa are in excellent agreement with a recentmore » experimental report.« less
ERIC Educational Resources Information Center
Abuhamdieh, Ayman H.; Harder, Joseph T.
2015-01-01
This paper proposes a meta-cognitive, systems-based, information structuring model (McSIS) to systematize online information search behavior based on literature review of information-seeking models. The General Systems Theory's (GST) prepositions serve as its framework. Factors influencing information-seekers, such as the individual learning…
2012-01-01
Background To discover a compound inhibiting multiple proteins (i.e. polypharmacological targets) is a new paradigm for the complex diseases (e.g. cancers and diabetes). In general, the polypharmacological proteins often share similar local binding environments and motifs. As the exponential growth of the number of protein structures, to find the similar structural binding motifs (pharma-motifs) is an emergency task for drug discovery (e.g. side effects and new uses for old drugs) and protein functions. Results We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize the binding motifs by searching against protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% polypharmacological targets of each protein-ligand complex are annotated with same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play the key roles for protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. Conclusions SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservations of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide the clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery. PMID:23281852
Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon
2012-01-01
To discover a compound inhibiting multiple proteins (i.e. polypharmacological targets) is a new paradigm for the complex diseases (e.g. cancers and diabetes). In general, the polypharmacological proteins often share similar local binding environments and motifs. As the exponential growth of the number of protein structures, to find the similar structural binding motifs (pharma-motifs) is an emergency task for drug discovery (e.g. side effects and new uses for old drugs) and protein functions. We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize the binding motifs by searching against protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% polypharmacological targets of each protein-ligand complex are annotated with same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play the key roles for protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservations of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide the clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery.
GRID: a high-resolution protein structure refinement algorithm.
Chitsaz, Mohsen; Mayo, Stephen L
2013-03-05
The energy-based refinement of protein structures generated by fold prediction algorithms to atomic-level accuracy remains a major challenge in structural biology. Energy-based refinement is mainly dependent on two components: (1) sufficiently accurate force fields, and (2) efficient conformational space search algorithms. Focusing on the latter, we developed a high-resolution refinement algorithm called GRID. It takes a three-dimensional protein structure as input and, using an all-atom force field, attempts to improve the energy of the structure by systematically perturbing backbone dihedrals and side-chain rotamer conformations. We compare GRID to Backrub, a stochastic algorithm that has been shown to predict a significant fraction of the conformational changes that occur with point mutations. We applied GRID and Backrub to 10 high-resolution (≤ 2.8 Å) crystal structures from the Protein Data Bank and measured the energy improvements obtained and the computation times required to achieve them. GRID resulted in energy improvements that were significantly better than those attained by Backrub while expending about the same amount of computational resources. GRID resulted in relaxed structures that had slightly higher backbone RMSDs compared to Backrub relative to the starting crystal structures. The average RMSD was 0.25 ± 0.02 Å for GRID versus 0.14 ± 0.04 Å for Backrub. These relatively minor deviations indicate that both algorithms generate structures that retain their original topologies, as expected given the nature of the algorithms. Copyright © 2012 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Jia, D.; Feng, Y.; Liu, J.; Yao, X.; Zhang, Z.; Ye, T.
2017-12-01
1. Working BackgroundCurrent Status of Geological Prospecting: Detecting boundaries and bottoms, making ore search nearby; Seeing the stars, not seeing the Moon; Deep prospecting, undesirable results. The reasons of these problems are the regional metallogenic backgroud unclear and the metallogenic backgroud of the exploration regions unknown. Accordingly, Development and Research Center, CGS organized a geological setting research, in detail investigate metallogenic geological features and acquire mineralization information. 2. Technical SchemeCore research content is prediction elements of Metallogenic Structure. Adopt unified technical requirements from top to bottom, and technical route from bottom to top; Divide elements of mineral forecast and characteristics of geological structure into five elements for research and expression; Make full use of geophysical, geochemical and remote sensing inferences for the interpretation of macro information. After eight years the great project was completed. 3. Main AchievementsInnovation of basic maps compilation content of geological background, reinforce of geological structure data base of potentiality valuation. Preparation of geotectonic facies maps in different scales and professions, providing brand-new geologic background for potentiality assessment, promoting Chinese geotectonic research to the new height. Preparation of 3,375 geological structure thematic base maps of detecting working area in 6 kinds of prediction methods, providing base working maps, rock assemblage, structure of the protolith of geologic body / mineralization / ore controlling for mineral prediction of 25 ores. Enrichment and development of geotectonic facies analysis method, establishment of metallogenic background research thoughts and approach system for assessment of national mineral resources potentiality for the first time. 4. Application EffectOrientation——More and better results with less effort. Positioning——Have a definite object in view. Heart calm down——Confidence.
Sirius PSB: a generic system for analysis of biological sequences.
Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon
2009-12-01
Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.
NASA Technical Reports Server (NTRS)
Eckstein, M. P.; Thomas, J. P.; Palmer, J.; Shimozaki, S. S.
2000-01-01
Recently, quantitative models based on signal detection theory have been successfully applied to the prediction of human accuracy in visual search for a target that differs from distractors along a single attribute (feature search). The present paper extends these models for visual search accuracy to multidimensional search displays in which the target differs from the distractors along more than one feature dimension (conjunction, disjunction, and triple conjunction displays). The model assumes that each element in the display elicits a noisy representation for each of the relevant feature dimensions. The observer combines the representations across feature dimensions to obtain a single decision variable, and the stimulus with the maximum value determines the response. The model accurately predicts human experimental data on visual search accuracy in conjunctions and disjunctions of contrast and orientation. The model accounts for performance degradation without resorting to a limited-capacity spatially localized and temporally serial mechanism by which to bind information across feature dimensions.
Guo, J L; Wang, T F; Liao, J Y; Huang, C M
2016-02-01
This study assessed the applicability and efficacy of the theory of planned behavior (TPB) in predicting breastfeeding. The TPB assumes a rational approach for engaging in various behaviors, and has been used extensively for explaining health behavior. However, most studies have tested the effectiveness of TPB constructs in predicting how people perform actions for their own benefit rather than performing behaviors that are beneficial to others, such as breastfeeding infants. A meta-analysis approach could help clarify the breastfeeding practice to promote breastfeeding. This study used meta-analytic procedures. We searched for studies to include in our analysis, examining those published between January 1, 1990 and December 31, 2013 in PubMed, Medline, CINAHL, ProQuest, and Mosby's Index. We also reviewed journals with a history of publishing breastfeeding studies and searched reference lists for potential articles to include. Ten studies comprising a total of 2694 participants were selected for analysis. These studies yielded 10 effect sizes from the TPB, which ranged from 0.20 to 0.59. Structural equation model analysis using the pooled correlation matrix enabled us to determine the relative coefficients among TPB constructs. Attitude, subjective norms, and perceived behavioral control were all significant predictors of breastfeeding intention, whereas intention was a strong predictor of breastfeeding behavior. Perceived behavioral control reached a borderline level of significance to breastfeeding behavior. Theoretical and empirical implications are discussed from the perspective of evidence-based practice. Copyright © 2015 Elsevier Inc. All rights reserved.
Structator: fast index-based search for RNA sequence-structure patterns
2011-01-01
Background The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires to match sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. Results We present a novel method and readily applicable software for time efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows to exploit base pairing information for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. Conclusions The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator. PMID:21619640
Combinatorial Pharmacophore-Based 3D-QSAR Analysis and Virtual Screening of FGFR1 Inhibitors
Zhou, Nannan; Xu, Yuan; Liu, Xian; Wang, Yulan; Peng, Jianlong; Luo, Xiaomin; Zheng, Mingyue; Chen, Kaixian; Jiang, Hualiang
2015-01-01
The fibroblast growth factor/fibroblast growth factor receptor (FGF/FGFR) signaling pathway plays crucial roles in cell proliferation, angiogenesis, migration, and survival. Aberration in FGFRs correlates with several malignancies and disorders. FGFRs have proved to be attractive targets for therapeutic intervention in cancer, and it is of high interest to find FGFR inhibitors with novel scaffolds. In this study, a combinatorial three-dimensional quantitative structure-activity relationship (3D-QSAR) model was developed based on previously reported FGFR1 inhibitors with diverse structural skeletons. This model was evaluated for its prediction performance on a diverse test set containing 232 FGFR inhibitors, and it yielded a SD value of 0.75 pIC50 units from measured inhibition affinities and a Pearson’s correlation coefficient R2 of 0.53. This result suggests that the combinatorial 3D-QSAR model could be used to search for new FGFR1 hit structures and predict their potential activity. To further evaluate the performance of the model, a decoy set validation was used to measure the efficiency of the model by calculating EF (enrichment factor). Based on the combinatorial pharmacophore model, a virtual screening against SPECS database was performed. Nineteen novel active compounds were successfully identified, which provide new chemical starting points for further structural optimization of FGFR1 inhibitors. PMID:26110383
Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates
Wang, Dong; Dasari, Surendra; Chambers, Matthew C.; ...
2013-03-07
In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of chargedmore » peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.« less
Systematic review of the concurrent and predictive validity of MRI biomarkers in OA
Hunter, D.J.; Zhang, W.; Conaghan, Philip G.; Hirko, K.; Menashe, L.; Li, L.; Reichmann, W.M.; Losina, E.
2012-01-01
SUMMARY Objective To summarize literature on the concurrent and predictive validity of MRI-based measures of osteoarthritis (OA) structural change. Methods An online literature search was conducted of the OVID, EMBASE, CINAHL, PsychInfo and Cochrane databases of articles published up to the time of the search, April 2009. 1338 abstracts obtained with this search were preliminarily screened for relevance by two reviewers. Of these, 243 were selected for data extraction for this analysis on validity as well as separate reviews on discriminate validity and diagnostic performance. Of these 142 manuscripts included data pertinent to concurrent validity and 61 manuscripts for the predictive validity review. For this analysis we extracted data on criterion (concurrent and predictive) validity from both longitudinal and cross-sectional studies for all synovial joint tissues as it relates to MRI measurement in OA. Results Concurrent validity of MRI in OA has been examined compared to symptoms, radiography, histology/pathology, arthroscopy, CT, and alignment. The relation of bone marrow lesions, synovitis and effusion to pain was moderate to strong. There was a weak or no relation of cartilage morphology or meniscal tears to pain. The relation of cartilage morphology to radiographic OA and radiographic joint space was inconsistent. There was a higher frequency of meniscal tears, synovitis and other features in persons with radiographic OA. The relation of cartilage to other constructs including histology and arthroscopy was stronger. Predictive validity of MRI in OA has been examined for ability to predict total knee replacement (TKR), change in symptoms, radiographic progression as well as MRI progression. Quantitative cartilage volume change and presence of cartilage defects or bone marrow lesions are potential predictors of TKR. Conclusion MRI has inherent strengths and unique advantages in its ability to visualize multiple individual tissue pathologies relating to pain and also predict clinical outcome. The complex disease of OA which involves an array of tissue abnormalities is best imaged using this imaging tool. PMID:21396463
A PRIM approach to predictive-signature development for patient stratification.
Chen, Gong; Zhong, Hua; Belousov, Anton; Devanarayan, Viswanath
2015-01-30
Patients often respond differently to a treatment because of individual heterogeneity. Failures of clinical trials can be substantially reduced if, prior to an investigational treatment, patients are stratified into responders and nonresponders based on biological or demographic characteristics. These characteristics are captured by a predictive signature. In this paper, we propose a procedure to search for predictive signatures based on the approach of patient rule induction method. Specifically, we discuss selection of a proper objective function for the search, present its algorithm, and describe a resampling scheme that can enhance search performance. Through simulations, we characterize conditions under which the procedure works well. To demonstrate practical uses of the procedure, we apply it to two real-world data sets. We also compare the results with those obtained from a recent regression-based approach, Adaptive Index Models, and discuss their respective advantages. In this study, we focus on oncology applications with survival responses. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
In silico work flow for scaffold hopping in Leishmania.
Waugh, Barnali; Ghosh, Ambarnil; Bhattacharyya, Dhananjay; Ghoshal, Nanda; Banerjee, Rahul
2014-11-17
Leishmaniasis,a broad spectrum of diseases caused by several sister species of protozoa belonging to family trypanosomatidae and genus leishmania , generally affects poorer sections of the populace in third world countries. With the emergence of strains resistant to traditional therapies and the high cost of second line drugs which generally have severe side effects, it becomes imperative to continue the search for alternative drugs to combat the disease. In this work, the leishmanial genomes and the human genome have been compared to identify proteins unique to the parasite and whose structures (or those of close homologues) are available in the Protein Data Bank. Subsequent to the prioritization of these proteins (based on their essentiality, virulence factor etc.), inhibitors have been identified for a subset of these prospective drug targets by means of an exhaustive literature survey. A set of three dimensional protein-ligand complexes have been assembled from the list of leishmanial drug targets by culling structures from the Protein Data Bank or by means of template based homology modeling followed by ligand docking with the GOLD software. Based on these complexes several structure based pharmacophores have been designed and used to search for alternative inhibitors in the ZINC database. This process led to a list of prospective compounds which could serve as potential antileishmanials. These small molecules were also used to search the Drug Bank to identify prospective lead compounds already in use as approved drugs. Interestingly, paromomycin which is currently being used as an antileishmanial drug spontaneously appeared in the list, probably giving added confidence to the 'scaffold hopping' computational procedures adopted in this work. The report thus provides the basis to experimentally verify several lead compounds for their predicted antileishmanial activity and includes several useful data bases of prospective drug targets in leishmania, their inhibitors and protein--inhibitor three dimensional complexes.
Prediction of RNA secondary structures: from theory to models and real molecules
NASA Astrophysics Data System (ADS)
Schuster, Peter
2006-05-01
RNA secondary structures are derived from RNA sequences, which are strings built form the natural four letter nucleotide alphabet, {AUGC}. These coarse-grained structures, in turn, are tantamount to constrained strings over a three letter alphabet. Hence, the secondary structures are discrete objects and the number of sequences always exceeds the number of structures. The sequences built from two letter alphabets form perfect structures when the nucleotides can form a base pair, as is the case with {GC} or {AU}, but the relation between the sequences and structures differs strongly from the four letter alphabet. A comprehensive theory of RNA structure is presented, which is based on the concepts of sequence space and shape space, being a space of structures. It sets the stage for modelling processes in ensembles of RNA molecules like evolutionary optimization or kinetic folding as dynamical phenomena guided by mappings between the two spaces. The number of minimum free energy (mfe) structures is always smaller than the number of sequences, even for two letter alphabets. Folding of RNA molecules into mfe energy structures constitutes a non-invertible mapping from sequence space onto shape space. The preimage of a structure in sequence space is defined as its neutral network. Similarly the set of suboptimal structures is the preimage of a sequence in shape space. This set represents the conformation space of a given sequence. The evolutionary optimization of structures in populations is a process taking place in sequence space, whereas kinetic folding occurs in molecular ensembles that optimize free energy in conformation space. Efficient folding algorithms based on dynamic programming are available for the prediction of secondary structures for given sequences. The inverse problem, the computation of sequences for predefined structures, is an important tool for the design of RNA molecules with tailored properties. Simultaneous folding or cofolding of two or more RNA molecules can be modelled readily at the secondary structure level and allows prediction of the most stable (mfe) conformations of complexes together with suboptimal states. Cofolding algorithms are important tools for efficient and highly specific primer design in the polymerase chain reaction (PCR) and help to explain the mechanisms of small interference RNA (si-RNA) molecules in gene regulation. The evolutionary optimization of RNA structures is illustrated by the search for a target structure and mimics aptamer selection in evolutionary biotechnology. It occurs typically in steps consisting of short adaptive phases interrupted by long epochs of little or no obvious progress in optimization. During these quasi-stationary epochs the populations are essentially confined to neutral networks where they search for sequences that allow a continuation of the adaptive process. Modelling RNA evolution as a simultaneous process in sequence and shape space provides answers to questions of the optimal population size and mutation rates. Kinetic folding is a stochastic process in conformation space. Exact solutions are derived by direct simulation in the form of trajectory sampling or by solving the master equation. The exact solutions can be approximated straightforwardly by Arrhenius kinetics on barrier trees, which represent simplified versions of conformational energy landscapes. The existence of at least one sequence forming any arbitrarily chosen pair of structures is granted by the intersection theorem. Folding kinetics is the key to understanding and designing multistable RNA molecules or RNA switches. These RNAs form two or more long lived conformations, and conformational changes occur either spontaneously or are induced through binding of small molecules or other biopolymers. RNA switches are found in nature where they act as elements in genetic and metabolic regulation. The reliability of RNA secondary structure prediction is limited by the accuracy with which the empirical parameters can be determined and by principal deficiencies, for example by the lack of energy contributions resulting from tertiary interactions. In addition, native structures may be determined by folding kinetics rather than by thermodynamics. We address the first problem by considering base pair probabilities or base pairing entropies, which are derived from the partition function of conformations. A high base pair probability corresponding to a low pairing entropy is taken as an indicator of a high reliability of prediction. Pseudoknots are discussed as an example of a tertiary interaction that is highly important for RNA function. Moreover, pseudoknot formation is readily incorporated into structure prediction algorithms. Some examples of experimental data on RNA secondary structures that are readily explained using the landscape concept are presented. They deal with (i) properties of RNA molecules with random sequences, (ii) RNA molecules from restricted alphabets, (iii) existence of neutral networks, (iv) shape space covering, (v) riboswitches and (vi) evolution of non-coding RNAs as an example of evolution restricted to neutral networks.
Improving GPU-accelerated adaptive IDW interpolation algorithm using fast kNN search.
Mei, Gang; Xu, Nengxiong; Xu, Liangliang
2016-01-01
This paper presents an efficient parallel Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm on modern Graphics Processing Unit (GPU). The presented algorithm is an improvement of our previous GPU-accelerated AIDW algorithm by adopting fast k-nearest neighbors (kNN) search. In AIDW, it needs to find several nearest neighboring data points for each interpolated point to adaptively determine the power parameter; and then the desired prediction value of the interpolated point is obtained by weighted interpolating using the power parameter. In this work, we develop a fast kNN search approach based on the space-partitioning data structure, even grid, to improve the previous GPU-accelerated AIDW algorithm. The improved algorithm is composed of the stages of kNN search and weighted interpolating. To evaluate the performance of the improved algorithm, we perform five groups of experimental tests. The experimental results indicate: (1) the improved algorithm can achieve a speedup of up to 1017 over the corresponding serial algorithm; (2) the improved algorithm is at least two times faster than our previous GPU-accelerated AIDW algorithm; and (3) the utilization of fast kNN search can significantly improve the computational efficiency of the entire GPU-accelerated AIDW algorithm.
Querying archetype-based EHRs by search ontology-based XPath engineering.
Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich
2018-05-11
Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free text searches, that lead to false positive results, the precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts in the XML document structure. Such query specification method is practically introduced and evaluated by applying concrete research questions formulated in natural language on a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the usage of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content is necessary for a further improvement of precision and recall. Key limitation is that the application of the introduced process requires skills in ontology and software development. In future, the time consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects Search Terms to certain parts in XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by the specification of complex XPath expressions without deep syntax knowledge about XPaths.
NASA Astrophysics Data System (ADS)
Zhang, Chen; Ni, Zhiwei; Ni, Liping; Tang, Na
2016-10-01
Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and harmony search algorithm is proposed. Multi-fractal dimension is adopted as the evaluation criterion of feature subset, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI data-sets. Besides, the proposed method is also used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method can obtain competitive results in terms of both prediction accuracy and the number of selected features.
Conjunctive search for one and two identical targets.
Ward, R; McClelland, J L
1989-11-01
The assumptions of feature integration theory as a blind, serial, self-terminating search (SSTS) mechanism are extended to displays containing 2 identical targets. The SSTS predicts no differences in negative-response displays, which require an exhaustive search of the display. Quantitative predictions are confirmed for the positive responses, but not for the negatives, suggesting that the SSTS model is incorrect. Two possible explanations for the results in the negative conditions, differential search rates and early quitting in the negatives, are rejected. It is suggested that using any self-terminating search mechanism will lead to difficulty in interpreting the results, including accounts for which the search is parallel over small groups of items. A resource-limited parallel model, which is based on the diffusion model of Ratcliff (1978), appears to fit the data well.
CYPSI: a structure-based interface for cytochrome P450s and ligands in Arabidopsis thaliana
2012-01-01
Background The cytochrome P450 (CYP) superfamily enables terrestrial plants to adapt to harsh environments. CYPs are key enzymes involved in a wide range of metabolic pathways. It is particularly useful to be able to analyse the three-dimensional (3D) structure when investigating the interactions between CYPs and their substrates. However, only two plant CYP structures have been resolved. In addition, no currently available databases contain structural information on plant CYPs and ligands. Fortunately, the 3D structure of CYPs is highly conserved and this has made it possible to obtain structural information from template-based modelling (TBM). Description The CYP Structure Interface (CYPSI) is a platform for CYP studies. CYPSI integrated the 3D structures for 266 A. thaliana CYPs predicted by three TBM methods: BMCD, which we developed specifically for CYP TBM; and two well-known web-servers, MUSTER and I-TASSER. After careful template selection and optimization, the models built by BMCD were accurate enough for practical application, which we demonstrated using a docking example aimed at searching for the CYPs responsible for ABA 8′-hydroxylation. CYPSI also provides extensive resources for A. thaliana CYP structure and function studies, including 400 PDB entries for solved CYPs, 48 metabolic pathways associated with A. thaliana CYPs, 232 reported CYP ligands and 18 A. thaliana CYPs docked with ligands (61 complexes in total). In addition, CYPSI also includes the ability to search for similar sequences and chemicals. Conclusions CYPSI provides comprehensive structure and function information for A. thaliana CYPs, which should facilitate investigations into the interactions between CYPs and their substrates. CYPSI has a user-friendly interface, which is available at http://bioinfo.cau.edu.cn/CYPSI. PMID:23256889
2012-01-01
PI3K, AKT, and mTOR are key kinases from PI3K signaling pathway being extensively pursued to treat a variety of cancers in oncology. To search for a structurally differentiated back-up candidate to PF-04691502, which is currently in phase I/II clinical trials for treating solid tumors, a lead optimization effort was carried out with a tricyclic imidazo[1,5]naphthyridine series. Integration of structure-based drug design and physical properties-based optimization yielded a potent and selective PI3K/mTOR dual kinase inhibitor PF-04979064. This manuscript discusses the lead optimization for the tricyclic series, which both improved the in vitro potency and addressed a number of ADMET issues including high metabolic clearance mediated by both P450 and aldehyde oxidase (AO), poor permeability, and poor solubility. An empirical scaling tool was developed to predict human clearance from in vitro human liver S9 assay data for tricyclic derivatives that were AO substrates. PMID:24900568
Towards novel organic high-Tc superconductors: Data mining using density of states similarity search
NASA Astrophysics Data System (ADS)
Geilhufe, R. Matthias; Borysov, Stanislav S.; Kalpakchi, Dmytro; Balatsky, Alexander V.
2018-02-01
Identifying novel functional materials with desired key properties is an important part of bridging the gap between fundamental research and technological advancement. In this context, high-throughput calculations combined with data-mining techniques highly accelerated this process in different areas of research during the past years. The strength of a data-driven approach for materials prediction lies in narrowing down the search space of thousands of materials to a subset of prospective candidates. Recently, the open-access organic materials database OMDB was released providing electronic structure data for thousands of previously synthesized three-dimensional organic crystals. Based on the OMDB, we report about the implementation of a novel density of states similarity search tool which is capable of retrieving materials with similar density of states to a reference material. The tool is based on the approximate nearest neighbor algorithm as implemented in the ANNOY library and can be applied via the OMDB web interface. The approach presented here is wide ranging and can be applied to various problems where the density of states is responsible for certain key properties of a material. As the first application, we report about materials exhibiting electronic structure similarities to the aromatic hydrocarbon p-terphenyl which was recently discussed as a potential organic high-temperature superconductor exhibiting a transition temperature in the order of 120 K under strong potassium doping. Although the mechanism driving the remarkable transition temperature remains under debate, we argue that the density of states, reflecting the electronic structure of a material, might serve as a crucial ingredient for the observed high Tc. To provide candidates which might exhibit comparable properties, we present 15 purely organic materials with similar features to p-terphenyl within the electronic structure, which also tend to have structural similarities with p-terphenyl such as space group symmetries, chemical composition, and molecular structure. The experimental verification of these candidates might lead to a better understanding of the underlying mechanism in case similar superconducting properties are revealed.
Semiempirical prediction of protein folds
NASA Astrophysics Data System (ADS)
Fernández, Ariel; Colubri, Andrés; Appignanesi, Gustavo
2001-08-01
We introduce a semiempirical approach to predict ab initio expeditious pathways and native backbone geometries of proteins that fold under in vitro renaturation conditions. The algorithm is engineered to incorporate a discrete codification of local steric hindrances that constrain the movements of the peptide backbone throughout the folding process. Thus, the torsional state of the chain is assumed to be conditioned by the fact that hopping from one basin of attraction to another in the Ramachandran map (local potential energy surface) of each residue is energetically more costly than the search for a specific (Φ, Ψ) torsional state within a single basin. A combinatorial procedure is introduced to evaluate coarsely defined torsional states of the chain defined ``modulo basins'' and translate them into meaningful patterns of long range interactions. Thus, an algorithm for structure prediction is designed based on the fact that local contributions to the potential energy may be subsumed into time-evolving conformational constraints defining sets of restricted backbone geometries whereupon the patterns of nonbonded interactions are constructed. The predictive power of the algorithm is assessed by (a) computing ab initio folding pathways for mammalian ubiquitin that ultimately yield a stable structural pattern reproducing all of its native features, (b) determining the nucleating event that triggers the hydrophobic collapse of the chain, and (c) comparing coarse predictions of the stable folds of moderately large proteins (N~100) with structural information extracted from the protein data bank.
Protein-protein structure prediction by scoring molecular dynamics trajectories of putative poses.
Sarti, Edoardo; Gladich, Ivan; Zamuner, Stefano; Correia, Bruno E; Laio, Alessandro
2016-09-01
The prediction of protein-protein interactions and their structural configuration remains a largely unsolved problem. Most of the algorithms aimed at finding the native conformation of a protein complex starting from the structure of its monomers are based on searching the structure corresponding to the global minimum of a suitable scoring function. However, protein complexes are often highly flexible, with mobile side chains and transient contacts due to thermal fluctuations. Flexibility can be neglected if one aims at finding quickly the approximate structure of the native complex, but may play a role in structure refinement, and in discriminating solutions characterized by similar scores. We here benchmark the capability of some state-of-the-art scoring functions (BACH-SixthSense, PIE/PISA and Rosetta) in discriminating finite-temperature ensembles of structures corresponding to the native state and to non-native configurations. We produce the ensembles by running thousands of molecular dynamics simulations in explicit solvent starting from poses generated by rigid docking and optimized in vacuum. We find that while Rosetta outperformed the other two scoring functions in scoring the structures in vacuum, BACH-SixthSense and PIE/PISA perform better in distinguishing near-native ensembles of structures generated by molecular dynamics in explicit solvent. Proteins 2016; 84:1312-1320. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Search-based model identification of smart-structure damage
NASA Technical Reports Server (NTRS)
Glass, B. J.; Macalou, A.
1991-01-01
This paper describes the use of a combined model and parameter identification approach, based on modal analysis and artificial intelligence (AI) techniques, for identifying damage or flaws in a rotating truss structure incorporating embedded piezoceramic sensors. This smart structure example is representative of a class of structures commonly found in aerospace systems and next generation space structures. Artificial intelligence techniques of classification, heuristic search, and an object-oriented knowledge base are used in an AI-based model identification approach. A finite model space is classified into a search tree, over which a variant of best-first search is used to identify the model whose stored response most closely matches that of the input. Newly-encountered models can be incorporated into the model space. This adaptativeness demonstrates the potential for learning control. Following this output-error model identification, numerical parameter identification is used to further refine the identified model. Given the rotating truss example in this paper, noisy data corresponding to various damage configurations are input to both this approach and a conventional parameter identification method. The combination of the AI-based model identification with parameter identification is shown to lead to smaller parameter corrections than required by the use of parameter identification alone.
Similarity relations in visual search predict rapid visual categorization
Mohan, Krithika; Arun, S. P.
2012-01-01
How do we perform rapid visual categorization?It is widely thought that categorization involves evaluating the similarity of an object to other category items, but the underlying features and similarity relations remain unknown. Here, we hypothesized that categorization performance is based on perceived similarity relations between items within and outside the category. To this end, we measured the categorization performance of human subjects on three diverse visual categories (animals, vehicles, and tools) and across three hierarchical levels (superordinate, basic, and subordinate levels among animals). For the same subjects, we measured their perceived pair-wise similarities between objects using a visual search task. Regardless of category and hierarchical level, we found that the time taken to categorize an object could be predicted using its similarity to members within and outside its category. We were able to account for several classic categorization phenomena, such as (a) the longer times required to reject category membership; (b) the longer times to categorize atypical objects; and (c) differences in performance across tasks and across hierarchical levels. These categorization times were also accounted for by a model that extracts coarse structure from an image. The striking agreement observed between categorization and visual search suggests that these two disparate tasks depend on a shared coarse object representation. PMID:23092947
Carotenoids Database: structures, chemical fingerprints and distribution among organisms.
Yabuzaki, Junko
2017-01-01
To promote understanding of how organisms are related via carotenoids, either evolutionarily or symbiotically, or in food chains through natural histories, we built the Carotenoids Database. This provides chemical information on 1117 natural carotenoids with 683 source organisms. For extracting organisms closely related through the biosynthesis of carotenoids, we offer a new similarity search system 'Search similar carotenoids' using our original chemical fingerprint 'Carotenoid DB Chemical Fingerprints'. These Carotenoid DB Chemical Fingerprints describe the chemical substructure and the modification details based upon International Union of Pure and Applied Chemistry (IUPAC) semi-systematic names of the carotenoids. The fingerprints also allow (i) easier prediction of six biological functions of carotenoids: provitamin A, membrane stabilizers, odorous substances, allelochemicals, antiproliferative activity and reverse MDR activity against cancer cells, (ii) easier classification of carotenoid structures, (iii) partial and exact structure searching and (iv) easier extraction of structural isomers and stereoisomers. We believe this to be the first attempt to establish fingerprints using the IUPAC semi-systematic names. For extracting close profiled organisms, we provide a new tool 'Search similar profiled organisms'. Our current statistics show some insights into natural history: carotenoids seem to have been spread largely by bacteria, as they produce C30, C40, C45 and C50 carotenoids, with the widest range of end groups, and they share a small portion of C40 carotenoids with eukaryotes. Archaea share an even smaller portion with eukaryotes. Eukaryotes then have evolved a considerable variety of C40 carotenoids. Considering carotenoids, eukaryotes seem more closely related to bacteria than to archaea aside from 16S rRNA lineage analysis. : http://carotenoiddb.jp. © The Author(s) 2017. Published by Oxford University Press.
Carotenoids Database: structures, chemical fingerprints and distribution among organisms
2017-01-01
Abstract To promote understanding of how organisms are related via carotenoids, either evolutionarily or symbiotically, or in food chains through natural histories, we built the Carotenoids Database. This provides chemical information on 1117 natural carotenoids with 683 source organisms. For extracting organisms closely related through the biosynthesis of carotenoids, we offer a new similarity search system ‘Search similar carotenoids’ using our original chemical fingerprint ‘Carotenoid DB Chemical Fingerprints’. These Carotenoid DB Chemical Fingerprints describe the chemical substructure and the modification details based upon International Union of Pure and Applied Chemistry (IUPAC) semi-systematic names of the carotenoids. The fingerprints also allow (i) easier prediction of six biological functions of carotenoids: provitamin A, membrane stabilizers, odorous substances, allelochemicals, antiproliferative activity and reverse MDR activity against cancer cells, (ii) easier classification of carotenoid structures, (iii) partial and exact structure searching and (iv) easier extraction of structural isomers and stereoisomers. We believe this to be the first attempt to establish fingerprints using the IUPAC semi-systematic names. For extracting close profiled organisms, we provide a new tool ‘Search similar profiled organisms’. Our current statistics show some insights into natural history: carotenoids seem to have been spread largely by bacteria, as they produce C30, C40, C45 and C50 carotenoids, with the widest range of end groups, and they share a small portion of C40 carotenoids with eukaryotes. Archaea share an even smaller portion with eukaryotes. Eukaryotes then have evolved a considerable variety of C40 carotenoids. Considering carotenoids, eukaryotes seem more closely related to bacteria than to archaea aside from 16S rRNA lineage analysis. Database URL: http://carotenoiddb.jp PMID:28365725
Dynamic Forecasting of Zika Epidemics Using Google Trends
Jin, Yuan; Huang, Yong; Lin, Baihan; An, Xiaoping; Feng, Dan; Tong, Yigang
2017-01-01
We developed a dynamic forecasting model for Zika virus (ZIKV), based on real-time online search data from Google Trends (GTs). It was designed to provide Zika virus disease (ZVD) surveillance and detection for Health Departments, and predictive numbers of infection cases, which would allow them sufficient time to implement interventions. In this study, we found a strong correlation between Zika-related GTs and the cumulative numbers of reported cases (confirmed, suspected and total cases; p<0.001). Then, we used the correlation data from Zika-related online search in GTs and ZIKV epidemics between 12 February and 20 October 2016 to construct an autoregressive integrated moving average (ARIMA) model (0, 1, 3) for the dynamic estimation of ZIKV outbreaks. The forecasting results indicated that the predicted data by ARIMA model, which used the online search data as the external regressor to enhance the forecasting model and assist the historical epidemic data in improving the quality of the predictions, are quite similar to the actual data during ZIKV epidemic early November 2016. Integer-valued autoregression provides a useful base predictive model for ZVD cases. This is enhanced by the incorporation of GTs data, confirming the prognostic utility of search query based surveillance. This accessible and flexible dynamic forecast model could be used in the monitoring of ZVD to provide advanced warning of future ZIKV outbreaks. PMID:28060809
Dynamic Forecasting of Zika Epidemics Using Google Trends.
Teng, Yue; Bi, Dehua; Xie, Guigang; Jin, Yuan; Huang, Yong; Lin, Baihan; An, Xiaoping; Feng, Dan; Tong, Yigang
2017-01-01
We developed a dynamic forecasting model for Zika virus (ZIKV), based on real-time online search data from Google Trends (GTs). It was designed to provide Zika virus disease (ZVD) surveillance and detection for Health Departments, and predictive numbers of infection cases, which would allow them sufficient time to implement interventions. In this study, we found a strong correlation between Zika-related GTs and the cumulative numbers of reported cases (confirmed, suspected and total cases; p<0.001). Then, we used the correlation data from Zika-related online search in GTs and ZIKV epidemics between 12 February and 20 October 2016 to construct an autoregressive integrated moving average (ARIMA) model (0, 1, 3) for the dynamic estimation of ZIKV outbreaks. The forecasting results indicated that the predicted data by ARIMA model, which used the online search data as the external regressor to enhance the forecasting model and assist the historical epidemic data in improving the quality of the predictions, are quite similar to the actual data during ZIKV epidemic early November 2016. Integer-valued autoregression provides a useful base predictive model for ZVD cases. This is enhanced by the incorporation of GTs data, confirming the prognostic utility of search query based surveillance. This accessible and flexible dynamic forecast model could be used in the monitoring of ZVD to provide advanced warning of future ZIKV outbreaks.
BIOPEP database and other programs for processing bioactive peptide sequences.
Minkiewicz, Piotr; Dziuba, Jerzy; Iwaniak, Anna; Dziuba, Marta; Darewicz, Małgorzata
2008-01-01
This review presents the potential for application of computational tools in peptide science based on a sample BIOPEP database and program as well as other programs and databases available via the World Wide Web. The BIOPEP application contains a database of biologically active peptide sequences and a program enabling construction of profiles of the potential biological activity of protein fragments, calculation of quantitative descriptors as measures of the value of proteins as potential precursors of bioactive peptides, and prediction of bonds susceptible to hydrolysis by endopeptidases in a protein chain. Other bioactive and allergenic peptide sequence databases are also presented. Programs enabling the construction of binary and multiple alignments between peptide sequences, the construction of sequence motifs attributed to a given type of bioactivity, searching for potential precursors of bioactive peptides, and the prediction of sites susceptible to proteolytic cleavage in protein chains are available via the Internet as are other approaches concerning secondary structure prediction and calculation of physicochemical features based on amino acid sequence. Programs for prediction of allergenic and toxic properties have also been developed. This review explores the possibilities of cooperation between various programs.
Superconductivity in dense carbon-based materials
Lu, Siyu; Liu, Hanyu; Naumov, Ivan I.; ...
2016-03-08
Guided by a simple strategy in searching of new superconducting materials we predict that high temperature superconductivity can be realized in classes of high-density materials having strong sp 3 chemical bonding and high lattice symmetry. Here, we examine in detail sodalite carbon frameworks doped with simple metals such as Li, Na, and Al. Though such materials share some common features with doped diamond, their doping level is not limited and the density of states at the Fermi level in them can be as high as that in the renowned MgB 2. Altogether, with other factors, this boosts the superconducting temperaturemore » (T c) in the materials investigated to higher levels compared to doped diamond. For example, the superconducting T c of sodalite-like NaC 6 is predicted to be above 100 K. This phase and a series of other sodalite-based superconductors are predicted to be metastable phases but are dynamically stable. In owing to the rigid carbon framework of these and related dense carbon-materials, these doped sodalite-based structures could be recoverable as potentially useful superconductors.« less
Rubinson, Emily H.; Metz, Audrey H.; O'Quin, Jami; Eichman, Brandt F.
2013-01-01
Summary DNA glycosylases safeguard the genome by locating and excising chemically modified bases from DNA. AlkD is a recently discovered bacterial DNA glycosylase that removes positively charged methylpurines from DNA, and was predicted to adopt a protein fold distinct from other DNA repair proteins. The crystal structure of Bacillus cereus AlkD presented here shows that the protein is composed exclusively of helical HEAT-like repeats, which form a solenoid perfectly shaped to accommodate a DNA duplex on the concave surface. Structural analysis of the variant HEAT repeats in AlkD provides a rationale for how this protein scaffolding motif has been modified to bind DNA. We report 7mG excision and DNA binding activities of AlkD mutants, along with a comparison of alkylpurine DNA glycosylase structures. Together, these data provide important insight into the requirements for alkylation repair within DNA and suggest that AlkD utilizes a novel strategy to manipulate DNA in its search for alkylpurine bases. PMID:18585735
Riffet, Vanessa; Vidal, Julien
2017-06-01
The search for functional materials is currently hindered by the difficulty to find significant correlation between constitutive properties of a material and its functional properties. In the case of amorphous materials, the diversity of local structures, chemical composition, impurities and mass densities makes such a connection difficult to be addressed. In this Letter, the relation between refractive index and composition has been investigated for amorphous AlO x materials, including nonstoichiometric AlO x , emphasizing the role of structural defects and the absence of effect of the band gap variation. It is found that the Newton-Drude (ND) relation predicts the refractive index from mass density with a rather high level of precision apart from some structures displaying structural defects. Our results show especially that O- and Al-based defects act as additive local disturbance in the vicinity of band gap, allowing us to decouple the mass density effects from defect effects (n = n[ND] + Δn defect ).
Structural Search for High Pressure CS2 and Xe-Cl Compounds
NASA Astrophysics Data System (ADS)
Zarifi, Niloofar; Tse, John S.
2018-04-01
The recent successful implementation of several methodologies for the prediction of crystal structures based on the first-principles electronic structure have ushered in a new area of computational chemistry. In this study, the two most popular methods, namely genetic evolution and particle swarm optimization, were applied to the investigation of stable crystalline polymorphs of solid carbon disulfide and xenon halides at high pressure. It was found that both methods have their own merits. However, there are subtleties that need to be considered for the proper execution of the methods. We found a two-dimensional (2D) layered structure that may be responsible for the superconductivity in CS2. Except for XeCl2, no thermodynamically stable crystalline Xe halides were found under 60 GPa in the halide-rich region of the phase diagram.
Multi-Objective Community Detection Based on Memetic Algorithm
2015-01-01
Community detection has drawn a lot of attention as it can provide invaluable help in understanding the function and visualizing the structure of networks. Since single objective optimization methods have intrinsic drawbacks to identifying multiple significant community structures, some methods formulate the community detection as multi-objective problems and adopt population-based evolutionary algorithms to obtain multiple community structures. Evolutionary algorithms have strong global search ability, but have difficulty in locating local optima efficiently. In this study, in order to identify multiple significant community structures more effectively, a multi-objective memetic algorithm for community detection is proposed by combining multi-objective evolutionary algorithm with a local search procedure. The local search procedure is designed by addressing three issues. Firstly, nondominated solutions generated by evolutionary operations and solutions in dominant population are set as initial individuals for local search procedure. Then, a new direction vector named as pseudonormal vector is proposed to integrate two objective functions together to form a fitness function. Finally, a network specific local search strategy based on label propagation rule is expanded to search the local optimal solutions efficiently. The extensive experiments on both artificial and real-world networks evaluate the proposed method from three aspects. Firstly, experiments on influence of local search procedure demonstrate that the local search procedure can speed up the convergence to better partitions and make the algorithm more stable. Secondly, comparisons with a set of classic community detection methods illustrate the proposed method can find single partitions effectively. Finally, the method is applied to identify hierarchical structures of networks which are beneficial for analyzing networks in multi-resolution levels. PMID:25932646
Multi-objective community detection based on memetic algorithm.
Wu, Peng; Pan, Li
2015-01-01
Community detection has drawn a lot of attention as it can provide invaluable help in understanding the function and visualizing the structure of networks. Since single objective optimization methods have intrinsic drawbacks to identifying multiple significant community structures, some methods formulate the community detection as multi-objective problems and adopt population-based evolutionary algorithms to obtain multiple community structures. Evolutionary algorithms have strong global search ability, but have difficulty in locating local optima efficiently. In this study, in order to identify multiple significant community structures more effectively, a multi-objective memetic algorithm for community detection is proposed by combining multi-objective evolutionary algorithm with a local search procedure. The local search procedure is designed by addressing three issues. Firstly, nondominated solutions generated by evolutionary operations and solutions in dominant population are set as initial individuals for local search procedure. Then, a new direction vector named as pseudonormal vector is proposed to integrate two objective functions together to form a fitness function. Finally, a network specific local search strategy based on label propagation rule is expanded to search the local optimal solutions efficiently. The extensive experiments on both artificial and real-world networks evaluate the proposed method from three aspects. Firstly, experiments on influence of local search procedure demonstrate that the local search procedure can speed up the convergence to better partitions and make the algorithm more stable. Secondly, comparisons with a set of classic community detection methods illustrate the proposed method can find single partitions effectively. Finally, the method is applied to identify hierarchical structures of networks which are beneficial for analyzing networks in multi-resolution levels.
Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns
2013-01-01
Background It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. Results We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to factor 45. Our best new index-based algorithm achieves a speedup of factor 560. Conclusions The presented methods achieve considerable speedups compared to the best previous method. This, together with the expected sublinear running time of the presented index-based algorithms, allows for the first time approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide with RaligNAtor a robust and well documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator. PMID:23865810
Ferraresi-Curotto, Verónica; Echeverría, Gustavo A; Piro, Oscar E; Pis-Diez, Reinaldo; González-Baró, Ana C
2015-02-25
Five Schiff bases obtained from condensation of 4-methoxybenzohydrazide with related aldehydes, namely o-vanillin, vanillin, 5-bromovanillin, 5-chlorosalicylaldehyde and 5-bromosalicylaldehyde were prepared. A detailed structural and spectroscopic study is reported. The crystal structures of four members of the family were determined and compared with one another. The hydrazones obtained from 5-chlorosalicylaldehyde and 5-bromosalicylaldehyde resulted to be isomorphic to each other. The solid-state structures are stabilized by intra-molecular O-H⋯N interactions in salicylaldehyde derivatives between the O-H moiety from the aldehyde and the hydrazone nitrogen atom. All crystals are further stabilized by inter-molecular H-bonds mediated by the crystallization water molecule. A comparative analysis between experimental and theoretical results is presented. The conformational space was searched and geometries were optimized both in gas phase and including solvent effects. The structure is predicted for the compound for which the crystal structure was not determined. Infrared and electronic spectra were measured and assigned with the help of data obtained from computational methods based on the Density Functional Theory. Copyright © 2014 Elsevier B.V. All rights reserved.
Omar, Hani; Hoang, Van Hai; Liu, Duen-Ren
2016-01-01
Enhancing sales and operations planning through forecasting analysis and business intelligence is demanded in many industries and enterprises. Publishing industries usually pick attractive titles and headlines for their stories to increase sales, since popular article titles and headlines can attract readers to buy magazines. In this paper, information retrieval techniques are adopted to extract words from article titles. The popularity measures of article titles are then analyzed by using the search indexes obtained from Google search engine. Backpropagation Neural Networks (BPNNs) have successfully been used to develop prediction models for sales forecasting. In this study, we propose a novel hybrid neural network model for sales forecasting based on the prediction result of time series forecasting and the popularity of article titles. The proposed model uses the historical sales data, popularity of article titles, and the prediction result of a time series, Autoregressive Integrated Moving Average (ARIMA) forecasting method to learn a BPNN-based forecasting model. Our proposed forecasting model is experimentally evaluated by comparing with conventional sales prediction techniques. The experimental result shows that our proposed forecasting method outperforms conventional techniques which do not consider the popularity of title words.
Omar, Hani; Hoang, Van Hai; Liu, Duen-Ren
2016-01-01
Enhancing sales and operations planning through forecasting analysis and business intelligence is demanded in many industries and enterprises. Publishing industries usually pick attractive titles and headlines for their stories to increase sales, since popular article titles and headlines can attract readers to buy magazines. In this paper, information retrieval techniques are adopted to extract words from article titles. The popularity measures of article titles are then analyzed by using the search indexes obtained from Google search engine. Backpropagation Neural Networks (BPNNs) have successfully been used to develop prediction models for sales forecasting. In this study, we propose a novel hybrid neural network model for sales forecasting based on the prediction result of time series forecasting and the popularity of article titles. The proposed model uses the historical sales data, popularity of article titles, and the prediction result of a time series, Autoregressive Integrated Moving Average (ARIMA) forecasting method to learn a BPNN-based forecasting model. Our proposed forecasting model is experimentally evaluated by comparing with conventional sales prediction techniques. The experimental result shows that our proposed forecasting method outperforms conventional techniques which do not consider the popularity of title words. PMID:27313605
32 CFR 552.18 - Administration.
Code of Federal Regulations, 2010 CFR
2010-07-01
... when leaving facilities for which the Army has responsibility. These searches are authorized when based... prominently displayed sign, AR 420-70, (Buildings and Structures)), that they are liable to search when... authorized to search persons (or possessions, including vehicles of individuals), based on military necessity...
Impact of Predicting Health Care Utilization Via Web Search Behavior: A Data-Driven Analysis.
Agarwal, Vibhu; Zhang, Liangliang; Zhu, Josh; Fang, Shiyuan; Cheng, Tim; Hong, Chloe; Shah, Nigam H
2016-09-21
By recent estimates, the steady rise in health care costs has deprived more than 45 million Americans of health care services and has encouraged health care providers to better understand the key drivers of health care utilization from a population health management perspective. Prior studies suggest the feasibility of mining population-level patterns of health care resource utilization from observational analysis of Internet search logs; however, the utility of the endeavor to the various stakeholders in a health ecosystem remains unclear. The aim was to carry out a closed-loop evaluation of the utility of health care use predictions using the conversion rates of advertisements that were displayed to the predicted future utilizers as a surrogate. The statistical models to predict the probability of user's future visit to a medical facility were built using effective predictors of health care resource utilization, extracted from a deidentified dataset of geotagged mobile Internet search logs representing searches made by users of the Baidu search engine between March 2015 and May 2015. We inferred presence within the geofence of a medical facility from location and duration information from users' search logs and putatively assigned medical facility visit labels to qualifying search logs. We constructed a matrix of general, semantic, and location-based features from search logs of users that had 42 or more search days preceding a medical facility visit as well as from search logs of users that had no medical visits and trained statistical learners for predicting future medical visits. We then carried out a closed-loop evaluation of the utility of health care use predictions using the show conversion rates of advertisements displayed to the predicted future utilizers. In the context of behaviorally targeted advertising, wherein health care providers are interested in minimizing their cost per conversion, the association between show conversion rate and predicted utilization score, served as a surrogate measure of the model's utility. We obtained the highest area under the curve (0.796) in medical visit prediction with our random forests model and daywise features. Ablating feature categories one at a time showed that the model performance worsened the most when location features were dropped. An online evaluation in which advertisements were served to users who had a high predicted probability of a future medical visit showed a 3.96% increase in the show conversion rate. Results from our experiments done in a research setting suggest that it is possible to accurately predict future patient visits from geotagged mobile search logs. Results from the offline and online experiments on the utility of health utilization predictions suggest that such prediction can have utility for health care providers.
Impact of Predicting Health Care Utilization Via Web Search Behavior: A Data-Driven Analysis
Zhang, Liangliang; Zhu, Josh; Fang, Shiyuan; Cheng, Tim; Hong, Chloe; Shah, Nigam H
2016-01-01
Background By recent estimates, the steady rise in health care costs has deprived more than 45 million Americans of health care services and has encouraged health care providers to better understand the key drivers of health care utilization from a population health management perspective. Prior studies suggest the feasibility of mining population-level patterns of health care resource utilization from observational analysis of Internet search logs; however, the utility of the endeavor to the various stakeholders in a health ecosystem remains unclear. Objective The aim was to carry out a closed-loop evaluation of the utility of health care use predictions using the conversion rates of advertisements that were displayed to the predicted future utilizers as a surrogate. The statistical models to predict the probability of user’s future visit to a medical facility were built using effective predictors of health care resource utilization, extracted from a deidentified dataset of geotagged mobile Internet search logs representing searches made by users of the Baidu search engine between March 2015 and May 2015. Methods We inferred presence within the geofence of a medical facility from location and duration information from users’ search logs and putatively assigned medical facility visit labels to qualifying search logs. We constructed a matrix of general, semantic, and location-based features from search logs of users that had 42 or more search days preceding a medical facility visit as well as from search logs of users that had no medical visits and trained statistical learners for predicting future medical visits. We then carried out a closed-loop evaluation of the utility of health care use predictions using the show conversion rates of advertisements displayed to the predicted future utilizers. In the context of behaviorally targeted advertising, wherein health care providers are interested in minimizing their cost per conversion, the association between show conversion rate and predicted utilization score, served as a surrogate measure of the model’s utility. Results We obtained the highest area under the curve (0.796) in medical visit prediction with our random forests model and daywise features. Ablating feature categories one at a time showed that the model performance worsened the most when location features were dropped. An online evaluation in which advertisements were served to users who had a high predicted probability of a future medical visit showed a 3.96% increase in the show conversion rate. Conclusions Results from our experiments done in a research setting suggest that it is possible to accurately predict future patient visits from geotagged mobile search logs. Results from the offline and online experiments on the utility of health utilization predictions suggest that such prediction can have utility for health care providers. PMID:27655225
Ribeiro, Taisa Pereira Piacentini; Manarin, Flávia Giovana; Borges de Melo, Eduardo
2018-05-30
To address the rising global demand for food, it is necessary to search for new herbicides that can control resistant weeds. We performed a 2D-quantitative structure-activity relationship (QSAR) study to predict compounds with photosynthesis-inhibitory activity. A data set of 44 compounds (quinolines and naphthalenes), which are described as photosynthetic electron transport (PET) inhibitors, was used. The obtained model was approved in internal and external validation tests. 2D Similarity-based virtual screening was performed and 64 compounds were selected from the ZINC database. By using the VEGA QSAR software, 48 compounds were shown to have potential toxic effects (mutagenicity and carcinogenicity). Therefore, the model was also tested using a set of 16 molecules obtained by a similarity search of the ZINC database. Six compounds showed good predicted inhibition of PET. The obtained model shows potential utility in the design of new PET inhibitors, and the hit compounds found by virtual screening are novel bicyclic scaffolds of this class. Copyright © 2018 Elsevier Inc. All rights reserved.
FRAGSION: ultra-fast protein fragment library generation by IOHMM sampling.
Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin
2016-07-01
Speed, accuracy and robustness of building protein fragment library have important implications in de novo protein structure prediction since fragment-based methods are one of the most successful approaches in template-free modeling (FM). Majority of the existing fragment detection methods rely on database-driven search strategies to identify candidate fragments, which are inherently time-consuming and often hinder the possibility to locate longer fragments due to the limited sizes of databases. Also, it is difficult to alleviate the effect of noisy sequence-based predicted features such as secondary structures on the quality of fragment. Here, we present FRAGSION, a database-free method to efficiently generate protein fragment library by sampling from an Input-Output Hidden Markov Model. FRAGSION offers some unique features compared to existing approaches in that it (i) is lightning-fast, consuming only few seconds of CPU time to generate fragment library for a protein of typical length (300 residues); (ii) can generate dynamic-size fragments of any length (even for the whole protein sequence) and (iii) offers ways to handle noise in predicted secondary structure during fragment sampling. On a FM dataset from the most recent Critical Assessment of Structure Prediction, we demonstrate that FGRAGSION provides advantages over the state-of-the-art fragment picking protocol of ROSETTA suite by speeding up computation by several orders of magnitude while achieving comparable performance in fragment quality. Source code and executable versions of FRAGSION for Linux and MacOS is freely available to non-commercial users at http://sysbio.rnet.missouri.edu/FRAGSION/ It is bundled with a manual and example data. chengji@missouri.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Yu, Jinchao; Vavrusa, Marek; Andreani, Jessica; Rey, Julien; Tufféry, Pierre; Guerois, Raphaël
2016-01-01
The structural modeling of protein–protein interactions is key in understanding how cell machineries cross-talk with each other. Molecular docking simulations provide efficient means to explore how two unbound protein structures interact. InterEvDock is a server for protein docking based on a free rigid-body docking strategy. A systematic rigid-body docking search is performed using the FRODOCK program and the resulting models are re-scored with InterEvScore and SOAP-PP statistical potentials. The InterEvScore potential was specifically designed to integrate co-evolutionary information in the docking process. InterEvDock server is thus particularly well suited in case homologous sequences are available for both binding partners. The server returns 10 structures of the most likely consensus models together with 10 predicted residues most likely involved in the interface. In 91% of all complexes tested in the benchmark, at least one residue out of the 10 predicted is involved in the interface, providing useful guidelines for mutagenesis. InterEvDock is able to identify a correct model among the top10 models for 49% of the rigid-body cases with evolutionary information, making it a unique and efficient tool to explore structural interactomes under an evolutionary perspective. The InterEvDock web interface is available at http://bioserv.rpbs.univ-paris-diderot.fr/services/InterEvDock/. PMID:27131368
Predicting Airport Screening Officers' Visual Search Competency With a Rapid Assessment.
Mitroff, Stephen R; Ericson, Justin M; Sharpe, Benjamin
2018-03-01
Objective The study's objective was to assess a new personnel selection and assessment tool for aviation security screeners. A mobile app was modified to create a tool, and the question was whether it could predict professional screeners' on-job performance. Background A variety of professions (airport security, radiology, the military, etc.) rely on visual search performance-being able to detect targets. Given the importance of such professions, it is necessary to maximize performance, and one means to do so is to select individuals who excel at visual search. A critical question is whether it is possible to predict search competency within a professional search environment. Method Professional searchers from the USA Transportation Security Administration (TSA) completed a rapid assessment on a tablet-based X-ray simulator (XRAY Screener, derived from the mobile technology app Airport Scanner; Kedlin Company). The assessment contained 72 trials that were simulated X-ray images of bags. Participants searched for prohibited items and tapped on them with their finger. Results Performance on the assessment significantly related to on-job performance measures for the TSA officers such that those who were better XRAY Screener performers were both more accurate and faster at the actual airport checkpoint. Conclusion XRAY Screener successfully predicted on-job performance for professional aviation security officers. While questions remain about the underlying cognitive mechanisms, this quick assessment was found to significantly predict on-job success for a task that relies on visual search performance. Application It may be possible to quickly assess an individual's visual search competency, which could help organizations select new hires and assess their current workforce.
Shahbaaz, Mohd; Ahmad, Faizan; Imtaiyaz Hassan, Md
2015-06-01
Haemophilus influenzae is a small pleomorphic Gram-negative bacteria which causes several chronic diseases, including bacteremia, meningitis, cellulitis, epiglottitis, septic arthritis, pneumonia, and empyema. Here we extensively analyzed the sequenced genome of H. influenzae strain Rd KW20 using protein family databases, protein structure prediction, pathways and genome context methods to assign a precise function to proteins whose functions are unknown. These proteins are termed as hypothetical proteins (HPs), for which no experimental information is available. Function prediction of these proteins would surely be supportive to precisely understand the biochemical pathways and mechanism of pathogenesis of Haemophilus influenzae. During the extensive analysis of H. influenzae genome, we found the presence of eight HPs showing lyase activity. Subsequently, we modeled and analyzed three-dimensional structure of all these HPs to determine their functions more precisely. We found these HPs possess cystathionine-β-synthase, cyclase, carboxymuconolactone decarboxylase, pseudouridine synthase A and C, D-tagatose-1,6-bisphosphate aldolase and aminodeoxychorismate lyase-like features, indicating their corresponding functions in the H. influenzae. Lyases are actively involved in the regulation of biosynthesis of various hormones, metabolic pathways, signal transduction, and DNA repair. Lyases are also considered as a key player for various biological processes. These enzymes are critically essential for the survival and pathogenesis of H. influenzae and, therefore, these enzymes may be considered as a potential target for structure-based rational drug design. Our structure-function relationship analysis will be useful to search and design potential lead molecules based on the structure of these lyases, for drug design and discovery.
Dong, Runze; Pan, Shuo; Peng, Zhenling; Zhang, Yang; Yang, Jianyi
2018-05-21
With the rapid increase of the number of protein structures in the Protein Data Bank, it becomes urgent to develop algorithms for efficient protein structure comparisons. In this article, we present the mTM-align server, which consists of two closely related modules: one for structure database search and the other for multiple structure alignment. The database search is speeded up based on a heuristic algorithm and a hierarchical organization of the structures in the database. The multiple structure alignment is performed using the recently developed algorithm mTM-align. Benchmark tests demonstrate that our algorithms outperform other peering methods for both modules, in terms of speed and accuracy. One of the unique features for the server is the interplay between database search and multiple structure alignment. The server provides service not only for performing fast database search, but also for making accurate multiple structure alignment with the structures found by the search. For the database search, it takes about 2-5 min for a structure of a medium size (∼300 residues). For the multiple structure alignment, it takes a few seconds for ∼10 structures of medium sizes. The server is freely available at: http://yanglab.nankai.edu.cn/mTM-align/.
Computational analysis of conserved RNA secondary structure in transcriptomes and genomes.
Eddy, Sean R
2014-01-01
Transcriptomics experiments and computational predictions both enable systematic discovery of new functional RNAs. However, many putative noncoding transcripts arise instead from artifacts and biological noise, and current computational prediction methods have high false positive rates. I discuss prospects for improving computational methods for analyzing and identifying functional RNAs, with a focus on detecting signatures of conserved RNA secondary structure. An interesting new front is the application of chemical and enzymatic experiments that probe RNA structure on a transcriptome-wide scale. I review several proposed approaches for incorporating structure probing data into the computational prediction of RNA secondary structure. Using probabilistic inference formalisms, I show how all these approaches can be unified in a well-principled framework, which in turn allows RNA probing data to be easily integrated into a wide range of analyses that depend on RNA secondary structure inference. Such analyses include homology search and genome-wide detection of new structural RNAs.
Jadhav, Ashutosh; Sheth, Amit; Pathak, Jyotishman
2014-01-01
Since the early 2000’s, Internet usage for health information searching has increased significantly. Studying search queries can help us to understand users “information need” and how do they formulate search queries (“expression of information need”). Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated how and what users search for CVD. We address this knowledge gap in the community by analyzing a large corpus of 10 million CVD related search queries from MayoClinic.com. Using UMLS MetaMap and UMLS semantic types/concepts, we developed a rule-based approach to categorize the queries into 14 health categories. We analyzed structural properties, types (keyword-based/Wh-questions/Yes-No questions) and linguistic structure of the queries. Our results show that the most searched health categories are ‘Diseases/Conditions’, ‘Vital-Sings’, ‘Symptoms’ and ‘Living-with’. CVD queries are longer and are predominantly keyword-based. This study extends our knowledge about online health information searching and provides useful insights for Web search engines and health websites. PMID:25954380
GOLDRUSH. III. A systematic search for protoclusters at z ˜ 4 based on the >100 deg2 area
NASA Astrophysics Data System (ADS)
Toshikawa, Jun; Uchiyama, Hisakazu; Kashikawa, Nobunari; Ouchi, Masami; Overzier, Roderik; Ono, Yoshiaki; Harikane, Yuichi; Ishikawa, Shogo; Kodama, Tadayuki; Matsuda, Yuichi; Lin, Yen-Ting; Onoue, Masafusa; Tanaka, Masayuki; Nagao, Tohru; Akiyama, Masayuki; Komiyama, Yutaka; Goto, Tomotsugu; Lee, Chien-Hsiu
2018-01-01
We conduct a systematic search for galaxy protoclusters at z ˜ 3.8 based on the latest internal data release (S16A) of the Hyper Suprime-Cam Subaru strategic program (HSC-SSP). In the Wide layer of the HSC-SSP, we investigate the large-scale projected sky distribution of g-dropout galaxies over an area of 121 deg2, and identify 216 large-scale overdense regions (>4 σ overdensity significance) that are likely protocluster candidates. Of these, 37 are located within 8΄ (3.4 physical Mpc) of other protocluster candidates of higher overdensity, and are expected to merge into a single massive structure by z = 0. Therefore, we find 179 unique protocluster candidates in our survey. A cosmological simulation that includes projection effects predicts that more than 76% of these candidates will evolve into galaxy clusters with halo masses of at least 1014 M⊙ by z = 0. The unprecedented size of our protocluster candidate catalog allows us to perform, for the first time, an angular clustering analysis of the systematic sample of protocluster candidates. We find a correlation length of 35.0 h-1 Mpc. The relation between correlation length and number density of z ˜ 3.8 protocluster candidates is consistent with the prediction of the ΛCDM model, and the correlation length is similar to that of rich clusters in the local universe. This result suggests that our protocluster candidates are tracing similar spatial structures to those expected from the progenitors of rich clusters, and enhances the confidence that our method for identifying protoclusters at high redshifts is robust. In years to come, our protocluster search will be extended to the entire HSC-SSP Wide sky coverage of ˜ 1400 deg2 to probe cluster formation over a wide redshift range of z ˜ 2-6.
XTALOPT: An open-source evolutionary algorithm for crystal structure prediction
NASA Astrophysics Data System (ADS)
Lonie, David C.; Zurek, Eva
2011-02-01
The implementation and testing of XTALOPT, an evolutionary algorithm for crystal structure prediction, is outlined. We present our new periodic displacement (ripple) operator which is ideally suited to extended systems. It is demonstrated that hybrid operators, which combine two pure operators, reduce the number of duplicate structures in the search. This allows for better exploration of the potential energy surface of the system in question, while simultaneously zooming in on the most promising regions. A continuous workflow, which makes better use of computational resources as compared to traditional generation based algorithms, is employed. Various parameters in XTALOPT are optimized using a novel benchmarking scheme. XTALOPT is available under the GNU Public License, has been interfaced with various codes commonly used to study extended systems, and has an easy to use, intuitive graphical interface. Program summaryProgram title:XTALOPT Catalogue identifier: AEGX_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGX_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPL v2.1 or later [1] No. of lines in distributed program, including test data, etc.: 36 849 No. of bytes in distributed program, including test data, etc.: 1 149 399 Distribution format: tar.gz Programming language: C++ Computer: PCs, workstations, or clusters Operating system: Linux Classification: 7.7 External routines: QT [2], OpenBabel [3], AVOGADRO [4], SPGLIB [8] and one of: VASP [5], PWSCF [6], GULP [7]. Nature of problem: Predicting the crystal structure of a system from its stoichiometry alone remains a grand challenge in computational materials science, chemistry, and physics. Solution method: Evolutionary algorithms are stochastic search techniques which use concepts from biological evolution in order to locate the global minimum on their potential energy surface. Our evolutionary algorithm, XTALOPT, is freely available to the scientific community for use and collaboration under the GNU Public License. Running time: User dependent. The program runs until stopped by the user.
Efficient RNA structure comparison algorithms.
Arslan, Abdullah N; Anandan, Jithendar; Fry, Eric; Monschke, Keith; Ganneboina, Nitin; Bowerman, Jason
2017-12-01
Recently proposed relative addressing-based ([Formula: see text]) RNA secondary structure representation has important features by which an RNA structure database can be stored into a suffix array. A fast substructure search algorithm has been proposed based on binary search on this suffix array. Using this substructure search algorithm, we present a fast algorithm that finds the largest common substructure of given multiple RNA structures in [Formula: see text] format. The multiple RNA structure comparison problem is NP-hard in its general formulation. We introduced a new problem for comparing multiple RNA structures. This problem has more strict similarity definition and objective, and we propose an algorithm that solves this problem efficiently. We also develop another comparison algorithm that iteratively calls this algorithm to locate nonoverlapping large common substructures in compared RNAs. With the new resulting tools, we improved the RNASSAC website (linked from http://faculty.tamuc.edu/aarslan ). This website now also includes two drawing tools: one specialized for preparing RNA substructures that can be used as input by the search tool, and another one for automatically drawing the entire RNA structure from a given structure sequence.
Zhuang, Chunlin; Narayanapillai, Sreekanth; Zhang, Wannian; Sham, Yuk Yin; Xing, Chengguo
2014-02-13
In this study, rapid structure-based virtual screening and hit-based substructure search were utilized to identify small molecules that disrupt the interaction of Keap1-Nrf2. Special emphasis was placed toward maximizing the exploration of chemical diversity of the initial hits while economically establishing informative structure-activity relationship (SAR) of novel scaffolds. Our most potent noncovalent inhibitor exhibits three times improved cellular activation in Nrf2 activation than the most active noncovalent Keap1 inhibitor known to date.
Geyer, Thomas; Shi, Zhuanghua; Müller, Hermann J
2010-06-01
Three experiments examined memory-based guidance of visual search using a modified version of the contextual-cueing paradigm (Jiang & Chun, 2001). The target, if present, was a conjunction of color and orientation, with target (and distractor) features randomly varying across trials (multiconjunction search). Under these conditions, reaction times (RTs) were faster when all items in the display appeared at predictive ("old") relative to nonpredictive ("new") locations. However, this RT benefit was smaller compared to when only one set of items, namely that sharing the target's color (but not that in the alternative color) appeared in predictive arrangement. In all conditions, contextual cueing was reliable on both target-present and -absent trials and enhanced if a predictive display was preceded by a predictive (though differently arranged) display, rather than a nonpredictive display. These results suggest that (1) contextual cueing is confined to color subsets of items, that (2) retrieving contextual associations for one color subset of items can be impeded by associations formed within the alternative subset ("contextual interference"), and (3) that contextual cueing is modulated by intertrial priming.
2017-01-01
In this paper, we propose a new automatic hyperparameter selection approach for determining the optimal network configuration (network structure and hyperparameters) for deep neural networks using particle swarm optimization (PSO) in combination with a steepest gradient descent algorithm. In the proposed approach, network configurations were coded as a set of real-number m-dimensional vectors as the individuals of the PSO algorithm in the search procedure. During the search procedure, the PSO algorithm is employed to search for optimal network configurations via the particles moving in a finite search space, and the steepest gradient descent algorithm is used to train the DNN classifier with a few training epochs (to find a local optimal solution) during the population evaluation of PSO. After the optimization scheme, the steepest gradient descent algorithm is performed with more epochs and the final solutions (pbest and gbest) of the PSO algorithm to train a final ensemble model and individual DNN classifiers, respectively. The local search ability of the steepest gradient descent algorithm and the global search capabilities of the PSO algorithm are exploited to determine an optimal solution that is close to the global optimum. We constructed several experiments on hand-written characters and biological activity prediction datasets to show that the DNN classifiers trained by the network configurations expressed by the final solutions of the PSO algorithm, employed to construct an ensemble model and individual classifier, outperform the random approach in terms of the generalization performance. Therefore, the proposed approach can be regarded an alternative tool for automatic network structure and parameter selection for deep neural networks. PMID:29236718
Sea Ice Prediction Has Easy and Difficult Years
NASA Technical Reports Server (NTRS)
Hamilton, Lawrence C.; Bitz, Cecilia M.; Blanchard-Wrigglesworth, Edward; Cutler, Matthew; Kay, Jennifer; Meier, Walter N.; Stroeve, Julienne; Wiggins, Helen
2014-01-01
Arctic sea ice follows an annual cycle, reaching its low point in September each year. The extent of sea ice remaining at this low point has been trending downwards for decades as the Arctic warms. Around the long-term downward trend, however, there is significant variation in the minimum extent from one year to the next. Accurate forecasts of yearly conditions would have great value to Arctic residents, shipping companies, and other stakeholders and are the subject of much current research. Since 2008 the Sea Ice Outlook (SIO) (http://www.arcus.org/search-program/seaiceoutlook) organized by the Study of Environmental Arctic Change (SEARCH) (http://www.arcus.org/search-program) has invited predictions of the September Arctic sea ice minimum extent, which are contributed from the Arctic research community. Individual predictions, based on a variety of approaches, are solicited in three cycles each year in early June, July, and August. (SEARCH 2013).
HDAPD: a web tool for searching the disease-associated protein structures
2010-01-01
Background The protein structures of the disease-associated proteins are important for proceeding with the structure-based drug design to against a particular disease. Up until now, proteins structures are usually searched through a PDB id or some sequence information. However, in the HDAPD database presented here the protein structure of a disease-associated protein can be directly searched through the associated disease name keyed in. Description The search in HDAPD can be easily initiated by keying some key words of a disease, protein name, protein type, or PDB id. The protein sequence can be presented in FASTA format and directly copied for a BLAST search. HDAPD is also interfaced with Jmol so that users can observe and operate a protein structure with Jmol. The gene ontological data such as cellular components, molecular functions, and biological processes are provided once a hyperlink to Gene Ontology (GO) is clicked. Further, HDAPD provides a link to the KEGG map such that where the protein is placed and its relationship with other proteins in a metabolic pathway can be found from the map. The latest literatures namely titles, journals, authors, and abstracts searched from PubMed for the protein are also presented as a length controllable list. Conclusions Since the HDAPD data content can be routinely updated through a PHP-MySQL web page built, the new database presented is useful for searching the structures for some disease-associated proteins that may play important roles in the disease developing process for performing the structure-based drug design to against the diseases. PMID:20158919
Untangling the fungal niche: the trait-based approach.
Crowther, Thomas W; Maynard, Daniel S; Crowther, Terence R; Peccia, Jordan; Smith, Jeffrey R; Bradford, Mark A
2014-01-01
Fungi are prominent components of most terrestrial ecosystems, both in terms of biomass and ecosystem functioning, but the hyper-diverse nature of most communities has obscured the search for unifying principles governing community organization. In particular, unlike plants and animals, observational studies provide little evidence for the existence of niche processes in structuring fungal communities at broad spatial scales. This limits our capacity to predict how communities, and their functioning, vary across landscapes. We outline how a shift in focus, from taxonomy toward functional traits, might prove to be valuable in the search for general patterns in fungal ecology. We build on theoretical advances in plant and animal ecology to provide an empirical framework for a trait-based approach in fungal community ecology. Drawing upon specific characteristics of the fungal system, we highlight the significance of drought stress and combat in structuring free-living fungal communities. We propose a conceptual model to formalize how trade-offs between stress-tolerance and combative dominance are likely to organize communities across environmental gradients. Given that the survival of a fungus in a given environment is contingent on its ability to tolerate antagonistic competitors, measuring variation in combat trait expression along environmental gradients provides a means of elucidating realized, from fundamental niche spaces. We conclude that, using a trait-based understanding of how niche processes structure fungal communities across time and space, we can ultimately link communities with ecosystem functioning. Our trait-based framework highlights fundamental uncertainties that require testing in the fungal system, given their potential to uncover general mechanisms in fungal ecology.
Vertical detachment energies of anionic thymidine: Microhydration effects.
Kim, Sunghwan; Schaefer, Henry F
2010-10-14
Density functional theory has been employed to investigate microhydration effects on the vertical detachment energy (VDE) of the thymidine anion by considering the various structures of its monohydrates. Structures were located using a random searching procedure. Among 14 distinct structures of the anionic thymidine monohydrate, the low-energy structures, in general, have the water molecule bound to the thymine base unit. The negative charge developed on the thymine moiety increases the strength of the intermolecular hydrogen bonding between the water and base units. The computed VDE values of the thymidine monohydrate anions are predicted to range from 0.67 to 1.60 eV and the lowest-energy structure has a VDE of 1.32 eV. The VDEs of the monohydrates of the thymidine anion, where the N(1)[Single Bond]H hydrogen of thymine has been replaced by a 2(')-deoxyribose ring, are greater by ∼0.30 eV, compared to those of the monohydrates of the thymine anion. The results of the present study are in excellent agreement with the accompanying experimental results of Bowen and co-workers [J. Chem. Phys. 133, 144304 (2010)].
WebCSD: the online portal to the Cambridge Structural Database
Thomas, Ian R.; Bruno, Ian J.; Cole, Jason C.; Macrae, Clare F.; Pidcock, Elna; Wood, Peter A.
2010-01-01
WebCSD, a new web-based application developed by the Cambridge Crystallographic Data Centre, offers fast searching of the Cambridge Structural Database using only a standard internet browser. Search facilities include two-dimensional substructure, molecular similarity, text/numeric and reduced cell searching. Text, chemical diagrams and three-dimensional structural information can all be studied in the results browser using the efficient entry summaries and embedded three-dimensional viewer. PMID:22477776
Supercomputer applications in molecular modeling.
Gund, T M
1988-01-01
An overview of the functions performed by molecular modeling is given. Molecular modeling techniques benefiting from supercomputing are described, namely, conformation, search, deriving bioactive conformations, pharmacophoric pattern searching, receptor mapping, and electrostatic properties. The use of supercomputers for problems that are computationally intensive, such as protein structure prediction, protein dynamics and reactivity, protein conformations, and energetics of binding is also examined. The current status of supercomputing and supercomputer resources are discussed.
Wang, Yongcui; Chen, Shilong; Deng, Naiyang; Wang, Yong
2013-01-01
Computational inference of novel therapeutic values for existing drugs, i.e., drug repositioning, offers the great prospect for faster and low-risk drug development. Previous researches have indicated that chemical structures, target proteins, and side-effects could provide rich information in drug similarity assessment and further disease similarity. However, each single data source is important in its own way and data integration holds the great promise to reposition drug more accurately. Here, we propose a new method for drug repositioning, PreDR (Predict Drug Repositioning), to integrate molecular structure, molecular activity, and phenotype data. Specifically, we characterize drug by profiling in chemical structure, target protein, and side-effects space, and define a kernel function to correlate drugs with diseases. Then we train a support vector machine (SVM) to computationally predict novel drug-disease interactions. PreDR is validated on a well-established drug-disease network with 1,933 interactions among 593 drugs and 313 diseases. By cross-validation, we find that chemical structure, drug target, and side-effects information are all predictive for drug-disease relationships. More experimentally observed drug-disease interactions can be revealed by integrating these three data sources. Comparison with existing methods demonstrates that PreDR is competitive both in accuracy and coverage. Follow-up database search and pathway analysis indicate that our new predictions are worthy of further experimental validation. Particularly several novel predictions are supported by clinical trials databases and this shows the significant prospects of PreDR in future drug treatment. In conclusion, our new method, PreDR, can serve as a useful tool in drug discovery to efficiently identify novel drug-disease interactions. In addition, our heterogeneous data integration framework can be applied to other problems. PMID:24244318
Search and design of nonmagnetic centrosymmetric layered crystals with large local spin polarization
NASA Astrophysics Data System (ADS)
Liu, Qihang; Zhang, Xiuwen; Jin, Hosub; Lam, Kanber; Im, Jino; Freeman, Arthur J.; Zunger, Alex
2015-06-01
Until recently, spin polarization in nonmagnetic materials was the exclusive territory of noncentrosymmetric structures. It was recently shown that a form of "hidden spin polarization" (named the "Rashba-2" or "R-2" effect) could exist in globally centrosymmetric crystals provided the individual layers belong to polar point group symmetries. This realization could considerably broaden the range of materials that might be considered for spin-polarization spintronic applications to include the hitherto "forbidden spintronic compound" that belongs to centrosymmetric symmetries. Here we take the necessary steps to transition from such general, material-agnostic condensed matter theory arguments to material-specific "design principles" that could aid future laboratory search of R-2 materials. Specifically, we (i) classify different prototype layered structures that have been broadly studied in the literature in terms of their expected R-2 behavior, including the B i2S e3 -structure type (a prototype topological insulator), Mo S2 -structure type (a prototype valleytronic compound), and LaBiO S2 -structure type (a host of superconductivity upon doping); (ii) formulate the properties that ideal R-2 compounds should have in terms of combination of their global unit cell symmetries with specific point group symmetries of their constituent "sectors"; and (iii) use first-principles band theory to search for compounds from the prototype family of LaOBi S2 -type structures that satisfy these R-2 design metrics. We initially consider both stable and hypothetical M'O M X2 (M': Sc, Y, La, Ce, Pr, Nd, Al, Ga, In, Tl; M: P, As, Sb, Bi; X: S, Se, Te) compounds to establish an understanding of trends of R-2 with composition, and then indicate the predictions that are expected to be stable and synthesizable. We predict large spin splittings (up to ˜200 meV for holes in LaOBiT e2 ) as well as surface Rashba states. Experimental testing of such predictions is called for.
Swellix: a computational tool to explore RNA conformational space.
Sloat, Nathan; Liu, Jui-Wen; Schroeder, Susan J
2017-11-21
The sequence of nucleotides in an RNA determines the possible base pairs for an RNA fold and thus also determines the overall shape and function of an RNA. The Swellix program presented here combines a helix abstraction with a combinatorial approach to the RNA folding problem in order to compute all possible non-pseudoknotted RNA structures for RNA sequences. The Swellix program builds on the Crumple program and can include experimental constraints on global RNA structures such as the minimum number and lengths of helices from crystallography, cryoelectron microscopy, or in vivo crosslinking and chemical probing methods. The conceptual advance in Swellix is to count helices and generate all possible combinations of helices rather than counting and combining base pairs. Swellix bundles similar helices and includes improvements in memory use and efficient parallelization. Biological applications of Swellix are demonstrated by computing the reduction in conformational space and entropy due to naturally modified nucleotides in tRNA sequences and by motif searches in Human Endogenous Retroviral (HERV) RNA sequences. The Swellix motif search reveals occurrences of protein and drug binding motifs in the HERV RNA ensemble that do not occur in minimum free energy or centroid predicted structures. Swellix presents significant improvements over Crumple in terms of efficiency and memory use. The efficient parallelization of Swellix enables the computation of sequences as long as 418 nucleotides with sufficient experimental constraints. Thus, Swellix provides a practical alternative to free energy minimization tools when multiple structures, kinetically determined structures, or complex RNA-RNA and RNA-protein interactions are present in an RNA folding problem.
CAMEL: concept annotated image libraries
NASA Astrophysics Data System (ADS)
Natsev, Apostol; Chadha, Atul; Soetarman, Basuki; Vitter, Jeffrey S.
2001-01-01
The problem of content-based image searching has received considerable attention in the last few years. Thousands of images are now available on the Internet, and many important applications require searching of images in domains such as E-commerce, medical imaging, weather prediction, satellite imagery, and so on. Yet, content-based image querying is still largely unestablished as a mainstream field, nor is it widely used by search engines. We believe that two of the major hurdles for this poor acceptance are poor retrieval quality and usability.
CAMEL: concept annotated image libraries
NASA Astrophysics Data System (ADS)
Natsev, Apostol; Chadha, Atul; Soetarman, Basuki; Vitter, Jeffrey S.
2000-12-01
The problem of content-based image searching has received considerable attention in the last few years. Thousands of images are now available on the Internet, and many important applications require searching of images in domains such as E-commerce, medical imaging, weather prediction, satellite imagery, and so on. Yet, content-based image querying is still largely unestablished as a mainstream field, nor is it widely used by search engines. We believe that two of the major hurdles for this poor acceptance are poor retrieval quality and usability.
A Viral-Human Interactome Based on Structural Motif-Domain Interactions Captures the Human Infectome
Guo, Xianwu; Rodríguez-Pérez, Mario A.
2013-01-01
Protein interactions between a pathogen and its host are fundamental in the establishment of the pathogen and underline the infection mechanism. In the present work, we developed a single predictive model for building a host-viral interactome based on the identification of structural descriptors from motif-domain interactions of protein complexes deposited in the Protein Data Bank (PDB). The structural descriptors were used for searching, in a database of protein sequences of human and five clinically important viruses; therefore, viral and human proteins sharing a descriptor were predicted as interacting proteins. The analysis of the host-viral interactome allowed to identify a set of new interactions that further explain molecular mechanism associated with viral infections and showed that it was able to capture human proteins already associated to viral infections (human infectome) and non-infectious diseases (human diseasome). The analysis of human proteins targeted by viral proteins in the context of a human interactome showed that their neighbors are enriched in proteins reported with differential expression under infection and disease conditions. It is expected that the findings of this work will contribute to the development of systems biology for infectious diseases, and help guide the rational identification and prioritization of novel drug targets. PMID:23951184
Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results.
De-Arteaga, Maria; Eggel, Ivan; Kahn, Charles E; Müller, Henning
2015-10-01
Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which enables to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only few terms and therefore are not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is made based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and data of reformulations done by users in the past can aid the development of better search systems, particularly to improve results for novice users. Therefore, this paper gives important ideas to better understand how people search and how to use this knowledge to improve the performance of specialized medical search engines.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2010-09-30
The Umbra gbs (Graph-Based Search) library provides implementations of graph-based search/planning algorithms that can be applied to legacy graph data structures. Unlike some other graph algorithm libraries, this one does not require your graph class to inherit from a specific base class. Implementations of Dijkstra's Algorithm and A-Star search are included and can be used with graphs that are lazily-constructed.
Fantini, Jacques; Garmy, Nicolas; Yahi, Nouara
2006-09-12
Protein-glycolipid interactions mediate the attachment of various pathogens to the host cell surface as well as the association of numerous cellular proteins with lipid rafts. Thus, it is of primary importance to identify the protein domains involved in glycolipid recognition. Using structure similarity searches, we could identify a common glycolipid-binding domain in the three-dimensional structure of several proteins known to interact with lipid rafts. Yet the three-dimensional structure of most raft-targeted proteins is still unknown. In the present study, we have identified a glycolipid-binding domain in the amino acid sequence of a bacterial adhesin (Helicobacter pylori adhesin A, HpaA). The prediction was based on the major properties of the glycolipid-binding domains previously characterized by structural searches. A short (15-mer) synthetic peptide corresponding to this putative glycolipid-binding domain was synthesized, and we studied its interaction with glycolipid monolayers at the air-water interface. The synthetic HpaA peptide recognized LacCer but not Gb3. This glycolipid specificity was in line with that of the whole bacterium. Molecular modeling studies gave some insights into this high selectivity of interaction. It also suggested that Phe147 in HpaA played a key role in LacCer recognition, through sugar-aromatic CH-pi stacking interactions with the hydrophobic side of the galactose ring of LacCer. Correspondingly, the replacement of Phe147 with Ala strongly affected LacCer recognition, whereas substitution with Trp did not. Our method could be used to identify glycolipid-binding domains in microbial and cellular proteins interacting with lipid shells, rafts, and other specialized membrane microdomains.
Predicting New Materials for Hydrogen Storage Application
Vajeeston, Ponniah; Ravindran, Ponniah; Fjellvåg, Helmer
2009-01-01
Knowledge about the ground-state crystal structure is a prerequisite for the rational understanding of solid-state properties of new materials. To act as an efficient energy carrier, hydrogen should be absorbed and desorbed in materials easily and in high quantities. Owing to the complexity in structural arrangements and difficulties involved in establishing hydrogen positions by x-ray diffraction methods, the structural information of hydrides are very limited compared to other classes of materials (like oxides, intermetallics, etc.). This can be overcome by conducting computational simulations combined with selected experimental study which can save environment, money, and man power. The predicting capability of first-principles density functional theory (DFT) is already well recognized and in many cases structural and thermodynamic properties of single/multi component system are predicted. This review will focus on possible new classes of materials those have high hydrogen content, demonstrate the ability of DFT to predict crystal structure, and search for potential meta-stable phases. Stabilization of such meta-stable phases is also discussed.
Finding Chemical Structures Corresponding to a Set of Coordinates in Chemical Descriptor Space.
Miyao, Tomoyuki; Funatsu, Kimito
2017-08-01
When chemical structures are searched based on descriptor values, or descriptors are interpreted based on values, it is important that corresponding chemical structures actually exist. In order to consider the existence of chemical structures located in a specific region in the chemical space, we propose to search them inside training data domains (TDDs), which are dense areas of a training dataset in the chemical space. We investigated TDDs' features using diverse and local datasets, assuming that GDB11 is the chemical universe. These two analyses showed that considering TDDs gives higher chance of finding chemical structures than a random search-based method, and that novel chemical structures actually exist inside TDDs. In addition to those findings, we tested the hypothesis that chemical structures were distributed on the limited areas of chemical space. This hypothesis was confirmed by the fact that distances among chemical structures in several descriptor spaces were much shorter than those among randomly generated coordinates in the training data range. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Harmony Search as a Powerful Tool for Feature Selection in QSPR Study of the Drugs Lipophilicity.
Bahadori, Behnoosh; Atabati, Morteza
2017-01-01
Aims & Scope: Lipophilicity represents one of the most studied and most frequently used fundamental physicochemical properties. In the present work, harmony search (HS) algorithm is suggested to feature selection in quantitative structure-property relationship (QSPR) modeling to predict lipophilicity of neutral, acidic, basic and amphotheric drugs that were determined by UHPLC. Harmony search is a music-based metaheuristic optimization algorithm. It was affected by the observation that the aim of music is to search for a perfect state of harmony. Semi-empirical quantum-chemical calculations at AM1 level were used to find the optimum 3D geometry of the studied molecules and variant descriptors (1497 descriptors) were calculated by the Dragon software. The selected descriptors by harmony search algorithm (9 descriptors) were applied for model development using multiple linear regression (MLR). In comparison with other feature selection methods such as genetic algorithm and simulated annealing, harmony search algorithm has better results. The root mean square error (RMSE) with and without leave-one out cross validation (LOOCV) were obtained 0.417 and 0.302, respectively. The results were compared with those obtained from the genetic algorithm and simulated annealing methods and it showed that the HS is a helpful tool for feature selection with fine performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Oligomerization of G protein-coupled receptors: computational methods.
Selent, J; Kaczor, A A
2011-01-01
Recent research has unveiled the complexity of mechanisms involved in G protein-coupled receptor (GPCR) functioning in which receptor dimerization/oligomerization may play an important role. Although the first high-resolution X-ray structure for a likely functional chemokine receptor dimer has been deposited in the Protein Data Bank, the interactions and mechanisms of dimer formation are not yet fully understood. In this respect, computational methods play a key role for predicting accurate GPCR complexes. This review outlines computational approaches focusing on sequence- and structure-based methodologies as well as discusses their advantages and limitations. Sequence-based approaches that search for possible protein-protein interfaces in GPCR complexes have been applied with success in several studies, but did not yield always consistent results. Structure-based methodologies are a potent complement to sequence-based approaches. For instance, protein-protein docking is a valuable method especially when guided by experimental constraints. Some disadvantages like limited receptor flexibility and non-consideration of the membrane environment have to be taken into account. Molecular dynamics simulation can overcome these drawbacks giving a detailed description of conformational changes in a native-like membrane. Successful prediction of GPCR complexes using computational approaches combined with experimental efforts may help to understand the role of dimeric/oligomeric GPCR complexes for fine-tuning receptor signaling. Moreover, since such GPCR complexes have attracted interest as potential drug target for diverse diseases, unveiling molecular determinants of dimerization/oligomerization can provide important implications for drug discovery.
Fast protein tertiary structure retrieval based on global surface shape similarity.
Sael, Lee; Li, Bin; La, David; Fang, Yi; Ramani, Karthik; Rustamov, Raif; Kihara, Daisuke
2008-09-01
Characterization and identification of similar tertiary structure of proteins provides rich information for investigating function and evolution. The importance of structure similarity searches is increasing as structure databases continue to expand, partly due to the structural genomics projects. A crucial drawback of conventional protein structure comparison methods, which compare structures by their main-chain orientation or the spatial arrangement of secondary structure, is that a database search is too slow to be done in real-time. Here we introduce a global surface shape representation by three-dimensional (3D) Zernike descriptors, which represent a protein structure compactly as a series expansion of 3D functions. With this simplified representation, the search speed against a few thousand structures takes less than a minute. To investigate the agreement between surface representation defined by 3D Zernike descriptor and conventional main-chain based representation, a benchmark was performed against a protein classification generated by the combinatorial extension algorithm. Despite the different representation, 3D Zernike descriptor retrieved proteins of the same conformation defined by combinatorial extension in 89.6% of the cases within the top five closest structures. The real-time protein structure search by 3D Zernike descriptor will open up new possibility of large-scale global and local protein surface shape comparison. 2008 Wiley-Liss, Inc.
Neural network based chemical structure indexing.
Rughooputh, S D; Rughooputh, H C
2001-01-01
Searches on chemical databases are presently dominated by the text-based content of a paper which can be indexed into a keyword searchable form. Such traditional searches can prove to be very time-consuming and discouraging to the less frequent scientist. We report a simple chemical indexing based on the molecular structure alone. The method used is based on a one-to-one correspondence between the chemical structure presented as an image to a neural network and the corresponding binary output. The method is direct and less cumbersome (compared with traditional methods) and proves to be robust, elegant, and very versatile.
Essie: A Concept-based Search Engine for Structured Biomedical Text
Ide, Nicholas C.; Loane, Russell F.; Demner-Fushman, Dina
2007-01-01
This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie’s design is motivated by an observation that query terms are often conceptually related to terms in a document, without actually occurring in the document text. Essie’s performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept based query expansion is a useful approach for information retrieval in the biomedical domain. PMID:17329729
Recent progress and future directions in protein-protein docking.
Ritchie, David W
2008-02-01
This article gives an overview of recent progress in protein-protein docking and it identifies several directions for future research. Recent results from the CAPRI blind docking experiments show that docking algorithms are steadily improving in both reliability and accuracy. Current docking algorithms employ a range of efficient search and scoring strategies, including e.g. fast Fourier transform correlations, geometric hashing, and Monte Carlo techniques. These approaches can often produce a relatively small list of up to a few thousand orientations, amongst which a near-native binding mode is often observed. However, despite the use of improved scoring functions which typically include models of desolvation, hydrophobicity, and electrostatics, current algorithms still have difficulty in identifying the correct solution from the list of false positives, or decoys. Nonetheless, significant progress is being made through better use of bioinformatics, biochemical, and biophysical information such as e.g. sequence conservation analysis, protein interaction databases, alanine scanning, and NMR residual dipolar coupling restraints to help identify key binding residues. Promising new approaches to incorporate models of protein flexibility during docking are being developed, including the use of molecular dynamics snapshots, rotameric and off-rotamer searches, internal coordinate mechanics, and principal component analysis based techniques. Some investigators now use explicit solvent models in their docking protocols. Many of these approaches can be computationally intensive, although new silicon chip technologies such as programmable graphics processor units are beginning to offer competitive alternatives to conventional high performance computer systems. As cryo-EM techniques improve apace, docking NMR and X-ray protein structures into low resolution EM density maps is helping to bridge the resolution gap between these complementary techniques. The use of symmetry and fragment assembly constraints are also helping to make possible docking-based predictions of large multimeric protein complexes. In the near future, the closer integration of docking algorithms with protein interface prediction software, structural databases, and sequence analysis techniques should help produce better predictions of protein interaction networks and more accurate structural models of the fundamental molecular interactions within the cell.
Guiding principles for peptide nanotechnology through directed discovery.
Lampel, A; Ulijn, R V; Tuttle, T
2018-05-21
Life's diverse molecular functions are largely based on only a small number of highly conserved building blocks - the twenty canonical amino acids. These building blocks are chemically simple, but when they are organized in three-dimensional structures of tremendous complexity, new properties emerge. This review explores recent efforts in the directed discovery of functional nanoscale systems and materials based on these same amino acids, but that are not guided by copying or editing biological systems. The review summarises insights obtained using three complementary approaches of searching the sequence space to explore sequence-structure relationships for assembly, reactivity and complexation, namely: (i) strategic editing of short peptide sequences; (ii) computational approaches to predicting and comparing assembly behaviours; (iii) dynamic peptide libraries that explore the free energy landscape. These approaches give rise to guiding principles on controlling order/disorder, complexation and reactivity by peptide sequence design.
Efficient multifeature index structures for music data retrieval
NASA Astrophysics Data System (ADS)
Lee, Wegin; Chen, Arbee L. P.
1999-12-01
In this paper, we propose four index structures for music data retrieval. Based on suffix trees, we develop two index structures called combined suffix tree and independent suffix trees. These methods still show shortcomings for some search functions. Hence we develop another index, called Twin Suffix Trees, to overcome these problems. However, the Twin Suffix Trees lack of scalability when the amount of music data becomes large. Therefore we propose the fourth index, called Grid-Twin Suffix Trees, to provide scalability and flexibility for a large amount of music data. For each index, we can use different search functions, like exact search and approximate search, on different music features, like melody, rhythm or both. We compare the performance of the different search functions applied on each index structure by a series of experiments.
GeneSilico protein structure prediction meta-server.
Kurowski, Michal A; Bujnicki, Janusz M
2003-07-01
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.
GeneSilico protein structure prediction meta-server
Kurowski, Michal A.; Bujnicki, Janusz M.
2003-01-01
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313
DOE Office of Scientific and Technical Information (OSTI.GOV)
Borziak, Kirill; Jouline, Igor B
2007-01-01
Motivation: Sensory domains that are conserved among Bacteria, Archaea and Eucarya are important detectors of common signals detected by living cells. Due to their high sequence divergence, sensory domains are difficult to identify. We systematically look for novel sensory domains using sensitive profile-based searches initi-ated with regions of signal transduction proteins where no known domains can be identified by current domain models. Results: Using profile searches followed by multiple sequence alignment, structure prediction, and domain architecture analysis, we have identified a novel sensory domain termed FIST, which is present in signal transduction proteins from Bacteria, Archaea and Eucarya. Remote similaritymore » to a known ligand-binding fold and chromosomal proximity of FIST-encoding genes to those coding for proteins involved in amino acid metabolism and transport suggest that FIST domains bind small ligands, such as amino acids.« less
Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan
2016-07-04
As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.
Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan
2016-01-01
Background As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data. PMID:27377323
NASA Astrophysics Data System (ADS)
Roman, Bart I.; Guedes, Rita C.; Stevens, Christian V.; García-Sosa, Alfonso T.
2018-05-01
In multitarget drug design, it is critical to identify active and inactive compounds against a variety of targets and antitargets. Multitarget strategies thus test the limits of available technology, be that in screening large databases of compounds versus a large number of targets, or in using in silico methods for understanding and reliably predicting these pharmacological outcomes. In this paper, we have evaluated the potential of several in silico approaches to predict the target, antitarget and physicochemical profile of (S)-blebbistatin, the best-known myosin II ATPase inhibitor, and a series of analogs thereof. Standard and augmented structure-based design techniques could not recover the observed activity profiles. A ligand-based method using molecular fingerprints was, however, able to select actives for myosin II inhibition. Using further ligand- and structure-based methods, we also evaluated toxicity through androgen receptor binding, affinity for an array of antitargets and the ADME profile (including assay-interfering compounds) of the series. In conclusion, in the search for (S)-blebbistatin analogs, the dissimilarity distance of molecular fingerprints to known actives and the computed antitarget and physicochemical profile of the molecules can be used for compound design for molecules with potential as tools for modulating myosin II and motility-related diseases.
High pressure behaviour of uranium dicarbide (UC{sub 2}): Ab-initio study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sahoo, B. D., E-mail: bdsahoo@barc.gov.in; Mukherjee, D.; Joshi, K. D.
2016-08-28
The structural stability of uranium dicarbide has been examined under hydrostatic compression employing evolutionary structure search algorithm implemented in the universal structure predictor: evolutionary Xtallography (USPEX) code in conjunction with ab-initio electronic band structure calculation method. The ab-initio total energy calculations involved for this purpose have been carried out within both generalized gradient approximations (GGA) and GGA + U approximations. Our calculations under GGA approximation predict the high pressure structural sequence of tetragonal → monoclinic → orthorhombic for this material with transition pressures of ∼8 GPa and 42 GPa, respectively. The same transition sequence is predicted by calculations within GGA + U also with transition pressuresmore » placed at ∼24 GPa and ∼50 GPa, respectively. Further, on the basis of comparison of zero pressure equilibrium volume and equation of state with available experimental data, we find that GGA + U approximation with U = 2.5 eV describes this material better than the simple GGA approximation. The theoretically predicted high pressure structural phase transitions are in disagreement with the only high experimental study by Dancausse et al. [J. Alloys. Compd. 191, 309 (1993)] on this compound which reports a tetragonal to hexagonal phase transition at a pressure of ∼17.6 GPa. Interestingly, during lowest enthalpy structure search using USPEX, we do not see any hexagonal phase to be closer to the predicted monoclinic phase even within 0.2 eV/f. unit. More experiments with varying carbon contents in UC{sub 2} sample are required to resolve this discrepancy. The existence of these high pressure phases predicted by static lattice calculations has been further substantiated by analyzing the elastic and lattice dynamic stability of these structures in the pressure regimes of their structural stability. Additionally, various thermo-physical quantities such as equilibrium volume, bulk modulus, Debye temperature, thermal expansion coefficient, Gruneisen parameter, and heat capacity at ambient conditions have been determined from these calculations and compared with the available experimental data.« less
HIV-1 protease cleavage site prediction based on two-stage feature selection method.
Niu, Bing; Yuan, Xiao-Cheng; Roeper, Preston; Su, Qiang; Peng, Chun-Rong; Yin, Jing-Yuan; Ding, Juan; Li, HaiPeng; Lu, Wen-Cong
2013-03-01
Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. Searching for an accurate, robust, and rapid method to correctly predict the cleavage sites in proteins is crucial when searching for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with Genetic Algorithms method. Thirty important biochemical features were found based on a jackknife test from the original data set containing 4,248 features. By using the AdaBoost method with the thirty selected features the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, with increased accuracy over the original dataset by 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.
Lenselink, Eelke B; Beuming, Thijs; van Veen, Corine; Massink, Arnault; Sherman, Woody; van Vlijmen, Herman W T; IJzerman, Adriaan P
2016-10-01
In this work, we present a case study to explore the challenges associated with finding novel molecules for a receptor that has been studied in depth and has a wealth of chemical information available. Specifically, we apply a previously described protocol that incorporates explicit water molecules in the ligand binding site to prospectively screen over 2.5 million drug-like and lead-like compounds from the commercially available eMolecules database in search of novel binders to the adenosine A 2A receptor (A 2A AR). A total of seventy-one compounds were selected for purchase and biochemical assaying based on high ligand efficiency and high novelty (Tanimoto coefficient ≤0.25 to any A 2A AR tested compound). These molecules were then tested for their affinity to the adenosine A 2A receptor in a radioligand binding assay. We identified two hits that fulfilled the criterion of ~50 % radioligand displacement at a concentration of 10 μM. Next we selected an additional eight novel molecules that were predicted to make a bidentate interaction with Asn253 6.55 , a key interacting residue in the binding pocket of the A 2A AR. None of these eight molecules were found to be active. Based on these results we discuss the advantages of structure-based methods and the challenges associated with finding chemically novel molecules for well-explored targets.
Computational design of materials for solar hydrogen generation
NASA Astrophysics Data System (ADS)
Umezawa, Naoto
Photocatalysis has a great potential for the production of hydrogen from aquerous solution under solar light. In this talk, two different approaches toward the computational materials desing for solar hydrogen generation will be presented. Tin (Sn), which has two major oxidation states, Sn2+ and Sn4+, is abundant on the earth's crust. Recently, visible-light responsive photocatalytc H2 evolution reaction was identified over a mixed valence tin oxide Sn3O4. We have carried out crystal structure prediction for mixed valence tin oxides in different atomic compositions under ambient pressure condition using advanced computational methods based on the evolutionary crystal-structure search and density-functional theory. The predicted novel crystal structures realize the desirable band gaps and band edge positions for H2 evolution under visible light irradiation. It is concluded that multivalent tin oxides have a great potential as an abundant, cheap and environmentally-benign solar-energy conversion photofunctional materials. Transition metal doping is effective for sensitizing SrTiO3 under visible light. We have theoretically investigated the roles of the doped Cr in STO based on hybrid density-functional calculations. Cr atoms are preferably substituting for Ti under any equilibrium growth conditions. The lower oxidation state Cr3+, which is stabilized under an n-type condition of STO, is found to be advantageous for the photocatalytic performance. It is firther predicted that lanthanum is the best codopant for stabilizing the favorable oxidation state, Cr3+. The prediction was validated by our experiments that La and Cr co-doped STO shows the best performance among examined samples. This work was supported by the Japan Science and Technology Agency (JST) Precursory Research for Embryonic Science and Technology (PRESTO) and International Research Fellow program of Japan Society for the Promotion of Science (JSPS) through project P14207.
Harris, Peter R; Sillence, Elizabeth; Briggs, Pam
2011-07-27
How do people decide which sites to use when seeking health advice online? We can assume, from related work in e-commerce, that general design factors known to affect trust in the site are important, but in this paper we also address the impact of factors specific to the health domain. The current study aimed to (1) assess the factorial structure of a general measure of Web trust, (2) model how the resultant factors predicted trust in, and readiness to act on, the advice found on health-related websites, and (3) test whether adding variables from social cognition models to capture elements of the response to threatening, online health-risk information enhanced the prediction of these outcomes. Participants were asked to recall a site they had used to search for health-related information and to think of that site when answering an online questionnaire. The questionnaire consisted of a general Web trust questionnaire plus items assessing appraisals of the site, including threat appraisals, information checking, and corroboration. It was promoted on the hungersite.com website. The URL was distributed via Yahoo and local print media. We assessed the factorial structure of the measures using principal components analysis and modeled how well they predicted the outcome measures using structural equation modeling (SEM) with EQS software. We report an analysis of the responses of participants who searched for health advice for themselves (N = 561). Analysis of the general Web trust questionnaire revealed 4 factors: information quality, personalization, impartiality, and credible design. In the final SEM model, information quality and impartiality were direct predictors of trust. However, variables specific to eHealth (perceived threat, coping, and corroboration) added substantially to the ability of the model to predict variance in trust and readiness to act on advice on the site. The final model achieved a satisfactory fit: χ(2) (5) = 10.8 (P = .21), comparative fit index = .99, root mean square error of approximation = .052. The model accounted for 66% of the variance in trust and 49% of the variance in readiness to act on the advice. Adding variables specific to eHealth enhanced the ability of a model of trust to predict trust and readiness to act on advice.
Harris, Peter R; Briggs, Pam
2011-01-01
Background How do people decide which sites to use when seeking health advice online? We can assume, from related work in e-commerce, that general design factors known to affect trust in the site are important, but in this paper we also address the impact of factors specific to the health domain. Objective The current study aimed to (1) assess the factorial structure of a general measure of Web trust, (2) model how the resultant factors predicted trust in, and readiness to act on, the advice found on health-related websites, and (3) test whether adding variables from social cognition models to capture elements of the response to threatening, online health-risk information enhanced the prediction of these outcomes. Methods Participants were asked to recall a site they had used to search for health-related information and to think of that site when answering an online questionnaire. The questionnaire consisted of a general Web trust questionnaire plus items assessing appraisals of the site, including threat appraisals, information checking, and corroboration. It was promoted on the hungersite.com website. The URL was distributed via Yahoo and local print media. We assessed the factorial structure of the measures using principal components analysis and modeled how well they predicted the outcome measures using structural equation modeling (SEM) with EQS software. Results We report an analysis of the responses of participants who searched for health advice for themselves (N = 561). Analysis of the general Web trust questionnaire revealed 4 factors: information quality, personalization, impartiality, and credible design. In the final SEM model, information quality and impartiality were direct predictors of trust. However, variables specific to eHealth (perceived threat, coping, and corroboration) added substantially to the ability of the model to predict variance in trust and readiness to act on advice on the site. The final model achieved a satisfactory fit: χ2 5 = 10.8 (P = .21), comparative fit index = .99, root mean square error of approximation = .052. The model accounted for 66% of the variance in trust and 49% of the variance in readiness to act on the advice. Conclusions Adding variables specific to eHealth enhanced the ability of a model of trust to predict trust and readiness to act on advice. PMID:21795237
2015-01-01
Background In recent years, with advances in techniques for protein structure analysis, the knowledge about protein structure and function has been published in a vast number of articles. A method to search for specific publications from such a large pool of articles is needed. In this paper, we propose a method to search for related articles on protein structure analysis by using an article itself as a query. Results Each article is represented as a set of concepts in the proposed method. Then, by using similarities among concepts formulated from databases such as Gene Ontology, similarities between articles are evaluated. In this framework, the desired search results vary depending on the user's search intention because a variety of information is included in a single article. Therefore, the proposed method provides not only one input article (primary article) but also additional articles related to it as an input query to determine the search intention of the user, based on the relationship between two query articles. In other words, based on the concepts contained in the input article and additional articles, we actualize a relevant literature search that considers user intention by varying the degree of attention given to each concept and modifying the concept hierarchy graph. Conclusions We performed an experiment to retrieve relevant papers from articles on protein structure analysis registered in the Protein Data Bank by using three query datasets. The experimental results yielded search results with better accuracy than when user intention was not considered, confirming the effectiveness of the proposed method. PMID:25952498
Xia, Junfeng; Yue, Zhenyu; Di, Yunqiang; Zhu, Xiaolei; Zheng, Chun-Hou
2016-01-01
The identification of hot spots, a small subset of protein interfaces that accounts for the majority of binding free energy, is becoming more important for the research of drug design and cancer development. Based on our previous methods (APIS and KFC2), here we proposed a novel hot spot prediction method. For each hot spot residue, we firstly constructed a wide variety of 108 sequence, structural, and neighborhood features to characterize potential hot spot residues, including conventional ones and new one (pseudo hydrophobicity) exploited in this study. We then selected 3 top-ranking features that contribute the most in the classification by a two-step feature selection process consisting of minimal-redundancy-maximal-relevance algorithm and an exhaustive search method. We used support vector machines to build our final prediction model. When testing our model on an independent test set, our method showed the highest F1-score of 0.70 and MCC of 0.46 comparing with the existing state-of-the-art hot spot prediction methods. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of our and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spots in protein interfaces. PMID:26934646
Glycan fragment database: a database of PDB-based glycan 3D structures.
Jo, Sunhwan; Im, Wonpil
2013-01-01
The glycan fragment database (GFDB), freely available at http://www.glycanstructure.org, is a database of the glycosidic torsion angles derived from the glycan structures in the Protein Data Bank (PDB). Analogous to protein structure, the structure of an oligosaccharide chain in a glycoprotein, referred to as a glycan, can be characterized by the torsion angles of glycosidic linkages between relatively rigid carbohydrate monomeric units. Knowledge of accessible conformations of biologically relevant glycans is essential in understanding their biological roles. The GFDB provides an intuitive glycan sequence search tool that allows the user to search complex glycan structures. After a glycan search is complete, each glycosidic torsion angle distribution is displayed in terms of the exact match and the fragment match. The exact match results are from the PDB entries that contain the glycan sequence identical to the query sequence. The fragment match results are from the entries with the glycan sequence whose substructure (fragment) or entire sequence is matched to the query sequence, such that the fragment results implicitly include the influences from the nearby carbohydrate residues. In addition, clustering analysis based on the torsion angle distribution can be performed to obtain the representative structures among the searched glycan structures.
A Search for Kilonovae in the Dark Energy Survey
Doctor, Z.; Kessler, R.; Chen, H. Y.; ...
2017-03-01
The coalescence of a binary neutron star pair is expected to produce gravitational waves (GW) and electromagnetic radiation, both of which may be detectable with currently available instruments. In this paper, we describe a search for a predicted r-process optical transient from these mergers, dubbed the “kilonova” (KN), using griz broadband data from the Dark Energy Survey Supernova Program (DES-SN). Some models predict KNe to be redder, shorter-lived, and dimmer than supernovae (SNe), but the event rate of KNe is poorly constrained. We simulate KN and SN light curves with the Monte-Carlo simulation code SNANA to optimize selection requirements, determine search efficiency, and predict SN backgrounds. Our analysis of the first two seasons of DES-SN data results in 0 events, and is consistent with our prediction of 1.1 ± 0.2 background events based on simulations of SNe. From our prediction, there is a 33% chance of finding 0 events in the data. Assuming no underlying galaxy flux, our search sets 90% upper limits on the KN volumetric rate of 1.0 x 10more » $$^{7}$$ Gpc $-$3 yr $-$1 for the dimmest KN model we consider (peak i-band absolute magnitude $${M}_{i}=-11.4$$ mag) and 2.4x 10$$^{4}$$ Gpc $-$3 yr $-$1 for the brightest ($${M}_{i}=-16.2$$ mag). Accounting for anomalous subtraction artifacts on bright galaxies, these limits are ~3 times higher. This analysis is the first untriggered optical KN search and informs selection requirements and strategies for future KN searches. Finally, our upper limits on the KN rate are consistent with those measured by GW and gamma-ray burst searches.« less
A Search for Kilonovae in the Dark Energy Survey
NASA Astrophysics Data System (ADS)
Doctor, Z.; Kessler, R.; Chen, H. Y.; Farr, B.; Finley, D. A.; Foley, R. J.; Goldstein, D. A.; Holz, D. E.; Kim, A. G.; Morganson, E.; Sako, M.; Scolnic, D.; Smith, M.; Soares-Santos, M.; Spinka, H.; Abbott, T. M. C.; Abdalla, F. B.; Allam, S.; Annis, J.; Bechtol, K.; Benoit-Lévy, A.; Bertin, E.; Brooks, D.; Buckley-Geer, E.; Burke, D. L.; Carnero Rosell, A.; Carrasco Kind, M.; Carretero, J.; Cunha, C. E.; D'Andrea, C. B.; da Costa, L. N.; DePoy, D. L.; Desai, S.; Diehl, H. T.; Drlica-Wagner, A.; Eifler, T. F.; Frieman, J.; García-Bellido, J.; Gaztanaga, E.; Gerdes, D. W.; Gruendl, R. A.; Gschwend, J.; Gutierrez, G.; James, D. J.; Krause, E.; Kuehn, K.; Kuropatkin, N.; Lahav, O.; Li, T. S.; Lima, M.; Maia, M. A. G.; March, M.; Marshall, J. L.; Menanteau, F.; Miquel, R.; Neilsen, E.; Nichol, R. C.; Nord, B.; Plazas, A. A.; Romer, A. K.; Sanchez, E.; Scarpine, V.; Schubnell, M.; Sevilla-Noarbe, I.; Smith, R. C.; Sobreira, F.; Suchyta, E.; Swanson, M. E. C.; Tarle, G.; Walker, A. R.; Wester, W.; DES Collaboration
2017-03-01
The coalescence of a binary neutron star pair is expected to produce gravitational waves (GW) and electromagnetic radiation, both of which may be detectable with currently available instruments. We describe a search for a predicted r-process optical transient from these mergers, dubbed the “kilonova” (KN), using griz broadband data from the Dark Energy Survey Supernova Program (DES-SN). Some models predict KNe to be redder, shorter-lived, and dimmer than supernovae (SNe), but the event rate of KNe is poorly constrained. We simulate KN and SN light curves with the Monte-Carlo simulation code SNANA to optimize selection requirements, determine search efficiency, and predict SN backgrounds. Our analysis of the first two seasons of DES-SN data results in 0 events, and is consistent with our prediction of 1.1 ± 0.2 background events based on simulations of SNe. From our prediction, there is a 33% chance of finding 0 events in the data. Assuming no underlying galaxy flux, our search sets 90% upper limits on the KN volumetric rate of 1.0 × {10}7 Gpc-3 yr-1 for the dimmest KN model we consider (peak I-band absolute magnitude {M}I=-11.4 mag) and 2.4 × {10}4 Gpc-3 yr-1 for the brightest ({M}I=-16.2 mag). Accounting for anomalous subtraction artifacts on bright galaxies, these limits are ˜3 times higher. This analysis is the first untriggered optical KN search and informs selection requirements and strategies for future KN searches. Our upper limits on the KN rate are consistent with those measured by GW and gamma-ray burst searches.
Keegan, Ronan M; Bibby, Jaclyn; Thomas, Jens; Xu, Dong; Zhang, Yang; Mayans, Olga; Winn, Martyn D; Rigden, Daniel J
2015-02-01
AMPLE clusters and truncates ab initio protein structure predictions, producing search models for molecular replacement. Here, an interesting degree of complementarity is shown between targets solved using the different ab initio modelling programs QUARK and ROSETTA. Search models derived from either program collectively solve almost all of the all-helical targets in the test set. Initial solutions produced by Phaser after only 5 min perform surprisingly well, improving the prospects for in situ structure solution by AMPLE during synchrotron visits. Taken together, the results show the potential for AMPLE to run more quickly and successfully solve more targets than previously suspected.
New knowledge-based genetic algorithm for excavator boom structural optimization
NASA Astrophysics Data System (ADS)
Hua, Haiyan; Lin, Shuwen
2014-03-01
Due to the insufficiency of utilizing knowledge to guide the complex optimal searching, existing genetic algorithms fail to effectively solve excavator boom structural optimization problem. To improve the optimization efficiency and quality, a new knowledge-based real-coded genetic algorithm is proposed. A dual evolution mechanism combining knowledge evolution with genetic algorithm is established to extract, handle and utilize the shallow and deep implicit constraint knowledge to guide the optimal searching of genetic algorithm circularly. Based on this dual evolution mechanism, knowledge evolution and population evolution can be connected by knowledge influence operators to improve the configurability of knowledge and genetic operators. Then, the new knowledge-based selection operator, crossover operator and mutation operator are proposed to integrate the optimal process knowledge and domain culture to guide the excavator boom structural optimization. Eight kinds of testing algorithms, which include different genetic operators, are taken as examples to solve the structural optimization of a medium-sized excavator boom. By comparing the results of optimization, it is shown that the algorithm including all the new knowledge-based genetic operators can more remarkably improve the evolutionary rate and searching ability than other testing algorithms, which demonstrates the effectiveness of knowledge for guiding optimal searching. The proposed knowledge-based genetic algorithm by combining multi-level knowledge evolution with numerical optimization provides a new effective method for solving the complex engineering optimization problem.
Austvoll-Dahlgren, Astrid; Falk, Ragnhild S; Helseth, Sølvi
2012-12-01
Peoples' ability to obtain health information is a precondition for their effective participation in decision making about health. However, there is limited evidence describing which cognitive factors can predict the intention of people to search for health information. To test the utility of a questionnaire in predicting intentions to search for health information, and to identify important predictors associated with this intention such that these could be targeted in an Intervention. A questionnaire was developed based on the Theory of Planned Behaviour and tested on both a mixed population sample (n=30) and a sample of parents (n = 45). The questionnaire was explored by testing for internal consistency, calculating inter-correlations between theoretically-related constructs, and by using multiple regression analysis. The reliability and validity of the questionnaire were found to be satisfactory and consistent across the two samples. The questionnaires' direct measures prediction of intention was high and accounted for 47% and 55% of the variance in behavioural intentions. Attitudes and perceived behavioural control were identified as important predictors to intention for search for health information. The questionnaire may be a useful tool for understanding and evaluating behavioural intentions and beliefs related to searches for health information. © 2012 The authors. Health Information and Libraries Journal © 2012 Health Libraries Group.
Genome-Wide SNP Genotyping to Infer the Effects on Gene Functions in Tomato
Hirakawa, Hideki; Shirasawa, Kenta; Ohyama, Akio; Fukuoka, Hiroyuki; Aoki, Koh; Rothan, Christophe; Sato, Shusei; Isobe, Sachiko; Tabata, Satoshi
2013-01-01
The genotype data of 7054 single nucleotide polymorphism (SNP) loci in 40 tomato lines, including inbred lines, F1 hybrids, and wild relatives, were collected using Illumina's Infinium and GoldenGate assay platforms, the latter of which was utilized in our previous study. The dendrogram based on the genotype data corresponded well to the breeding types of tomato and wild relatives. The SNPs were classified into six categories according to their positions in the genes predicted on the tomato genome sequence. The genes with SNPs were annotated by homology searches against the nucleotide and protein databases, as well as by domain searches, and they were classified into the functional categories defined by the NCBI's eukaryotic orthologous groups (KOG). To infer the SNPs' effects on the gene functions, the three-dimensional structures of the 843 proteins that were encoded by the genes with SNPs causing missense mutations were constructed by homology modelling, and 200 of these proteins were considered to carry non-synonymous amino acid substitutions in the predicted functional sites. The SNP information obtained in this study is available at the Kazusa Tomato Genomics Database (http://plant1.kazusa.or.jp/tomato/). PMID:23482505
Membrane Topology and Insertion of Membrane Proteins: Search for Topogenic Signals
van Geest, Marleen; Lolkema, Juke S.
2000-01-01
Integral membrane proteins are found in all cellular membranes and carry out many of the functions that are essential to life. The membrane-embedded domains of integral membrane proteins are structurally quite simple, allowing the use of various prediction methods and biochemical methods to obtain structural information about membrane proteins. A critical step in the biosynthetic pathway leading to the folded protein in the membrane is its insertion into the lipid bilayer. Understanding of the fundamentals of the insertion and folding processes will significantly improve the methods used to predict the three-dimensional membrane protein structure from the amino acid sequence. In the first part of this review, biochemical approaches to elucidate membrane protein topology are reviewed and evaluated, and in the second part, the use of similar techniques to study membrane protein insertion is discussed. The latter studies search for signals in the polypeptide chain that direct the insertion process. Knowledge of the topogenic signals in the nascent chain of a membrane protein is essential for the evaluation of membrane topology studies. PMID:10704472
SwiSpot: modeling riboswitches by spotting out switching sequences.
Barsacchi, Marco; Novoa, Eva Maria; Kellis, Manolis; Bechini, Alessio
2016-11-01
Riboswitches are cis-regulatory elements in mRNA, mostly found in Bacteria, which exhibit two main secondary structure conformations. Although one of them prevents the gene from being expressed, the other conformation allows its expression, and this switching process is typically driven by the presence of a specific ligand. Although there are a handful of known riboswitches, our knowledge in this field has been greatly limited due to our inability to identify their alternate structures from their sequences. Indeed, current methods are not able to predict the presence of the two functionally distinct conformations just from the knowledge of the plain RNA nucleotide sequence. Whether this would be possible, for which cases, and what prediction accuracy can be achieved, are currently open questions. Here we show that the two alternate secondary structures of riboswitches can be accurately predicted once the 'switching sequence' of the riboswitch has been properly identified. The proposed SwiSpot approach is capable of identifying the switching sequence inside a putative, complete riboswitch sequence, on the basis of pairing behaviors, which are evaluated on proper sets of configurations. Moreover, it is able to model the switching behavior of riboswitches whose generated ensemble covers both alternate configurations. Beyond structural predictions, the approach can also be paired to homology-based riboswitch searches. SwiSpot software, along with the reference dataset files, is available at: http://www.iet.unipi.it/a.bechini/swispot/Supplementary information: Supplementary data are available at Bioinformatics online. a.bechini@ing.unipi.it. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Mlinsek, G; Novic, M; Hodoscek, M; Solmajer, T
2001-01-01
Thrombin is a serine protease which plays important roles in the human body, the key one being the control of thrombus formation. The inhibition of thrombin has become a target for new antithrombotics. The aim of our work was to (i) construct a model which would enable us to predict Ki values for the binding of an inhibitor into the active site of thrombin based on a database of known X-ray structures of inhibitor-enzyme complexes and (ii) to identify the structural and electrostatic characteristics of inhibitor molecules crucially important to their effective binding. To retain as much of the 3D structural information of the bound inhibitor as possible, we implemented the quantum mechanical/molecular mechanical (QM/MM) procedure for calculating the molecular electrostatic potential (MEP) at the van der Waals surfaces of atoms in the protein's active site. The inhibitor was treated quantum mechanically, while the rest of the complex was treated by classical means. The obtained MEP values served as inputs into the counter-propagation artificial neural network (CP-ANN), and a genetic algorithm was subsequently used to search for the combination of atoms that predominantly influences the binding. The constructed CP-ANN model yielded Ki values predictions with a correlation coefficient of 0.96, with Ki values extended over 7 orders of magnitude. Our approach also shows the relative importance of the various amino acid residues present in the active site of the enzyme for inhibitor binding. The list of residues selected by our automatic procedure is in good correlation with the current consensus regarding the importance of certain crucial residues in thrombin's active site.
Morris, Garrett M; Lim-Wilby, Marguerita
2008-01-01
Molecular docking is a key tool in structural molecular biology and computer-assisted drug design. The goal of ligand-protein docking is to predict the predominant binding mode(s) of a ligand with a protein of known three-dimensional structure. Successful docking methods search high-dimensional spaces effectively and use a scoring function that correctly ranks candidate dockings. Docking can be used to perform virtual screening on large libraries of compounds, rank the results, and propose structural hypotheses of how the ligands inhibit the target, which is invaluable in lead optimization. The setting up of the input structures for the docking is just as important as the docking itself, and analyzing the results of stochastic search methods can sometimes be unclear. This chapter discusses the background and theory of molecular docking software, and covers the usage of some of the most-cited docking software.
Consumer Search, Rationing Rules, and the Consequence for Competition
NASA Astrophysics Data System (ADS)
Ruebeck, Christopher S.
Firms' conjectures about demand are consequential in oligopoly games. Through agent-based modeling of consumers' search for products, we can study the rationing of demand between capacity-constrained firms offering homogeneous products and explore the robustness of analytically solvable models' results. After algorithmically formalizing short-run search behavior rather than assuming a long-run average, this study predicts stronger competition in a two-stage capacity-price game.
A rate-constrained fast full-search algorithm based on block sum pyramid.
Song, Byung Cheol; Chun, Kang-Wook; Ra, Jong Beom
2005-03-01
This paper presents a fast full-search algorithm (FSA) for rate-constrained motion estimation. The proposed algorithm, which is based on the block sum pyramid frame structure, successively eliminates unnecessary search positions according to rate-constrained criterion. This algorithm provides the identical estimation performance to a conventional FSA having rate constraint, while achieving considerable reduction in computation.
Simulations of Scatterometry Down to 22 nm Structure Sizes and Beyond with Special Emphasis on LER
NASA Astrophysics Data System (ADS)
Osten, W.; Ferreras Paz, V.; Frenner, K.; Schuster, T.; Bloess, H.
2009-09-01
In recent years, scatterometry has become one of the most commonly used methods for CD metrology. With decreasing structure size for future technology nodes, the search for optimized scatterometry measurement configurations gets more important to exploit maximum sensitivity. As widespread industrial scatterometry tools mainly still use a pre-set measurement configuration, there are still free parameters to improve sensitivity. Our current work uses a simulation based approach to predict and optimize sensitivity of future technology nodes. Since line edge roughness is getting important for such small structures, these imperfections of the periodic continuation cannot be neglected. Using fourier methods like e.g. rigorous coupled wave approach (RCWA) for diffraction calculus, nonperiodic features are hard to reach. We show that in this field certain types of fieldstitching methods show nice numerical behaviour and lead to useful results.
NASA Astrophysics Data System (ADS)
Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.
2017-12-01
Hyperparameterization, of statistical models, i.e. automated model scoring and selection, such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There Elm is using the NSGA-2 multiobjective optimization algorithm for optimizing statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used for automate selection of soil moisture forecast statistical models for North America.
Why relevant chemical information cannot be exchanged without disclosing structures
NASA Astrophysics Data System (ADS)
Filimonov, Dmitry; Poroikov, Vladimir
2005-09-01
Both society and industry are interested in increasing the safety of pharmaceuticals. Potentially dangerous compounds could be filtered out at early stages of R&D by computer prediction of biological activity and ADMET characteristics. Accuracy of such predictions strongly depends on the quality & quantity of information contained in a training set. Suggestion that some relevant chemical information can be added to such training sets without disclosing chemical structures was generated at the recent ACS Symposium. We presented arguments that such safety exchange of relevant chemical information is impossible. Any relevant information about chemical structures can be used for search of either a particular compound itself or its close analogues. Risk of identifying such structures is enough to prevent pharma industry from relevant chemical information exchange.
Hao, Ge-Fei; Xu, Wei-Fang; Yang, Sheng-Gang; Yang, Guang-Fu
2015-01-01
Protein and peptide structure predictions are of paramount importance for understanding their functions, as well as the interactions with other molecules. However, the use of molecular simulation techniques to directly predict the peptide structure from the primary amino acid sequence is always hindered by the rough topology of the conformational space and the limited simulation time scale. We developed here a new strategy, named Multiple Simulated Annealing-Molecular Dynamics (MSA-MD) to identify the native states of a peptide and miniprotein. A cluster of near native structures could be obtained by using the MSA-MD method, which turned out to be significantly more efficient in reaching the native structure compared to continuous MD and conventional SA-MD simulation. PMID:26492886
Kim, J; Lee, C; Chong, Y
2009-01-01
Influenza endonucleases have appeared as an attractive target of antiviral therapy for influenza infection. With the purpose of designing a novel antiviral agent with enhanced biological activities against influenza endonuclease, a three-dimensional quantitative structure-activity relationships (3D-QSAR) model was generated based on 34 influenza endonuclease inhibitors. The comparative molecular similarity index analysis (CoMSIA) with a steric, electrostatic and hydrophobic (SEH) model showed the best correlative and predictive capability (q(2) = 0.763, r(2) = 0.969 and F = 174.785), which provided a pharmacophore composed of the electronegative moiety as well as the bulky hydrophobic group. The CoMSIA model was used as a pharmacophore query in the UNITY search of the ChemDiv compound library to give virtual active compounds. The 3D-QSAR model was then used to predict the activity of the selected compounds, which identified three compounds as the most likely inhibitor candidates.
Chen, Shangying; Zhang, Peng; Liu, Xin; Qin, Chu; Tao, Lin; Zhang, Cheng; Yang, Sheng Yong; Chen, Yu Zong; Chui, Wai Keung
2016-06-01
The overall efficacy and safety profile of a new drug is partially evaluated by the therapeutic index in clinical studies and by the protective index (PI) in preclinical studies. In-silico predictive methods may facilitate the assessment of these indicators. Although QSAR and QSTR models can be used for predicting PI, their predictive capability has not been evaluated. To test this capability, we developed QSAR and QSTR models for predicting the activity and toxicity of anticonvulsants at accuracy levels above the literature-reported threshold (LT) of good QSAR models as tested by both the internal 5-fold cross validation and external validation method. These models showed significantly compromised PI predictive capability due to the cumulative errors of the QSAR and QSTR models. Therefore, in this investigation a new quantitative structure-index relationship (QSIR) model was devised and it showed improved PI predictive capability that superseded the LT of good QSAR models. The QSAR, QSTR and QSIR models were developed using support vector regression (SVR) method with the parameters optimized by using the greedy search method. The molecular descriptors relevant to the prediction of anticonvulsant activities, toxicities and PIs were analyzed by a recursive feature elimination method. The selected molecular descriptors are primarily associated with the drug-like, pharmacological and toxicological features and those used in the published anticonvulsant QSAR and QSTR models. This study suggested that QSIR is useful for estimating the therapeutic index of drug candidates. Copyright © 2016. Published by Elsevier Inc.
An AI-based approach to structural damage identification by modal analysis
NASA Technical Reports Server (NTRS)
Glass, B. J.; Hanagud, S.
1990-01-01
Flexible-structure damage is presently addressed by a combined model- and parameter-identification approach which employs the AI methodologies of classification, heuristic search, and object-oriented model knowledge representation. The conditions for model-space search convergence to the best model are discussed in terms of search-tree organization and initial model parameter error. In the illustrative example of a truss structure presented, the use of both model and parameter identification is shown to lead to smaller parameter corrections than would be required by parameter identification alone.
Robust object tacking based on self-adaptive search area
NASA Astrophysics Data System (ADS)
Dong, Taihang; Zhong, Sheng
2018-02-01
Discriminative correlation filter (DCF) based trackers have recently achieved excellent performance with great computational efficiency. However, DCF based trackers suffer boundary effects, which result in the unstable performance in challenging situations exhibiting fast motion. In this paper, we propose a novel method to mitigate this side-effect in DCF based trackers. We change the search area according to the prediction of target motion. When the object moves fast, broad search area could alleviate boundary effects and reserve the probability of locating object. When the object moves slowly, narrow search area could prevent effect of useless background information and improve computational efficiency to attain real-time performance. This strategy can impressively soothe boundary effects in situations exhibiting fast motion and motion blur, and it can be used in almost all DCF based trackers. The experiments on OTB benchmark show that the proposed framework improves the performance compared with the baseline trackers.
ChemProt-2.0: visual navigation in a disease chemical biology database
Kim Kjærulff, Sonny; Wich, Louis; Kringelum, Jens; Jacobsen, Ulrik P.; Kouskoumvekaki, Irene; Audouze, Karine; Lund, Ole; Brunak, Søren; Oprea, Tudor I.; Taboureau, Olivier
2013-01-01
ChemProt-2.0 (http://www.cbs.dtu.dk/services/ChemProt-2.0) is a public available compilation of multiple chemical–protein annotation resources integrated with diseases and clinical outcomes information. The database has been updated to >1.15 million compounds with 5.32 millions bioactivity measurements for 15 290 proteins. Each protein is linked to quality-scored human protein–protein interactions data based on more than half a million interactions, for studying diseases and biological outcomes (diseases, pathways and GO terms) through protein complexes. In ChemProt-2.0, therapeutic effects as well as adverse drug reactions have been integrated allowing for suggesting proteins associated to clinical outcomes. New chemical structure fingerprints were computed based on the similarity ensemble approach. Protein sequence similarity search was also integrated to evaluate the promiscuity of proteins, which can help in the prediction of off-target effects. Finally, the database was integrated into a visual interface that enables navigation of the pharmacological space for small molecules. Filtering options were included in order to facilitate and to guide dynamic search of specific queries. PMID:23185041
the NDB archive or in the Non-Redundant list Advanced Search Search for structures based on structural features, chemical features, binding modes, citation and experimental information Featured Tools RNA 3D Motif Atlas, a representative collection of RNA 3D internal and hairpin loop motifs Non-redundant Lists
MEGADOCK: An All-to-All Protein-Protein Interaction Prediction System Using Tertiary Structure Data
Ohue, Masahito; Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ishida, Takashi; Akiyama, Yutaka
2014-01-01
The elucidation of protein-protein interaction (PPI) networks is important for understanding cellular structure and function and structure-based drug design. However, the development of an effective method to conduct exhaustive PPI screening represents a computational challenge. We have been investigating a protein docking approach based on shape complementarity and physicochemical properties. We describe here the development of the protein-protein docking software package “MEGADOCK” that samples an extremely large number of protein dockings at high speed. MEGADOCK reduces the calculation time required for docking by using several techniques such as a novel scoring function called the real Pairwise Shape Complementarity (rPSC) score. We showed that MEGADOCK is capable of exhaustive PPI screening by completing docking calculations 7.5 times faster than the conventional docking software, ZDOCK, while maintaining an acceptable level of accuracy. When MEGADOCK was applied to a subset of a general benchmark dataset to predict 120 relevant interacting pairs from 120 x 120 = 14,400 combinations of proteins, an F-measure value of 0.231 was obtained. Further, we showed that MEGADOCK can be applied to a large-scale protein-protein interaction-screening problem with accuracy better than random. When our approach is combined with parallel high-performance computing systems, it is now feasible to search and analyze protein-protein interactions while taking into account three-dimensional structures at the interactome scale. MEGADOCK is freely available at http://www.bi.cs.titech.ac.jp/megadock. PMID:23855673
Modeling and measurements of XRD spectra of extended solids under high pressure
NASA Astrophysics Data System (ADS)
Batyrev, I. G.; Coleman, S. P.; Stavrou, E.; Zaug, J. M.; Ciezak-Jenkins, J. A.
2017-06-01
We present results of evolutionary simulations based on density functional calculations of various extended solids: N-Si and N-H using variable and fixed concentration methods of USPEX. Predicted from the evolutionary simulations structures were analyzed in terms of thermo-dynamical stability and agreement with experimental X-ray diffraction spectra. Stability of the predicted system was estimated from convex-hull plots. X-ray diffraction spectra were calculated using a virtual diffraction algorithm which computes kinematic diffraction intensity in three-dimensional reciprocal space before being reduced to a two-theta line profile. Calculations of thousands of XRD spectra were used to search for a structure of extended solids at certain pressures with best fits to experimental data according to experimental XRD peak position, peak intensity and theoretically calculated enthalpy. Comparison of Raman and IR spectra calculated for best fitted structures with available experimental data shows reasonable agreement for certain vibration modes. Part of this work was performed by LLNL, Contract DE-AC52-07NA27344. We thank the Joint DoD / DOE Munitions Technology Development Program, the HE C-II research program at LLNL and Advanced Light Source, supported by BES DOE, Contract No. DE-AC02-05CH112.
Patient-completed or symptom-based screening tools for endometriosis: a scoping review.
Surrey, Eric; Carter, Cathryn M; Soliman, Ahmed M; Khan, Shahnaz; DiBenedetti, Dana B; Snabes, Michael C
2017-08-01
The objective of this review was to evaluate existing patient-completed screening questionnaires and/or symptom-based predictive models with respect to their potential for use as screening tools for endometriosis in adult women. Validated instruments were of particular interest. We conducted structured searches of PubMed and targeted searches of the gray literature to identify studies reporting on screening instruments used in endometriosis. Studies were screened according to inclusion and exclusion criteria that followed the PICOS (population, intervention, comparison, outcomes, study design) framework. A total of 16 studies were identified, of which 10 described measures for endometriosis in general, 2 described measures for endometriosis at specific sites, and 4 described measures for deep-infiltrating endometriosis. Only 1 study evaluated a questionnaire that was solely patient-completed. Most measures required physician, imaging, or laboratory assessments in addition to patient-completed questionnaires, and several measures relied on complex scoring. Validation for use as a screening tool in adult women with potential endometriosis was lacking in all studies, as most studies focused on diagnosis versus screening. This literature review did not identify any fully validated, symptom-based, patient-reported questionnaires for endometriosis screening in adult women.
Composite charge 8/3 resonances at the LHC
NASA Astrophysics Data System (ADS)
Matsedonskyi, Oleksii; Riva, Francesco; Vantalon, Thibaud
2014-04-01
In composite Higgs models with partial compositeness, the small value of the observed Higgs mass implies the existence of light fermionic resonances, the top partners, whose quantum numbers are determined by the symmetry (and symmetry breaking) structure of the theory. Here we study light top partners with electric charge 8/3, which are predicted, for instance, in some of the most natural composite Higgs realizations. We recast data from two same sign lepton searches and from searches for microscopic blackholes into a bound on its mass, M 8/3 > 940 GeV. Furthermore, we compare potential reach of these searches with a specifically designed search for three same-sign leptons, both at 8 and 14TeV. We provide a simplified model, suitable for collider analysis.
Lustgarten, Jonathan Lyle; Balasubramanian, Jeya Balaji; Visweswaran, Shyam; Gopalakrishnan, Vanathi
2017-03-01
The comprehensibility of good predictive models learned from high-dimensional gene expression data is attractive because it can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their abilities to summarize the observed data. We extend a Bayesian Rule Learning (BRL-GSS) algorithm, previously shown to be a significantly better predictor than other classical approaches in this domain. It searches a space of Bayesian networks using a decision tree representation of its parameters with global constraints, and infers a set of IF-THEN rules. The number of parameters and therefore the number of rules are combinatorial to the number of predictor variables in the model. We relax these global constraints to a more generalizable local structure (BRL-LSS). BRL-LSS entails more parsimonious set of rules because it does not have to generate all combinatorial rules. The search space of local structures is much richer than the space of global structures. We design the BRL-LSS with the same worst-case time-complexity as BRL-GSS while exploring a richer and more complex model space. We measure predictive performance using Area Under the ROC curve (AUC) and Accuracy. We measure model parsimony performance by noting the average number of rules and variables needed to describe the observed data. We evaluate the predictive and parsimony performance of BRL-GSS, BRL-LSS and the state-of-the-art C4.5 decision tree algorithm, across 10-fold cross-validation using ten microarray gene-expression diagnostic datasets. In these experiments, we observe that BRL-LSS is similar to BRL-GSS in terms of predictive performance, while generating a much more parsimonious set of rules to explain the same observed data. BRL-LSS also needs fewer variables than C4.5 to explain the data with similar predictive performance. We also conduct a feasibility study to demonstrate the general applicability of our BRL methods on the newer RNA sequencing gene-expression data.
Using the Baidu Search Index to Predict the Incidence of HIV/AIDS in China.
He, Guangye; Chen, Yunsong; Chen, Buwei; Wang, Hao; Shen, Li; Liu, Liu; Suolang, Deji; Zhang, Boyang; Ju, Guodong; Zhang, Liangliang; Du, Sijia; Jiang, Xiangxue; Pan, Yu; Min, Zuntao
2018-06-13
Based on a panel of 30 provinces and a timeframe from January 2009 to December 2013, we estimate the association between monthly human immunodeficiency virus/acquired immune deficiency syndrome (HIV/AIDS) incidence and the relevant Internet search query volumes in Baidu, the most widely used search engine among the Chinese. The pooled mean group (PMG) model show that the Baidu search index (BSI) positively predicts the increase in HIV/AIDS incidence, with a 1% increase in BSI associated with a 2.1% increase in HIV/AIDS incidence on average. This study proposes a promising method to estimate and forecast the incidence of HIV/AIDS, a type of infectious disease that is culturally sensitive and highly unevenly distributed in China; the method can be taken as a complement to a traditional HIV/AIDS surveillance system.
Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Bucholska, Justyna; Starowicz, Piotr; Czyrko, Emilia
2016-01-01
Internet databases of small molecules, their enzymatic reactions, and metabolism have emerged as useful tools in food science. Database searching is also introduced as part of chemistry or enzymology courses for food technology students. Such resources support the search for information about single compounds and facilitate the introduction of secondary analyses of large datasets. Information can be retrieved from databases by searching for the compound name or structure, annotating with the help of chemical codes or drawn using molecule editing software. Data mining options may be enhanced by navigating through a network of links and cross-links between databases. Exemplary databases reviewed in this article belong to two classes: tools concerning small molecules (including general and specialized databases annotating food components) and tools annotating enzymes and metabolism. Some problems associated with database application are also discussed. Data summarized in computer databases may be used for calculation of daily intake of bioactive compounds, prediction of metabolism of food components, and their biological activity as well as for prediction of interactions between food component and drugs. PMID:27929431
Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Bucholska, Justyna; Starowicz, Piotr; Czyrko, Emilia
2016-12-06
Internet databases of small molecules, their enzymatic reactions, and metabolism have emerged as useful tools in food science. Database searching is also introduced as part of chemistry or enzymology courses for food technology students. Such resources support the search for information about single compounds and facilitate the introduction of secondary analyses of large datasets. Information can be retrieved from databases by searching for the compound name or structure, annotating with the help of chemical codes or drawn using molecule editing software. Data mining options may be enhanced by navigating through a network of links and cross-links between databases. Exemplary databases reviewed in this article belong to two classes: tools concerning small molecules (including general and specialized databases annotating food components) and tools annotating enzymes and metabolism. Some problems associated with database application are also discussed. Data summarized in computer databases may be used for calculation of daily intake of bioactive compounds, prediction of metabolism of food components, and their biological activity as well as for prediction of interactions between food component and drugs.
Rigden, Daniel J; Thomas, Jens M H; Simkovic, Felix; Simpkin, Adam; Winn, Martyn D; Mayans, Olga; Keegan, Ronan M
2018-03-01
Molecular replacement (MR) is the predominant route to solution of the phase problem in macromolecular crystallography. Although routine in many cases, it becomes more effortful and often impossible when the available experimental structures typically used as search models are only distantly homologous to the target. Nevertheless, with current powerful MR software, relatively small core structures shared between the target and known structure, of 20-40% of the overall structure for example, can succeed as search models where they can be isolated. Manual sculpting of such small structural cores is rarely attempted and is dependent on the crystallographer's expertise and understanding of the protein family in question. Automated search-model editing has previously been performed on the basis of sequence alignment, in order to eliminate, for example, side chains or loops that are not present in the target, or on the basis of structural features (e.g. solvent accessibility) or crystallographic parameters (e.g. B factors). Here, based on recent work demonstrating a correlation between evolutionary conservation and protein rigidity/packing, novel automated ways to derive edited search models from a given distant homologue over a range of sizes are presented. A variety of structure-based metrics, many readily obtained from online webservers, can be fed to the MR pipeline AMPLE to produce search models that succeed with a set of test cases where expertly manually edited comparators, further processed in diverse ways with MrBUMP, fail. Further significant performance gains result when the structure-based distance geometry method CONCOORD is used to generate ensembles from the distant homologue. To our knowledge, this is the first such approach whereby a single structure is meaningfully transformed into an ensemble for the purposes of MR. Additional cases further demonstrate the advantages of the approach. CONCOORD is freely available and computationally inexpensive, so these novel methods offer readily available new routes to solve difficult MR cases.
Simpkin, Adam; Mayans, Olga; Keegan, Ronan M.
2018-01-01
Molecular replacement (MR) is the predominant route to solution of the phase problem in macromolecular crystallography. Although routine in many cases, it becomes more effortful and often impossible when the available experimental structures typically used as search models are only distantly homologous to the target. Nevertheless, with current powerful MR software, relatively small core structures shared between the target and known structure, of 20–40% of the overall structure for example, can succeed as search models where they can be isolated. Manual sculpting of such small structural cores is rarely attempted and is dependent on the crystallographer’s expertise and understanding of the protein family in question. Automated search-model editing has previously been performed on the basis of sequence alignment, in order to eliminate, for example, side chains or loops that are not present in the target, or on the basis of structural features (e.g. solvent accessibility) or crystallographic parameters (e.g. B factors). Here, based on recent work demonstrating a correlation between evolutionary conservation and protein rigidity/packing, novel automated ways to derive edited search models from a given distant homologue over a range of sizes are presented. A variety of structure-based metrics, many readily obtained from online webservers, can be fed to the MR pipeline AMPLE to produce search models that succeed with a set of test cases where expertly manually edited comparators, further processed in diverse ways with MrBUMP, fail. Further significant performance gains result when the structure-based distance geometry method CONCOORD is used to generate ensembles from the distant homologue. To our knowledge, this is the first such approach whereby a single structure is meaningfully transformed into an ensemble for the purposes of MR. Additional cases further demonstrate the advantages of the approach. CONCOORD is freely available and computationally inexpensive, so these novel methods offer readily available new routes to solve difficult MR cases. PMID:29533226
Analytic Guided-Search Model of Human Performance Accuracy in Target- Localization Search Tasks
NASA Technical Reports Server (NTRS)
Eckstein, Miguel P.; Beutter, Brent R.; Stone, Leland S.
2000-01-01
Current models of human visual search have extended the traditional serial/parallel search dichotomy. Two successful models for predicting human visual search are the Guided Search model and the Signal Detection Theory model. Although these models are inherently different, it has been difficult to compare them because the Guided Search model is designed to predict response time, while Signal Detection Theory models are designed to predict performance accuracy. Moreover, current implementations of the Guided Search model require the use of Monte-Carlo simulations, a method that makes fitting the model's performance quantitatively to human data more computationally time consuming. We have extended the Guided Search model to predict human accuracy in target-localization search tasks. We have also developed analytic expressions that simplify simulation of the model to the evaluation of a small set of equations using only three free parameters. This new implementation and extension of the Guided Search model will enable direct quantitative comparisons with human performance in target-localization search experiments and with the predictions of Signal Detection Theory and other search accuracy models.
Kister, Alexander
2015-01-01
We present an alternative approach to protein 3D folding prediction based on determination of rules that specify distribution of “favorable” residues, that are mainly responsible for a given fold formation, and “unfavorable” residues, that are incompatible with that fold, in polypeptide sequences. The process of determining favorable and unfavorable residues is iterative. The starting assumptions are based on the general principles of protein structure formation as well as structural features peculiar to a protein fold under investigation. The initial assumptions are tested one-by-one for a set of all known proteins with a given structure. The assumption is accepted as a “rule of amino acid distribution” for the protein fold if it holds true for all, or near all, structures. If the assumption is not accepted as a rule, it can be modified to better fit the data and then tested again in the next step of the iterative search algorithm, or rejected. We determined the set of amino acid distribution rules for a large group of beta sandwich-like proteins characterized by a specific arrangement of strands in two beta sheets. It was shown that this set of rules is highly sensitive (~90%) and very specific (~99%) for identifying sequences of proteins with specified beta sandwich fold structure. The advantage of the proposed approach is that it does not require that query proteins have a high degree of homology to proteins with known structure. So long as the query protein satisfies residue distribution rules, it can be confidently assigned to its respective protein fold. Another advantage of our approach is that it allows for a better understanding of which residues play an essential role in protein fold formation. It may, therefore, facilitate rational protein engineering design. PMID:25625198
Testing process predictions of models of risky choice: a quantitative model comparison approach
Pachur, Thorsten; Hertwig, Ralph; Gigerenzer, Gerd; Brandstätter, Eduard
2013-01-01
This article presents a quantitative model comparison contrasting the process predictions of two prominent views on risky choice. One view assumes a trade-off between probabilities and outcomes (or non-linear functions thereof) and the separate evaluation of risky options (expectation models). Another view assumes that risky choice is based on comparative evaluation, limited search, aspiration levels, and the forgoing of trade-offs (heuristic models). We derived quantitative process predictions for a generic expectation model and for a specific heuristic model, namely the priority heuristic (Brandstätter et al., 2006), and tested them in two experiments. The focus was on two key features of the cognitive process: acquisition frequencies (i.e., how frequently individual reasons are looked up) and direction of search (i.e., gamble-wise vs. reason-wise). In Experiment 1, the priority heuristic predicted direction of search better than the expectation model (although neither model predicted the acquisition process perfectly); acquisition frequencies, however, were inconsistent with both models. Additional analyses revealed that these frequencies were primarily a function of what Rubinstein (1988) called “similarity.” In Experiment 2, the quantitative model comparison approach showed that people seemed to rely more on the priority heuristic in difficult problems, but to make more trade-offs in easy problems. This finding suggests that risky choice may be based on a mental toolbox of strategies. PMID:24151472
Interactive computer programs for the graphic analysis of nucleotide sequence data.
Luckow, V A; Littlewood, R K; Rownd, R H
1984-01-01
A group of interactive computer programs have been developed which aid in the collection and graphical analysis of nucleotide and protein sequence data. The programs perform the following basic functions: a) enter, edit, list, and rearrange sequence data; b) permit automatic entry of nucleotide sequence data directly from an autoradiograph into the computer; c) search for restriction sites or other specified patterns and plot a linear or circular restriction map, or print their locations; d) plot base composition; e) analyze homology between sequences by plotting a two-dimensional graphic matrix; and f) aid in plotting predicted secondary structures of RNA molecules. PMID:6546437
Ensemble-based prediction of RNA secondary structures.
Aghaeepour, Nima; Hoos, Holger H
2013-04-24
Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that as been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.
Rollins, Brent L; Ramakrishnan, Shravanan; Perri, Matthew
2014-01-01
Direct-to-consumer (DTC) advertising of predictive genetic tests (PGTs) has added a new dimension to health advertising. This study used an online survey based on the health belief model framework to examine and more fully understand consumers' responses and behavioral intentions in response to a PGT DTC advertisement. Overall, consumers reported moderate intentions to talk with their doctor and seek more information about PGTs after advertisement exposure, though consumers did not seem ready to take the advertised test or engage in active information search. Those who perceived greater threat from the disease, however, had significantly greater behavioral intentions and information search behavior.
Review of electronic transport models for thermoelectric materials
NASA Astrophysics Data System (ADS)
Bulusu, A.; Walker, D. G.
2008-07-01
Thermoelectric devices have gained importance in recent years as viable solutions for applications such as spot cooling of electronic components, remote power generation in space stations and satellites etc. These solid-state devices have long been known for their reliability rather than their efficiency; they contain no moving parts, and their performance relies primarily on material selection, which has not generated many excellent candidates. Research in recent years has been focused on developing both thermoelectric structures and materials that have high efficiency. In general, thermoelectric research is two-pronged with (1) experiments focused on finding new materials and structures with enhanced thermoelectric performance and (2) analytical models that predict thermoelectric behavior to enable better design and optimization of materials and structures. While numerous reviews have discussed the importance of and dependence on materials for thermoelectric performance, an overview of how to predict the performance of various materials and structures based on fundamental quantities is lacking. In this paper we present a review of the theoretical models that were developed since thermoelectricity was first observed in 1821 by Seebeck and how these models have guided experimental material search for improved thermoelectric devices. A new quantum model is also presented, which provides opportunities for the optimization of nanoscale materials to enhance thermoelectric performance.
NASA Astrophysics Data System (ADS)
Knispel, Benjamin
2011-07-01
Neutron stars are the endpoints of stellar evolution and one of the most compact forms of matter in the universe. They can be observed as radio pulsars and are promising sources for the emission of continuous gravitational waves. Discovering new radio pulsars in tight binary orbits offers the opportunity to conduct very high precision tests of General Relativity and to further our understanding of neutron star structure and matter at super-nuclear densities. The direct detection of gravitational waves would validate Einstein's theory of Relativity and open a new window to the universe by offering a novel astronomical tool. This thesis addresses both of these scientific fields: the first fully coherent search for radio pulsars in tight, circular orbits has been planned, set up and conducted in the course of this thesis. Two unusual radio pulsars, one of them in a binary system, have been discovered. The other half of this thesis is concerned with the simulation of the Galactic neutron star population to predict their emission of continuous gravitational waves. First realistic statistical upper limits on the strongest continuous gravitational-wave signal and detection predictions for realistic all-sky blind searches have been obtained. The data from a large-scale pulsar survey with the 305-m Arecibo radio telescope were searched for signals from radio pulsars in binary orbits. The massive amount of computational work was done on hundreds of thousands of computers volunteered by members of the general public through the distributed computing project Einstein@Home. The newly developed analysis pipeline searched for pulsar spin frequencies below 250 Hz and for orbital periods as short as 11 min. The structure of the search pipeline consisting of data preparation, data analysis, result post-processing, and set-up of the pipeline components is presented in detail. The first radio pulsar, discovered with this search, PSR J2007+2722, is an isolated radio pulsar, likely from a double neutron star system disrupted by the second supernova. We present discovery and initial characterisation using observations from five of the largest radio telescopes worldwide. Only a dozen similar systems were previously known. The second discovered radio pulsar, PSR J1952+2630, is in a 9.4-hr orbit with most likely a massive white dwarf of at least 0.95 M⊙. We characterise its orbit by analysis of the apparent spin period changes. This pulsar most likely belongs to the very rare class of intermediate-mass binary pulsars, from which only five systems were previously known. It is a promising target for the future measurement of relativistic effects. In the second half of this thesis, the emission of continuous gravitational waves from a Galactic population of neutron stars is studied. For the first time, realistic estimates of the statistical upper limit of the expected gravitational wave signal are obtained, improving previous estimates by about a factor of six. The simulation is used to obtain for the first time detectability predictions for these objects with ground based gravitational wave detectors and realistic blind searches. It is also shown how to improve possible searches by maximising the number of detections for a fixed amount of computation cycles.
Efficient embedding of complex networks to hyperbolic space via their Laplacian
Alanis-Lobato, Gregorio; Mier, Pablo; Andrade-Navarro, Miguel A.
2016-01-01
The different factors involved in the growth process of complex networks imprint valuable information in their observable topologies. How to exploit this information to accurately predict structural network changes is the subject of active research. A recent model of network growth sustains that the emergence of properties common to most complex systems is the result of certain trade-offs between node birth-time and similarity. This model has a geometric interpretation in hyperbolic space, where distances between nodes abstract this optimisation process. Current methods for network hyperbolic embedding search for node coordinates that maximise the likelihood that the network was produced by the afore-mentioned model. Here, a different strategy is followed in the form of the Laplacian-based Network Embedding, a simple yet accurate, efficient and data driven manifold learning approach, which allows for the quick geometric analysis of big networks. Comparisons against existing embedding and prediction techniques highlight its applicability to network evolution and link prediction. PMID:27445157
Efficient embedding of complex networks to hyperbolic space via their Laplacian
NASA Astrophysics Data System (ADS)
Alanis-Lobato, Gregorio; Mier, Pablo; Andrade-Navarro, Miguel A.
2016-07-01
The different factors involved in the growth process of complex networks imprint valuable information in their observable topologies. How to exploit this information to accurately predict structural network changes is the subject of active research. A recent model of network growth sustains that the emergence of properties common to most complex systems is the result of certain trade-offs between node birth-time and similarity. This model has a geometric interpretation in hyperbolic space, where distances between nodes abstract this optimisation process. Current methods for network hyperbolic embedding search for node coordinates that maximise the likelihood that the network was produced by the afore-mentioned model. Here, a different strategy is followed in the form of the Laplacian-based Network Embedding, a simple yet accurate, efficient and data driven manifold learning approach, which allows for the quick geometric analysis of big networks. Comparisons against existing embedding and prediction techniques highlight its applicability to network evolution and link prediction.
Identifying Novel Molecular Structures for Advanced Melanoma by Ligand-Based Virtual Screening
Wang, Zhao; Lu, Yan; Seibel, William; Miller, Duane D.; Li, Wei
2009-01-01
We recently discovered a new class of thiazole analogs that are highly potent against melanoma cells. To expand the structure-activity relationship study and to explore potential new molecular scaffolds, we performed extensive ligand-based virtual screening against a compound library containing 342,910 small molecules. Two different approaches of virtual screening were carried out using the structure of our lead molecule: 1) connectivity-based search using Scitegic Pipeline Pilot from Accelerys and 2) molecular shape similarity search using Schrodinger software. Using a testing compound library, both approaches can rank similar compounds very high and rank dissimilar compounds very low, thus validating our screening methods. Structures identified from these searches were analyzed, and selected compounds were tested in vitro to assess their activity against melanoma cancer cell lines. Several molecules showed good anticancer activity. While none of the identified compounds showed better activity than our lead compound, they provided important insight into structural modifications for our lead compound and also provided novel platforms on which we can optimize new classes of anticancer compounds. One of the newly synthesized analogs based on this virtual screening has improved potency and selectivity against melanoma. PMID:19445498
NASA Astrophysics Data System (ADS)
Guo, Ning; Yang, Zhichun; Wang, Le; Ouyang, Yan; Zhang, Xinping
2018-05-01
Aiming at providing a precise dynamic structural finite element (FE) model for dynamic strength evaluation in addition to dynamic analysis. A dynamic FE model updating method is presented to correct the uncertain parameters of the FE model of a structure using strain mode shapes and natural frequencies. The strain mode shape, which is sensitive to local changes in structure, is used instead of the displacement mode for enhancing model updating. The coordinate strain modal assurance criterion is developed to evaluate the correlation level at each coordinate over the experimental and the analytical strain mode shapes. Moreover, the natural frequencies which provide the global information of the structure are used to guarantee the accuracy of modal properties of the global model. Then, the weighted summation of the natural frequency residual and the coordinate strain modal assurance criterion residual is used as the objective function in the proposed dynamic FE model updating procedure. The hybrid genetic/pattern-search optimization algorithm is adopted to perform the dynamic FE model updating procedure. Numerical simulation and model updating experiment for a clamped-clamped beam are performed to validate the feasibility and effectiveness of the present method. The results show that the proposed method can be used to update the uncertain parameters with good robustness. And the updated dynamic FE model of the beam structure, which can correctly predict both the natural frequencies and the local dynamic strains, is reliable for the following dynamic analysis and dynamic strength evaluation.
Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs
2017-01-01
Prediction of RNA tertiary structure from sequence is an important problem, but generating accurate structure models for even short sequences remains difficult. Predictions of RNA tertiary structure tend to be least accurate in loop regions, where non-canonical pairs are important for determining the details of structure. Non-canonical pairs can be predicted using a knowledge-based model of structure that scores nucleotide cyclic motifs, or NCMs. In this work, a partition function algorithm is introduced that allows the estimation of base pairing probabilities for both canonical and non-canonical interactions. Pairs that are predicted to be probable are more likely to be found in the true structure than pairs of lower probability. Pair probability estimates can be further improved by predicting the structure conserved across multiple homologous sequences using the TurboFold algorithm. These pairing probabilities, used in concert with prior knowledge of the canonical secondary structure, allow accurate inference of non-canonical pairs, an important step towards accurate prediction of the full tertiary structure. Software to predict non-canonical base pairs and pairing probabilities is now provided as part of the RNAstructure software package. PMID:29107980
Introduction to Psychology Students' Parental Status Predicts Learning Preferences and Life Meaning
ERIC Educational Resources Information Center
Lovell, Elyse D'nn; Munn, Nathan
2017-01-01
This study explores Introduction to Psychology students' learning preferences and their personal search for meaning while considering their parental status. The findings suggest that parents show preferences for project-based learning and have lower levels of searching for meaning than non-parents. When parental status, age, and finances were…
Lunar surface gravimeter experiment
NASA Technical Reports Server (NTRS)
Giganti, J. J.; Larson, J. V.; Richard, J. P.; Tobias, R. L.; Weber, J.
1977-01-01
The lunar surface gravimeter used the moon as an instrumented antenna to search for gravitational waves predicted by Einstein's general theory of relativity. Tidal deformation of the moon was measured. Gravitational radiation is a channel that is capable of giving information about the structure and evolution of the universe.
A continuously growing web-based interface structure databank
NASA Astrophysics Data System (ADS)
Erwin, N. A.; Wang, E. I.; Osysko, A.; Warner, D. H.
2012-07-01
The macroscopic properties of materials can be significantly influenced by the presence of microscopic interfaces. The complexity of these interfaces coupled with the vast configurational space in which they reside has been a long-standing obstacle to the advancement of true bottom-up material behavior predictions. In this vein, atomistic simulations have proven to be a valuable tool for investigating interface behavior. However, before atomistic simulations can be utilized to model interface behavior, meaningful interface atomic structures must be generated. The generation of structures has historically been carried out disjointly by individual research groups, and thus, has constituted an overlap in effort across the broad research community. To address this overlap and to lower the barrier for new researchers to explore interface modeling, we introduce a web-based interface structure databank (www.isdb.cee.cornell.edu) where users can search, download and share interface structures. The databank is intended to grow via two mechanisms: (1) interface structure donations from individual research groups and (2) an automated structure generation algorithm which continuously creates equilibrium interface structures. In this paper, we describe the databank, the automated interface generation algorithm, and compare a subset of the autonomously generated structures to structures currently available in the literature. To date, the automated generation algorithm has been directed toward aluminum grain boundary structures, which can be compared with experimentally measured population densities of aluminum polycrystals.
2013-01-01
Background Many problems in protein modeling require obtaining a discrete representation of the protein conformational space as an ensemble of conformations. In ab-initio structure prediction, in particular, where the goal is to predict the native structure of a protein chain given its amino-acid sequence, the ensemble needs to satisfy energetic constraints. Given the thermodynamic hypothesis, an effective ensemble contains low-energy conformations which are similar to the native structure. The high-dimensionality of the conformational space and the ruggedness of the underlying energy surface currently make it very difficult to obtain such an ensemble. Recent studies have proposed that Basin Hopping is a promising probabilistic search framework to obtain a discrete representation of the protein energy surface in terms of local minima. Basin Hopping performs a series of structural perturbations followed by energy minimizations with the goal of hopping between nearby energy minima. This approach has been shown to be effective in obtaining conformations near the native structure for small systems. Recent work by us has extended this framework to larger systems through employment of the molecular fragment replacement technique, resulting in rapid sampling of large ensembles. Methods This paper investigates the algorithmic components in Basin Hopping to both understand and control their effect on the sampling of near-native minima. Realizing that such an ensemble is reduced before further refinement in full ab-initio protocols, we take an additional step and analyze the quality of the ensemble retained by ensemble reduction techniques. We propose a novel multi-objective technique based on the Pareto front to filter the ensemble of sampled local minima. Results and conclusions We show that controlling the magnitude of the perturbation allows directly controlling the distance between consecutively-sampled local minima and, in turn, steering the exploration towards conformations near the native structure. For the minimization step, we show that the addition of Metropolis Monte Carlo-based minimization is no more effective than a simple greedy search. Finally, we show that the size of the ensemble of sampled local minima can be effectively and efficiently reduced by a multi-objective filter to obtain a simpler representation of the probed energy surface. PMID:24564970
Keegan, Ronan M.; Bibby, Jaclyn; Thomas, Jens; Xu, Dong; Zhang, Yang; Mayans, Olga; Winn, Martyn D.; Rigden, Daniel J.
2015-01-01
AMPLE clusters and truncates ab initio protein structure predictions, producing search models for molecular replacement. Here, an interesting degree of complementarity is shown between targets solved using the different ab initio modelling programs QUARK and ROSETTA. Search models derived from either program collectively solve almost all of the all-helical targets in the test set. Initial solutions produced by Phaser after only 5 min perform surprisingly well, improving the prospects for in situ structure solution by AMPLE during synchrotron visits. Taken together, the results show the potential for AMPLE to run more quickly and successfully solve more targets than previously suspected. PMID:25664744
Learning and cognitive styles in web-based learning: theory, evidence, and application.
Cook, David A
2005-03-01
Cognitive and learning styles (CLS) have long been investigated as a basis to adapt instruction and enhance learning. Web-based learning (WBL) can reach large, heterogenous audiences, and adaptation to CLS may increase its effectiveness. Adaptation is only useful if some learners (with a defined trait) do better with one method and other learners (with a complementary trait) do better with another method (aptitude-treatment interaction). A comprehensive search of health professions education literature found 12 articles on CLS in computer-assisted learning and WBL. Because so few reports were found, research from non-medical education was also included. Among all the reports, four CLS predominated. Each CLS construct was used to predict relationships between CLS and WBL. Evidence was then reviewed to support or refute these predictions. The wholist-analytic construct shows consistent aptitude-treatment interactions consonant with predictions (wholists need structure, a broad-before-deep approach, and social interaction, while analytics need less structure and a deep-before-broad approach). Limited evidence for the active-reflective construct suggests aptitude-treatment interaction, with active learners doing better with interactive learning and reflective learners doing better with methods to promote reflection. As predicted, no consistent interaction between the concrete-abstract construct and computer format was found, but one study suggests that there is interaction with instructional method. Contrary to predictions, no interaction was found for the verbal-imager construct. Teachers developing WBL activities should consider assessing and adapting to accommodate learners defined by the wholist-analytic and active-reflective constructs. Other adaptations should be considered experimental. Further WBL research could clarify the feasibility and effectiveness of assessing and adapting to CLS.
AlzhCPI: A knowledge base for predicting chemical-protein interactions towards Alzheimer's disease.
Fang, Jiansong; Wang, Ling; Li, Yecheng; Lian, Wenwen; Pang, Xiaocong; Wang, Hong; Yuan, Dongsheng; Wang, Qi; Liu, Ai-Lin; Du, Guan-Hua
2017-01-01
Alzheimer's disease (AD) is a complicated progressive neurodegeneration disorder. To confront AD, scientists are searching for multi-target-directed ligands (MTDLs) to delay disease progression. The in silico prediction of chemical-protein interactions (CPI) can accelerate target identification and drug discovery. Previously, we developed 100 binary classifiers to predict the CPI for 25 key targets against AD using the multi-target quantitative structure-activity relationship (mt-QSAR) method. In this investigation, we aimed to apply the mt-QSAR method to enlarge the model library to predict CPI towards AD. Another 104 binary classifiers were further constructed to predict the CPI for 26 preclinical AD targets based on the naive Bayesian (NB) and recursive partitioning (RP) algorithms. The internal 5-fold cross-validation and external test set validation were applied to evaluate the performance of the training sets and test set, respectively. The area under the receiver operating characteristic curve (ROC) for the test sets ranged from 0.629 to 1.0, with an average of 0.903. In addition, we developed a web server named AlzhCPI to integrate the comprehensive information of approximately 204 binary classifiers, which has potential applications in network pharmacology and drug repositioning. AlzhCPI is available online at http://rcidm.org/AlzhCPI/index.html. To illustrate the applicability of AlzhCPI, the developed system was employed for the systems pharmacology-based investigation of shichangpu against AD to enhance the understanding of the mechanisms of action of shichangpu from a holistic perspective.
High-throughput search of ternary chalcogenides for p-type transparent electrodes
Shi, Jingming; Cerqueira, Tiago F. T.; Cui, Wenwen; Nogueira, Fernando; Botti, Silvana; Marques, Miguel A. L.
2017-01-01
Delafossite crystals are fascinating ternary oxides that have demonstrated transparent conductivity and ambipolar doping. Here we use a high-throughput approach based on density functional theory to find delafossite and related layered phases of composition ABX2, where A and B are elements of the periodic table, and X is a chalcogen (O, S, Se, and Te). From the 15 624 compounds studied in the trigonal delafossite prototype structure, 285 are within 50 meV/atom from the convex hull of stability. These compounds are further investigated using global structural prediction methods to obtain their lowest-energy crystal structure. We find 79 systems not present in the materials project database that are thermodynamically stable and crystallize in the delafossite or in closely related structures. These novel phases are then characterized by calculating their band gaps and hole effective masses. This characterization unveils a large diversity of properties, ranging from normal metals, magnetic metals, and some candidate compounds for p-type transparent electrodes. PMID:28266587
Dong, Yadong; Sun, Yongqi; Qin, Chao
2018-01-01
The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.
An approach in building a chemical compound search engine in oracle database.
Wang, H; Volarath, P; Harrison, R
2005-01-01
A searching or identifying of chemical compounds is an important process in drug design and in chemistry research. An efficient search engine involves a close coupling of the search algorithm and database implementation. The database must process chemical structures, which demands the approaches to represent, store, and retrieve structures in a database system. In this paper, a general database framework for working as a chemical compound search engine in Oracle database is described. The framework is devoted to eliminate data type constrains for potential search algorithms, which is a crucial step toward building a domain specific query language on top of SQL. A search engine implementation based on the database framework is also demonstrated. The convenience of the implementation emphasizes the efficiency and simplicity of the framework.
Jones, Josette; Harris, Marcelline; Bagley-Thompson, Cheryl; Root, Jane
2003-01-01
This poster describes the development of user-centered interfaces in order to extend the functionality of the Virginia Henderson International Nursing Library (VHINL) from library to web based portal to nursing knowledge resources. The existing knowledge structure and computational models are revised and made complementary. Nurses' search behavior is captured and analyzed, and the resulting search models are mapped to the revised knowledge structure and computational model.
Lesion search and recognition by thymine DNA glycosylase revealed by single molecule imaging
Buechner, Claudia N.; Maiti, Atanu; Drohat, Alexander C.; Tessmer, Ingrid
2015-01-01
The ability of DNA glycosylases to rapidly and efficiently detect lesions among a vast excess of nondamaged DNA bases is vitally important in base excision repair (BER). Here, we use single molecule imaging by atomic force microscopy (AFM) supported by a 2-aminopurine fluorescence base flipping assay to study damage search by human thymine DNA glycosylase (hTDG), which initiates BER of mutagenic and cytotoxic G:T and G:U mispairs in DNA. Our data reveal an equilibrium between two conformational states of hTDG–DNA complexes, assigned as search complex (SC) and interrogation complex (IC), both at target lesions and undamaged DNA sites. Notably, for both hTDG and a second glycosylase, hOGG1, which recognizes structurally different 8-oxoguanine lesions, the conformation of the DNA in the SC mirrors innate structural properties of their respective target sites. In the IC, the DNA is sharply bent, as seen in crystal structures of hTDG lesion recognition complexes, which likely supports the base flipping required for lesion identification. Our results support a potentially general concept of sculpting of glycosylases to their targets, allowing them to exploit the energetic cost of DNA bending for initial lesion sensing, coupled with continuous (extrahelical) base interrogation during lesion search by DNA glycosylases. PMID:25712093
pDeep: Predicting MS/MS Spectra of Peptides with Deep Learning.
Zhou, Xie-Xuan; Zeng, Wen-Feng; Chi, Hao; Luo, Chunjie; Liu, Chao; Zhan, Jianfeng; He, Si-Min; Zhang, Zhifei
2017-12-05
In tandem mass spectrometry (MS/MS)-based proteomics, search engines rely on comparison between an experimental MS/MS spectrum and the theoretical spectra of the candidate peptides. Hence, accurate prediction of the theoretical spectra of peptides appears to be particularly important. Here, we present pDeep, a deep neural network-based model for the spectrum prediction of peptides. Using the bidirectional long short-term memory (BiLSTM), pDeep can predict higher-energy collisional dissociation, electron-transfer dissociation, and electron-transfer and higher-energy collision dissociation MS/MS spectra of peptides with >0.9 median Pearson correlation coefficients. Further, we showed that intermediate layer of the neural network could reveal physicochemical properties of amino acids, for example the similarities of fragmentation behaviors between amino acids. We also showed the potential of pDeep to distinguish extremely similar peptides (peptides that contain isobaric amino acids, for example, GG = N, AG = Q, or even I = L), which were very difficult to distinguish using traditional search engines.
DNA mimic proteins: functions, structures, and bioinformatic analysis.
Wang, Hao-Ching; Ho, Chun-Han; Hsu, Kai-Cheng; Yang, Jinn-Moon; Wang, Andrew H-J
2014-05-13
DNA mimic proteins have DNA-like negative surface charge distributions, and they function by occupying the DNA binding sites of DNA binding proteins to prevent these sites from being accessed by DNA. DNA mimic proteins control the activities of a variety of DNA binding proteins and are involved in a wide range of cellular mechanisms such as chromatin assembly, DNA repair, transcription regulation, and gene recombination. However, the sequences and structures of DNA mimic proteins are diverse, making them difficult to predict by bioinformatic search. To date, only a few DNA mimic proteins have been reported. These DNA mimics were not found by searching for functional motifs in their sequences but were revealed only by structural analysis of their charge distribution. This review highlights the biological roles and structures of 16 reported DNA mimic proteins. We also discuss approaches that might be used to discover new DNA mimic proteins.
Modulated structure and molecular dissociation of solid chlorine at high pressures
NASA Astrophysics Data System (ADS)
Li, Peifang; Gao, Guoying; Ma, Yanming
2012-08-01
Among diatomic molecular halogen solids, high pressure structures of solid chlorine (Cl2) remain elusive and least studied. We here report first-principles structural search on solid Cl2 at high pressures through our developed particle-swarm optimization algorithm. We successfully reproduced the known molecular Cmca phase (phase I) at low pressure and found that it remains stable up to a high pressure 142 GPa. At 150 GPa, our structural searches identified several energetically competitive, structurally similar, and modulated structures. Analysis of the structural results and their similarity with those in solid Br2 and I2, it was suggested that solid Cl2 adopts an incommensurate modulated structure with a modulation wave close to 2/7 in a narrow pressure range 142-157 GPa. Eventually, our simulations at >157 GPa were able to predict the molecular dissociation of solid Cl2 into monatomic phases having body centered orthorhombic (bco) and face-centered cubic (fcc) structures, respectively. One unique monatomic structural feature of solid Cl2 is the absence of intermediate body centered tetragonal (bct) structure during the bco → fcc transition, which however has been observed or theoretically predicted in solid Br2 and I2. Electron-phonon coupling calculations revealed that solid Cl2 becomes superconductors within bco and fcc phases possessing a highest superconducting temperature of 13.03 K at 380 GPa. We further probed the molecular Cmca → incommensurate phase transition mechanism and found that the softening of the Ag vibrational (rotational) Raman mode in the Cmca phase might be the driving force to initiate the transition.
Shenker, Bennett S
2014-02-01
To validate a scoring system that evaluates the ability of Internet search engines to correctly predict diagnoses when symptoms are used as search terms. We developed a five point scoring system to evaluate the diagnostic accuracy of Internet search engines. We identified twenty diagnoses common to a primary care setting to validate the scoring system. One investigator entered the symptoms for each diagnosis into three Internet search engines (Google, Bing, and Ask) and saved the first five webpages from each search. Other investigators reviewed the webpages and assigned a diagnostic accuracy score. They rescored a random sample of webpages two weeks later. To validate the five point scoring system, we calculated convergent validity and test-retest reliability using Kendall's W and Spearman's rho, respectively. We used the Kruskal-Wallis test to look for differences in accuracy scores for the three Internet search engines. A total of 600 webpages were reviewed. Kendall's W for the raters was 0.71 (p<0.0001). Spearman's rho for test-retest reliability was 0.72 (p<0.0001). There was no difference in scores based on Internet search engine. We found a significant difference in scores based on the webpage's order on the Internet search engine webpage (p=0.007). Pairwise comparisons revealed higher scores in the first webpages vs. the fourth (corr p=0.009) and fifth (corr p=0.017). However, this significance was lost when creating composite scores. The five point scoring system to assess diagnostic accuracy of Internet search engines is a valid and reliable instrument. The scoring system may be used in future Internet research. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Tortora, Maxime M. C.; Doye, Jonathan P. K.
2017-12-01
We detail the application of bounding volume hierarchies to accelerate second-virial evaluations for arbitrary complex particles interacting through hard and soft finite-range potentials. This procedure, based on the construction of neighbour lists through the combined use of recursive atom-decomposition techniques and binary overlap search schemes, is shown to scale sub-logarithmically with particle resolution in the case of molecular systems with high aspect ratios. Its implementation within an efficient numerical and theoretical framework based on classical density functional theory enables us to investigate the cholesteric self-assembly of a wide range of experimentally relevant particle models. We illustrate the method through the determination of the cholesteric behavior of hard, structurally resolved twisted cuboids, and report quantitative evidence of the long-predicted phase handedness inversion with increasing particle thread angles near the phenomenological threshold value of 45°. Our results further highlight the complex relationship between microscopic structure and helical twisting power in such model systems, which may be attributed to subtle geometric variations of their chiral excluded-volume manifold.
Identifying functionally informative evolutionary sequence profiles.
Gil, Nelson; Fiser, Andras
2018-04-15
Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.
Ab Initio Protein Structure Prediction Using Chunk-TASSER
Zhou, Hongyi; Skolnick, Jeffrey
2007-01-01
We have developed an ab initio protein structure prediction method called chunk-TASSER that uses ab initio folded supersecondary structure chunks of a given target as well as threading templates for obtaining contact potentials and distance restraints. The predicted chunks, selected on the basis of a new fragment comparison method, are folded by a fragment insertion method. Full-length models are built and refined by the TASSER methodology, which searches conformational space via parallel hyperbolic Monte Carlo. We employ an optimized reduced force field that includes knowledge-based statistical potentials and restraints derived from the chunks as well as threading templates. The method is tested on a dataset of 425 hard target proteins ≤250 amino acids in length. The average TM-scores of the best of top five models per target are 0.266, 0.336, and 0.362 by the threading algorithm SP3, original TASSER and chunk-TASSER, respectively. For a subset of 80 proteins with predicted α-helix content ≥50%, these averages are 0.284, 0.356, and 0.403, respectively. The percentages of proteins with the best of top five models having TM-score ≥0.4 (a statistically significant threshold for structural similarity) are 3.76, 20.94, and 28.94% by SP3, TASSER, and chunk-TASSER, respectively, overall, while for the subset of 80 predominantly helical proteins, these percentages are 2.50, 23.75, and 41.25%. Thus, chunk-TASSER shows a significant improvement over TASSER for modeling hard targets where no good template can be identified. We also tested chunk-TASSER on 21 medium/hard targets <200 amino-acids-long from CASP7. Chunk-TASSER is ∼11% (10%) better than TASSER for the total TM-score of the first (best of top five) models. Chunk-TASSER is fully automated and can be used in proteome scale protein structure prediction. PMID:17496016
A rank-based Prediction Algorithm of Learning User's Intention
NASA Astrophysics Data System (ADS)
Shen, Jie; Gao, Ying; Chen, Cang; Gong, HaiPing
Internet search has become an important part in people's daily life. People can find many types of information to meet different needs through search engines on the Internet. There are two issues for the current search engines: first, the users should predetermine the types of information they want and then change to the appropriate types of search engine interfaces. Second, most search engines can support multiple kinds of search functions, each function has its own separate search interface. While users need different types of information, they must switch between different interfaces. In practice, most queries are corresponding to various types of information results. These queries can search the relevant results in various search engines, such as query "Palace" contains the websites about the introduction of the National Palace Museum, blog, Wikipedia, some pictures and video information. This paper presents a new aggregative algorithm for all kinds of search results. It can filter and sort the search results by learning three aspects about the query words, search results and search history logs to achieve the purpose of detecting user's intention. Experiments demonstrate that this rank-based method for multi-types of search results is effective. It can meet the user's search needs well, enhance user's satisfaction, provide an effective and rational model for optimizing search engines and improve user's search experience.
2018-01-01
This paper presents an integrated hybrid optimization algorithm for training the radial basis function neural network (RBF NN). Training of neural networks is still a challenging exercise in machine learning domain. Traditional training algorithms in general suffer and trap in local optima and lead to premature convergence, which makes them ineffective when applied for datasets with diverse features. Training algorithms based on evolutionary computations are becoming popular due to their robust nature in overcoming the drawbacks of the traditional algorithms. Accordingly, this paper proposes a hybrid training procedure with differential search (DS) algorithm functionally integrated with the particle swarm optimization (PSO). To surmount the local trapping of the search procedure, a new population initialization scheme is proposed using Logistic chaotic sequence, which enhances the population diversity and aid the search capability. To demonstrate the effectiveness of the proposed RBF hybrid training algorithm, experimental analysis on publicly available 7 benchmark datasets are performed. Subsequently, experiments were conducted on a practical application case for wind speed prediction to expound the superiority of the proposed RBF training algorithm in terms of prediction accuracy. PMID:29768463
Rani R, Hannah Jessie; Victoire T, Aruldoss Albert
2018-01-01
This paper presents an integrated hybrid optimization algorithm for training the radial basis function neural network (RBF NN). Training of neural networks is still a challenging exercise in machine learning domain. Traditional training algorithms in general suffer and trap in local optima and lead to premature convergence, which makes them ineffective when applied for datasets with diverse features. Training algorithms based on evolutionary computations are becoming popular due to their robust nature in overcoming the drawbacks of the traditional algorithms. Accordingly, this paper proposes a hybrid training procedure with differential search (DS) algorithm functionally integrated with the particle swarm optimization (PSO). To surmount the local trapping of the search procedure, a new population initialization scheme is proposed using Logistic chaotic sequence, which enhances the population diversity and aid the search capability. To demonstrate the effectiveness of the proposed RBF hybrid training algorithm, experimental analysis on publicly available 7 benchmark datasets are performed. Subsequently, experiments were conducted on a practical application case for wind speed prediction to expound the superiority of the proposed RBF training algorithm in terms of prediction accuracy.
Structural and electronic properties for atomic clusters
NASA Astrophysics Data System (ADS)
Sun, Yan
We have studied the structural and electronic properties for different groups of atomic clusters by doing a global search on the potential energy surface using the Taboo Search in Descriptors Space (TSDS) method and calculating the energies with Kohn-Sham Density Functional Theory (KS-DFT). Our goal was to find the structural and electronic principles for predicting the structure and stability of clusters. For Ben (n = 3--20), we have found that the evolution of geometric and electronic properties with size reflects a change in the nature of the bonding from van der Waals to metallic and then bulk-like. The cluster sizes with extra stability agree well with the predictions of the jellium model. In the 4d series of transition metal (TM) clusters, as the d-type bonding becomes more important, the preferred geometric structure changes from icosahedral (Y, Zr), to distorted compact structures (Nb, Mo), and FCC or simple cubic crystal fragments (Tc, Ru, Rh) due to the localized nature of the d-type orbital. Analysis of relative isomer energies and their electronic density of states suggest that these clusters tend to follow a maximum hardness principle (MHP). For A4B12 clusters (A is divalent, B is monovalent), we found unusually large (on average 1.95 eV) HOMO-LUMO gap values. This shows the extra stability at an electronic closed shell (20 electrons) predicted by the jellium model. The importance of symmetry, closed electronic and ionic shells in stability is shown by the relative stability of homotops of Mg4Ag12 which also provides support for the hypothesis that clusters that satisfy more than one stability criterion ("double magic") should be particularly stable.
The quest for the Sun's siblings: an exploratory search in the Hipparcos Catalogue
NASA Astrophysics Data System (ADS)
Brown, Anthony G. A.; Portegies Zwart, Simon F.; Bean, Jennifer
2010-09-01
We describe the results of a search for the remnants of the Sun's birth cluster among stars in the Hipparcos Catalogue. This search is based on the predicted phase-space distribution of the Sun's siblings from simple simulations of the orbits of the cluster stars in a smooth Galactic potential. For stars within 100 pc, the simulations show that it is interesting to examine those that have small space motions relative to the Sun. From amongst the candidate siblings thus selected, there are six stars with ages consistent with that of the Sun. Considering their radial velocities and abundances only one potential candidate, HIP21158, remains, but essentially the result of the search is negative. This is consistent with predictions by Portegies Zwart on the number of siblings near the Sun. We discuss the steps that should be taken in anticipation of the data from the Gaia mission in order to conduct fruitful searches for the Sun's siblings in the future.
The Quest For The Sun's Siblings: An Exploratory Search In The Hipparcos Catalogue
NASA Astrophysics Data System (ADS)
Bean, Jennifer; Brown, A.; Portegies Zwart, S.
2011-01-01
We describe the results of a search for the remnants of the Sun's birth cluster among stars in the Hipparcos Catalogue. This search is based on the predicted phase-space distribution of the Sun's siblings from simple simulations of the orbits of the cluster stars in a smooth Galactic potential. For stars within 100 pc, the simulations show that it is interesting to examine those that have small space motions relative to the Sun. From amongst the candidate siblings thus selected, there are six stars with ages consistent with that of the Sun. Considering their radial velocities and abundances only one potential candidate, HIP21158, remains, but essentially the result of the search is negative. This is consistent with predictions by Portegies Zwart on the number of siblings near the Sun. We discuss the steps that should be taken in anticipation of the data from the Gaia mission in order to conduct fruitful searches for the Sun's siblings in the future.
Ertl, Peter; Patiny, Luc; Sander, Thomas; Rufener, Christian; Zasso, Michaël
2015-01-01
Wikipedia, the world's largest and most popular encyclopedia is an indispensable source of chemistry information. It contains among others also entries for over 15,000 chemicals including metabolites, drugs, agrochemicals and industrial chemicals. To provide an easy access to this wealth of information we decided to develop a substructure and similarity search tool for chemical structures referenced in Wikipedia. We extracted chemical structures from entries in Wikipedia and implemented a web system allowing structure and similarity searching on these data. The whole search as well as visualization system is written in JavaScript and therefore can run locally within a web page and does not require a central server. The Wikipedia Chemical Structure Explorer is accessible on-line at www.cheminfo.org/wikipedia and is available also as an open source project from GitHub for local installation. The web-based Wikipedia Chemical Structure Explorer provides a useful resource for research as well as for chemical education enabling both researchers and students easy and user friendly chemistry searching and identification of relevant information in Wikipedia. The tool can also help to improve quality of chemical entries in Wikipedia by providing potential contributors regularly updated list of entries with problematic structures. And last but not least this search system is a nice example of how the modern web technology can be applied in the field of cheminformatics. Graphical abstractWikipedia Chemical Structure Explorer allows substructure and similarity searches on molecules referenced in Wikipedia.
Conci, Markus; Müller, Hermann J; von Mühlenen, Adrian
2013-07-09
In visual search, detection of a target is faster when it is presented within a spatial layout of repeatedly encountered nontarget items, indicating that contextual invariances can guide selective attention (contextual cueing; Chun & Jiang, 1998). However, perceptual regularities may interfere with contextual learning; for instance, no contextual facilitation occurs when four nontargets form a square-shaped grouping, even though the square location predicts the target location (Conci & von Mühlenen, 2009). Here, we further investigated potential causes for this interference-effect: We show that contextual cueing can reliably occur for targets located within the region of a segmented object, but not for targets presented outside of the object's boundaries. Four experiments demonstrate an object-based facilitation in contextual cueing, with a modulation of context-based learning by relatively subtle grouping cues including closure, symmetry, and spatial regularity. Moreover, the lack of contextual cueing for targets located outside the segmented region was due to an absence of (latent) learning of contextual layouts, rather than due to an attentional bias towards the grouped region. Taken together, these results indicate that perceptual segmentation provides a basic structure within which contextual scene regularities are acquired. This in turn argues that contextual learning is constrained by object-based selection.
NASA Astrophysics Data System (ADS)
Dolloff, John; Hottel, Bryant; Edwards, David; Theiss, Henry; Braun, Aaron
2017-05-01
This paper presents an overview of the Full Motion Video-Geopositioning Test Bed (FMV-GTB) developed to investigate algorithm performance and issues related to the registration of motion imagery and subsequent extraction of feature locations along with predicted accuracy. A case study is included corresponding to a video taken from a quadcopter. Registration of the corresponding video frames is performed without the benefit of a priori sensor attitude (pointing) information. In particular, tie points are automatically measured between adjacent frames using standard optical flow matching techniques from computer vision, an a priori estimate of sensor attitude is then computed based on supplied GPS sensor positions contained in the video metadata and a photogrammetric/search-based structure from motion algorithm, and then a Weighted Least Squares adjustment of all a priori metadata across the frames is performed. Extraction of absolute 3D feature locations, including their predicted accuracy based on the principles of rigorous error propagation, is then performed using a subset of the registered frames. Results are compared to known locations (check points) over a test site. Throughout this entire process, no external control information (e.g. surveyed points) is used other than for evaluation of solution errors and corresponding accuracy.
Evaluating the mutagenic potential of aerosol organic compounds using informatics-based screening
NASA Astrophysics Data System (ADS)
Decesari, Stefano; Kovarich, Simona; Pavan, Manuela; Bassan, Arianna; Ciacci, Andrea; Topping, David
2018-02-01
Whilst general policy objectives to reduce airborne particulate matter (PM) health effects are to reduce exposure to PM as a whole, emerging evidence suggests that more detailed metrics associating impacts with different aerosol components might be needed. Since it is impossible to conduct toxicological screening on all possible molecular species expected to occur in aerosol, in this study we perform a proof-of-concept evaluation on the information retrieved from in silico toxicological predictions, in which a subset (N = 104) of secondary organic aerosol (SOA) compounds were screened for their mutagenicity potential. An extensive database search showed that experimental data are available for 13 % of the compounds, while reliable predictions were obtained for 82 %. A multivariate statistical analysis of the compounds based on their physico-chemical, structural, and mechanistic properties showed that 80 % of the compounds predicted as mutagenic were grouped into six clusters, three of which (five-membered lactones from monoterpene oxidation, oxygenated multifunctional compounds from substituted benzene oxidation, and hydroperoxides from several precursors) represent new candidate groups of compounds for future toxicological screenings. These results demonstrate that coupling model-generated compositions to in silico toxicological screening might enable more comprehensive exploration of the mutagenic potential of specific SOA components.
Marrero-Ponce, Yovani; Iyarreta-Veitía, Maité; Montero-Torres, Alina; Romero-Zaldivar, Carlos; Brandt, Carlos A; Avila, Priscilla E; Kirchgatter, Karin; Machado, Yanetsy
2005-01-01
Malaria has been one of the most significant public health problems for centuries. It affects many tropical and subtropical regions of the world. The increasing resistance of Plasmodium spp. to existing therapies has heightened alarms about malaria in the international health community. Nowadays, there is a pressing need for identifying and developing new drug-based antimalarial therapies. In an effort to overcome this problem, the main purpose of this study is to develop simple linear discriminant-based quantitative structure-activity relationship (QSAR) models for the classification and prediction of antimalarial activity using some of the TOMOCOMD-CARDD (TOpological MOlecular COMputer Design-Computer Aided "Rational" Drug Design) fingerprints, so as to enable computational screening from virtual combinatorial datasets. In this sense, a database of 1562 organic chemicals having great structural variability, 597 of them antimalarial agents and 965 compounds having other clinical uses, was analyzed and presented as a helpful tool, not only for theoretical chemists but also for other researchers in this area. This series of compounds was processed by a k-means cluster analysis in order to design training and predicting sets. Afterward, two linear classification functions were derived in order to discriminate between antimalarial and nonantimalarial compounds. The models (including nonstochastic and stochastic indices) correctly classify more than 93% of the compound set, in both training and external prediction datasets. They showed high Matthews' correlation coefficients, 0.889 and 0.866 for the training set and 0.855 and 0.857 for the test one. The models' predictivity was also assessed and validated by the random removal of 10% of the compounds to form a new test set, for which predictions were made using the models. The overall means of the correct classification for this process (leave group 10% full-out cross validation) using the equations with nonstochastic and stochastic atom-based quadratic fingerprints were 93.93% and 92.77%, respectively. The quadratic maps-based TOMOCOMD-CARDD approach implemented in this work was successfully compared with four of the most useful models for antimalarials selection reported to date. The developed models were then used in a simulation of a virtual search for Ras FTase (FTase = farnesyltransferase) inhibitors with antimalarial activity; 70% and 100% of the 10 inhibitors used in this virtual search were correctly classified, showing the ability of the models to identify new lead antimalarials. Finally, these two QSAR models were used in the identification of previously unknown antimalarials. In this sense, three synthetic intermediaries of quinolinic compounds were evaluated as active/inactive ones using the developed models. The synthesis and biological evaluation of these chemicals against two malaria strains, using chloroquine as a reference, was performed. An accuracy of 100% with the theoretical predictions was observed. Compound 3 showed antimalarial activity, being the first report of an arylaminomethylenemalonate having such behavior. This result opens a door to a virtual study considering a higher variability of the structural core already evaluated, as well as of other chemicals not included in this study. We conclude that the approach described here seems to be a promising QSAR tool for the molecular discovery of novel classes of antimalarial drugs, which may meet the dual challenges posed by drug-resistant parasites and the rapid progression of malaria illnesses.
First principles crystal engineering of nonlinear optical materials. I. Prototypical case of urea
NASA Astrophysics Data System (ADS)
Masunov, Artëm E.; Tannu, Arman; Dyakov, Alexander A.; Matveeva, Anastasia D.; Freidzon, Alexandra Ya.; Odinokov, Alexey V.; Bagaturyants, Alexander A.
2017-06-01
The crystalline materials with nonlinear optical (NLO) properties are critically important for several technological applications, including nanophotonic and second harmonic generation devices. Urea is often considered to be a standard NLO material, due to the combination of non-centrosymmetric crystal packing and capacity for intramolecular charge transfer. Various approaches to crystal engineering of non-centrosymmetric molecular materials were reported in the literature. Here we propose using global lattice energy minimization to predict the crystal packing from the first principles. We developed a methodology that includes the following: (1) parameter derivation for polarizable force field AMOEBA; (2) local minimizations of crystal structures with these parameters, combined with the evolutionary algorithm for a global minimum search, implemented in program USPEX; (3) filtering out duplicate polymorphs produced; (4) reoptimization and final ranking based on density functional theory (DFT) with many-body dispersion (MBD) correction; and (5) prediction of the second-order susceptibility tensor by finite field approach. This methodology was applied to predict virtual urea polymorphs. After filtering based on packing similarity, only two distinct packing modes were predicted: one experimental and one hypothetical. DFT + MBD ranking established non-centrosymmetric crystal packing as the global minimum, in agreement with the experiment. Finite field approach was used to predict nonlinear susceptibility, and H-bonding was found to account for a 2.5-fold increase in molecular hyperpolarizability to the bulk value.
Wang, Jie-Sheng; Han, Shuang
2015-01-01
For predicting the key technology indicators (concentrate grade and tailings recovery rate) of flotation process, a feed-forward neural network (FNN) based soft-sensor model optimized by the hybrid algorithm combining particle swarm optimization (PSO) algorithm and gravitational search algorithm (GSA) is proposed. Although GSA has better optimization capability, it has slow convergence velocity and is easy to fall into local optimum. So in this paper, the velocity vector and position vector of GSA are adjusted by PSO algorithm in order to improve its convergence speed and prediction accuracy. Finally, the proposed hybrid algorithm is adopted to optimize the parameters of FNN soft-sensor model. Simulation results show that the model has better generalization and prediction accuracy for the concentrate grade and tailings recovery rate to meet the online soft-sensor requirements of the real-time control in the flotation process. PMID:26583034
Nachman, Gösta
2006-01-01
The spatial distributions of two-spotted spider mites Tetranychus urticae and their natural enemy, the phytoseiid predator Phytoseiulus persimilis, were studied on six full-grown cucumber plants. Both mite species were very patchily distributed and P. persimilis tended to aggregate on leaves with abundant prey. The effects of non-homogenous distributions and degree of spatial overlap between prey and predators on the per capita predation rate were studied by means of a stage-specific predation model that averages the predation rates over all the local populations inhabiting the individual leaves. The empirical predation rates were compared with predictions assuming random predator search and/or an even distribution of prey. The analysis clearly shows that the ability of the predators to search non-randomly increases their predation rate. On the other hand, the prey may gain if it adopts a more even distribution when its density is low and a more patchy distribution when density increases. Mutual interference between searching predators reduces the predation rate, but the effect is negligible. The stage-specific functional response model was compared with two simpler models without explicit stage structure. Both unstructured models yielded predictions that were quite similar to those of the stage-structured model.
The Role of Prediction In Perception: Evidence From Interrupted Visual Search
Mereu, Stefania; Zacks, Jeffrey M.; Kurby, Christopher A.; Lleras, Alejandro
2014-01-01
Recent studies of rapid resumption—an observer’s ability to quickly resume a visual search after an interruption—suggest that predictions underlie visual perception. Previous studies showed that when the search display changes unpredictably after the interruption, rapid resumption disappears. This conclusion is at odds with our everyday experience, where the visual system seems to be quite efficient despite continuous changes of the visual scene; however, in the real world, changes can typically be anticipated based on previous knowledge. The present study aimed to evaluate whether changes to the visual display can be incorporated into the perceptual hypotheses, if observers are allowed to anticipate such changes. Results strongly suggest that an interrupted visual search can be rapidly resumed even when information in the display has changed after the interruption, so long as participants not only can anticipate them, but also are aware that such changes might occur. PMID:24820440
Comprehensive computational design of ordered peptide macrocycles
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hosseinzadeh, Parisa; Bhardwaj, Gaurav; Mulligan, Vikram Khipple
Mixed chirality peptide macrocycles such as cyclosporine are among the most potent therapeutics identified to-date, but there is currently no way to systematically search through the structural space spanned by such compounds for new drug candidates. Natural proteins do not provide a useful guide: peptide macrocycles lack regular secondary structures and hydrophobic cores and have different backbone torsional constraints. Hence the development of new peptide macrocycles has been approached by modifying natural products or using library selection methods; the former is limited by the small number of known structures, and the latter by the limited size and diversity accessible throughmore » library-based methods. To overcome these limitations, here we enumerate the stable structures that can be adopted by macrocyclic peptides composed of L and D amino acids. We identify more than 200 designs predicted to fold into single stable structures, many times more than the number of currently available unbound peptide macrocycle structures. We synthesize and characterize by NMR twelve 7-10 residue macrocycles, 9 of which have structures very close to the design models in solution. NMR structures of three 11-14 residue bicyclic designs are also very close to the computational models. Our results provide a nearly complete coverage of the rich space of structures possible for short peptide based macrocycles unparalleled for other molecular systems, and vastly increase the available starting scaffolds for both rational drug design and library selection methods.« less
Simulations of Metallic Nanoscale Structures
NASA Astrophysics Data System (ADS)
Jacobsen, Karsten W.
2003-03-01
Density-functional-theory calculations can be used to understand and predict materials properties based on their nanoscale composition and structure. In combination with efficient search algorithms DFT can furthermore be applied in the nanoscale design of optimized materials. The first part of the talk will focus on two different types of nanostructures with an interesting interplay between chemical activity and conducting states. MoS2 nanoclusters are known for their catalyzing effect in the hydrodesulfurization process which removes sulfur-containing molecules from oil products. MoS2 is a layered material which is insulating. However, DFT calculations indicates the exsistence of metallic states at some of the edges of MoS2 nanoclusters, and the calculations show that the conducting states are not passivated by for example the presence of hydrogen gas. The edge states may play an important role for the chemical activity of MoS_2. Metallic nanocontacts can be formed during the breaking of a piece of metal, and atomically thin structures with conductance of only a single quantum unit may be formed. Such open metallic structures are chemically very active and susceptible to restructuring through interactions with molecular gases. DFT calculations show for example that atomically thin gold wires may incorporate oxygen atoms forming a new type of metallic nanowire. Adsorbates like hydrogen may also affect the conductance. In the last part of the talk I shall discuss the possibilities for designing alloys with optimal mechanical properties based on a combination of DFT calculations with genetic search algorithms. Simulaneous optimization of several parameters (stability, price, compressibility) is addressed through the determination of Pareto optimal alloy compositions within a large database of more than 64000 alloys.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-05-01
Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-01-01
Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616
Jalem, Randy; Kimura, Mayumi; Nakayama, Masanobu; Kasuga, Toshihiro
2015-06-22
The ongoing search for fast Li-ion conducting solid electrolytes has driven the deployment surge on density functional theory (DFT) computation and materials informatics for exploring novel chemistries before actual experimental testing. Existing structure prototypes can now be readily evaluated beforehand not only to map out trends on target properties or for candidate composition selection but also for gaining insights on structure-property relationships. Recently, the tavorite structure has been determined to be capable of a fast Li ion insertion rate for battery cathode applications. Taking this inspiration, we surveyed the LiMTO4F tavorite system (M(3+)-T(5+) and M(2+)-T(6+) pairs; M is nontransition metals) for solid electrolyte use, identifying promising compositions with enormously low Li migration energy (ME) and understanding how structure parameters affect or modulate ME. We employed a combination of DFT computation, variable interaction analysis, graph theory, and a neural network for building a crystal structure-based ME prediction model. Candidate compositions that were predicted include LiGaPO4F (0.25 eV), LiGdPO4F (0.30 eV), LiDyPO4F (0.30 eV), LiMgSO4F (0.21 eV), and LiMgSeO4F (0.11 eV). With chemical substitutions at M and T sites, competing effects among Li pathway bottleneck size, polyanion covalency, and local lattice distortion were determined to be crucial for controlling ME. A way to predict ME for multiple structure types within the neural network framework was also explored.
Collaborations for Arctic Sea Ice Information and Tools
NASA Astrophysics Data System (ADS)
Sheffield Guy, L.; Wiggins, H. V.; Turner-Bogren, E. J.; Rich, R. H.
2017-12-01
The dramatic and rapid changes in Arctic sea ice require collaboration across boundaries, including between disciplines, sectors, institutions, and between scientists and decision-makers. This poster will highlight several projects that provide knowledge to advance the development and use of sea ice knowledge. Sea Ice for Walrus Outlook (SIWO: https://www.arcus.org/search-program/siwo) - SIWO is a resource for Alaskan Native subsistence hunters and other interested stakeholders. SIWO provides weekly reports, during April-June, of sea ice conditions relevant to walrus in the northern Bering and southern Chukchi seas. Collaboration among scientists, Alaskan Native sea-ice experts, and the Eskimo Walrus Commission is fundamental to this project's success. Sea Ice Prediction Network (SIPN: https://www.arcus.org/sipn) - A collaborative, multi-agency-funded project focused on seasonal Arctic sea ice predictions. The goals of SIPN include: coordinate and evaluate Arctic sea ice predictions; integrate, assess, and guide observations; synthesize predictions and observations; and disseminate predictions and engage key stakeholders. The Sea Ice Outlook—a key activity of SIPN—is an open process to share and synthesize predictions of the September minimum Arctic sea ice extent and other variables. Other SIPN activities include workshops, webinars, and communications across the network. Directory of Sea Ice Experts (https://www.arcus.org/researchers) - ARCUS has undertaken a pilot project to develop a web-based directory of sea ice experts across institutions, countries, and sectors. The goal of the project is to catalyze networking between individual investigators, institutions, funding agencies, and other stakeholders interested in Arctic sea ice. Study of Environmental Arctic Change (SEARCH: https://www.arcus.org/search-program) - SEARCH is a collaborative program that advances research, synthesizes research findings, and broadly communicates the results to support informed decision-making. One of SEARCH's primary science topics is focused on Arctic sea ice; the SEARCH Sea Ice Action Team is leading efforts to advance understanding and awareness of the impacts of Arctic sea-ice loss.
Unexpected Ground-State Structure and Mechanical Properties of Ir₂Zr Intermetallic Compound.
Zhang, Meiguang; Cao, Rui; Zhao, Meijie; Du, Juan; Cheng, Ke
2018-01-10
Using an unbiased structure searching method, a new orthorhombic Cmmm structure consisting of ZrIr 12 polyhedron building blocks is predicted to be the thermodynamic ground-state of stoichiometric intermetallic Ir₂Zr in Ir-Zr systems. The formation enthalpy of the Cmmm structure is considerably lower than that of the previously synthesized Cu₂Mg-type phase, by ~107 meV/atom, as demonstrated by the calculation of formation enthalpy. Meanwhile, the phonon dispersion calculations further confirmed the dynamical stability of Cmmm phase under ambient conditions. The mechanical properties, including elastic stability, rigidity, and incompressibility, as well as the elastic anisotropy of Cmmm -Ir₂Zr intermetallic, have thus been fully determined. It is found that the predicted Cmmm phase exhibits nearly elastic isotropic and great resistance to shear deformations within the (100) crystal plane. Evidence of atomic bonding related to the structural stability for Ir₂Zr were manifested by calculations of the electronic structures.
A new series of two-dimensional silicon crystals with versatile electronic properties
NASA Astrophysics Data System (ADS)
Chae, Kisung; Kim, Duck Young; Son, Young-Woo
2018-04-01
Silicon (Si) is one of the most extensively studied materials owing to its significance to semiconductor science and technology. While efforts to find a new three-dimensional (3D) Si crystal with unusual properties have made some progress, its two-dimensional (2D) phases have not yet been explored as much. Here, based on a newly developed systematic ab initio materials searching strategy, we report a series of novel 2D Si crystals with unprecedented structural and electronic properties. The new structures exhibit perfectly planar outermost surface layers of a distorted hexagonal network with their thicknesses varying with the atomic arrangement inside. Dramatic changes in electronic properties ranging from semimetal to semiconducting with indirect energy gaps and even to one with direct energy gaps are realized by varying thickness as well as by surface oxidation. Our predicted 2D Si crystals with flat surfaces and tunable electronic properties will shed light on the development of silicon-based 2D electronics technology.
Building proteins from C alpha coordinates using the dihedral probability grid Monte Carlo method.
Mathiowetz, A. M.; Goddard, W. A.
1995-01-01
Dihedral probability grid Monte Carlo (DPG-MC) is a general-purpose method of conformational sampling that can be applied to many problems in peptide and protein modeling. Here we present the DPG-MC method and apply it to predicting complete protein structures from C alpha coordinates. This is useful in such endeavors as homology modeling, protein structure prediction from lattice simulations, or fitting protein structures to X-ray crystallographic data. It also serves as an example of how DPG-MC can be applied to systems with geometric constraints. The conformational propensities for individual residues are used to guide conformational searches as the protein is built from the amino-terminus to the carboxyl-terminus. Results for a number of proteins show that both the backbone and side chain can be accurately modeled using DPG-MC. Backbone atoms are generally predicted with RMS errors of about 0.5 A (compared to X-ray crystal structure coordinates) and all atoms are predicted to an RMS error of 1.7 A or better. PMID:7549885
Bibliography of information on mechanics of structural failure
NASA Technical Reports Server (NTRS)
Carpenter, J. L., Jr.; Moya, N.; Shaffer, R. A.; Smith, D. M.
1973-01-01
A bibliography of approximately 1500 reference citations related to six problem areas in the mechanics of failure in aerospace structures is presented. The bibliography represents a search of the literature published in the ten year period 1962-1972 and is largely limited to documents published in the United States. Listings are subdivided into the six problem areas: (1) life prediction of structural materials; (2) fracture toughness data; (3) fracture mechanics analysis; (4) hydrogen embrittlement; (5) protective coatings; and (6) composite materials. An author index is included.
Najmanovich, Rafael
2013-01-01
IsoCleft Finder is a web-based tool for the detection of local geometric and chemical similarities between potential small-molecule binding cavities and a non-redundant dataset of ligand-bound known small-molecule binding-sites. The non-redundant dataset developed as part of this study is composed of 7339 entries representing unique Pfam/PDB-ligand (hetero group code) combinations with known levels of cognate ligand similarity. The query cavity can be uploaded by the user or detected automatically by the system using existing PDB entries as well as user-provided structures in PDB format. In all cases, the user can refine the definition of the cavity interactively via a browser-based Jmol 3D molecular visualization interface. Furthermore, users can restrict the search to a subset of the dataset using a cognate-similarity threshold. Local structural similarities are detected using the IsoCleft software and ranked according to two criteria (number of atoms in common and Tanimoto score of local structural similarity) and the associated Z-score and p-value measures of statistical significance. The results, including predicted ligands, target proteins, similarity scores, number of atoms in common, etc., are shown in a powerful interactive graphical interface. This interface permits the visualization of target ligands superimposed on the query cavity and additionally provides a table of pairwise ligand topological similarities. Similarities between top scoring ligands serve as an additional tool to judge the quality of the results obtained. We present several examples where IsoCleft Finder provides useful functional information. IsoCleft Finder results are complementary to existing approaches for the prediction of protein function from structure, rational drug design and x-ray crystallography. IsoCleft Finder can be found at: http://bcb.med.usherbrooke.ca/isocleftfinder. PMID:24555058
Research on Web Search Behavior: How Online Query Data Inform Social Psychology.
Lai, Kaisheng; Lee, Yan Xin; Chen, Hao; Yu, Rongjun
2017-10-01
The widespread use of web searches in daily life has allowed researchers to study people's online social and psychological behavior. Using web search data has advantages in terms of data objectivity, ecological validity, temporal resolution, and unique application value. This review integrates existing studies on web search data that have explored topics including sexual behavior, suicidal behavior, mental health, social prejudice, social inequality, public responses to policies, and other psychosocial issues. These studies are categorized as descriptive, correlational, inferential, predictive, and policy evaluation research. The integration of theory-based hypothesis testing in future web search research will result in even stronger contributions to social psychology.
NASA Astrophysics Data System (ADS)
Tian, Zhen; Liu, Jiyuan; Zhang, Yalin
2016-03-01
Given the advantages of behavioral disruption application in pest control and the damage of Cydia pomonella, due progresses have not been made in searching active semiochemicals for codling moth. In this research, 31 candidate semiochemicals were ranked for their binding potential to Cydia pomonella pheromone binding protein 2 (CpomPBP2) by simulated docking, and this sorted result was confirmed by competitive binding assay. This high predicting accuracy of virtual screening led to the construction of a rapid and viable method for semiochemicals searching. By reference to binding mode analyses, hydrogen bond and hydrophobic interaction were suggested to be two key factors in determining ligand affinity, so is the length of molecule chain. So it is concluded that semiochemicals of appropriate chain length with hydroxyl group or carbonyl group at one head tended to be favored by CpomPBP2. Residues involved in binding with each ligand were pointed out as well, which were verified by computational alanine scanning mutagenesis. Progress made in the present study helps establish an efficient method for predicting potentially active compounds and prepares for the application of high-throughput virtual screening in searching semiochemicals by taking insights into binding mode analyses.
Tian, Zhen; Liu, Jiyuan; Zhang, Yalin
2016-01-01
Given the advantages of behavioral disruption application in pest control and the damage of Cydia pomonella, due progresses have not been made in searching active semiochemicals for codling moth. In this research, 31 candidate semiochemicals were ranked for their binding potential to Cydia pomonella pheromone binding protein 2 (CpomPBP2) by simulated docking, and this sorted result was confirmed by competitive binding assay. This high predicting accuracy of virtual screening led to the construction of a rapid and viable method for semiochemicals searching. By reference to binding mode analyses, hydrogen bond and hydrophobic interaction were suggested to be two key factors in determining ligand affinity, so is the length of molecule chain. So it is concluded that semiochemicals of appropriate chain length with hydroxyl group or carbonyl group at one head tended to be favored by CpomPBP2. Residues involved in binding with each ligand were pointed out as well, which were verified by computational alanine scanning mutagenesis. Progress made in the present study helps establish an efficient method for predicting potentially active compounds and prepares for the application of high-throughput virtual screening in searching semiochemicals by taking insights into binding mode analyses. PMID:26928635
NASA Astrophysics Data System (ADS)
Chen, L. Leon; Ulmer, Stephan; Deisboeck, Thomas S.
2010-01-01
We present an application of a previously developed agent-based glioma model (Chen et al 2009 Biosystems 95 234-42) for predicting spatio-temporal tumor progression using a patient-specific MRI lattice derived from apparent diffusion coefficient (ADC) data. Agents representing collections of migrating glioma cells are initialized based upon voxels at the outer border of the tumor identified on T1-weighted (Gd+) MRI at an initial time point. These simulated migratory cells exhibit a specific biologically inspired spatial search paradigm, representing a weighting of the differential contribution from haptotactic permission and biomechanical resistance on the migration decision process. ADC data from 9 months after the initial tumor resection were used to select the best search paradigm for the simulation, which was initiated using data from 6 months after the initial operation. Using this search paradigm, 100 simulations were performed to derive a probabilistic map of tumor invasion locations. The simulation was able to successfully predict a recurrence in the dorsal/posterior aspect long before it was depicted on T1-weighted MRI, 18 months after the initial operation.
Chen, L Leon; Ulmer, Stephan; Deisboeck, Thomas S
2010-01-21
We present an application of a previously developed agent-based glioma model (Chen et al 2009 Biosystems 95 234-42) for predicting spatio-temporal tumor progression using a patient-specific MRI lattice derived from apparent diffusion coefficient (ADC) data. Agents representing collections of migrating glioma cells are initialized based upon voxels at the outer border of the tumor identified on T1-weighted (Gd+) MRI at an initial time point. These simulated migratory cells exhibit a specific biologically inspired spatial search paradigm, representing a weighting of the differential contribution from haptotactic permission and biomechanical resistance on the migration decision process. ADC data from 9 months after the initial tumor resection were used to select the best search paradigm for the simulation, which was initiated using data from 6 months after the initial operation. Using this search paradigm, 100 simulations were performed to derive a probabilistic map of tumor invasion locations. The simulation was able to successfully predict a recurrence in the dorsal/posterior aspect long before it was depicted on T1-weighted MRI, 18 months after the initial operation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perl, M.L.
This paper is based upon lectures in which I have described and explored the ways in which experimenters can try to find answers, or at least clues toward answers, to some of the fundamental questions of elementary particle physics. All of these experimental techniques and directions have been discussed fully in other papers, for example: searches for heavy charged leptons, tests of quantum chromodynamics, searches for Higgs particles, searches for particles predicted by supersymmetric theories, searches for particles predicted by technicolor theories, searches for proton decay, searches for neutrino oscillations, monopole searches, studies of low transfer momentum hadron physics atmore » very high energies, and elementary particle studies using cosmic rays. Each of these subjects requires several lectures by itself to do justice to the large amount of experimental work and theoretical thought which has been devoted to these subjects. My approach in these tutorial lectures is to describe general ways to experiment beyond the standard model. I will use some of the topics listed to illustrate these general ways. Also, in these lectures I present some dreams and challenges about new techniques in experimental particle physics and accelerator technology, I call these Experimental Needs. 92 references.« less
Monte Carlo-based searching as a tool to study carbohydrate structure
USDA-ARS?s Scientific Manuscript database
A torsion angle-based Monte-Carlo searching routine was developed and applied to several carbohydrate modeling problems. The routine was developed as a Unix shell script that calls several programs, which allows it to be interfaced with multiple potential functions and various functions for evaluat...
Segmentation of Pollen Tube Growth Videos Using Dynamic Bi-Modal Fusion and Seam Carving.
Tambo, Asongu L; Bhanu, Bir
2016-05-01
The growth of pollen tubes is of significant interest in plant cell biology, as it provides an understanding of internal cell dynamics that affect observable structural characteristics such as cell diameter, length, and growth rate. However, these parameters can only be measured in experimental videos if the complete shape of the cell is known. The challenge is to accurately obtain the cell boundary in noisy video images. Usually, these measurements are performed by a scientist who manually draws regions-of-interest on the images displayed on a computer screen. In this paper, a new automated technique is presented for boundary detection by fusing fluorescence and brightfield images, and a new efficient method of obtaining the final cell boundary through the process of Seam Carving is proposed. This approach takes advantage of the nature of the fusion process and also the shape of the pollen tube to efficiently search for the optimal cell boundary. In video segmentation, the first two frames are used to initialize the segmentation process by creating a search space based on a parametric model of the cell shape. Updates to the search space are performed based on the location of past segmentations and a prediction of the next segmentation.Experimental results show comparable accuracy to a previous method, but significant decrease in processing time. This has the potential for real time applications in pollen tube microscopy.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamdani, Hazrina Yusof, E-mail: hazrina@mfrlab.org; Advanced Medical and Dental Institute, Universiti Sains Malaysia, Bertam, Kepala Batas; Artymiuk, Peter J., E-mail: p.artymiuk@sheffield.ac.uk
A fundamental understanding of the atomic level interactions in ribonucleic acid (RNA) and how they contribute towards RNA architecture is an important knowledge platform to develop through the discovery of motifs from simple arrangements base pairs, to more complex arrangements such as triples and larger patterns involving non-standard interactions. The network of hydrogen bond interactions is important in connecting bases to form potential tertiary motifs. Therefore, there is an urgent need for the development of automated methods for annotating RNA 3D structures based on hydrogen bond interactions. COnnection tables Graphs for Nucleic ACids (COGNAC) is automated annotation system using graphmore » theoretical approaches that has been developed for the identification of RNA 3D motifs. This program searches for patterns in the unbroken networks of hydrogen bonds for RNA structures and capable of annotating base pairs and higher-order base interactions, which ranges from triples to sextuples. COGNAC was able to discover 22 out of 32 quadruples occurrences of the Haloarcula marismortui large ribosomal subunit (PDB ID: 1FFK) and two out of three occurrences of quintuple interaction reported by the non-canonical interactions in RNA (NCIR) database. These and several other interactions of interest will be discussed in this paper. These examples demonstrate that the COGNAC program can serve as an automated annotation system that can be used to annotate conserved base-base interactions and could be added as additional information to established RNA secondary structure prediction methods.« less
Freiburg RNA tools: a central online resource for RNA-focused research and teaching.
Raden, Martin; Ali, Syed M; Alkhnbashi, Omer S; Busch, Anke; Costa, Fabrizio; Davis, Jason A; Eggenhofer, Florian; Gelhausen, Rick; Georg, Jens; Heyne, Steffen; Hiller, Michael; Kundu, Kousik; Kleinkauf, Robert; Lott, Steffen C; Mohamed, Mostafa M; Mattheis, Alexander; Miladi, Milad; Richter, Andreas S; Will, Sebastian; Wolff, Joachim; Wright, Patrick R; Backofen, Rolf
2018-05-21
The Freiburg RNA tools webserver is a well established online resource for RNA-focused research. It provides a unified user interface and comprehensive result visualization for efficient command line tools. The webserver includes RNA-RNA interaction prediction (IntaRNA, CopraRNA, metaMIR), sRNA homology search (GLASSgo), sequence-structure alignments (LocARNA, MARNA, CARNA, ExpaRNA), CRISPR repeat classification (CRISPRmap), sequence design (antaRNA, INFO-RNA, SECISDesign), structure aberration evaluation of point mutations (RaSE), and RNA/protein-family models visualization (CMV), and other methods. Open education resources offer interactive visualizations of RNA structure and RNA-RNA interaction prediction as well as basic and advanced sequence alignment algorithms. The services are freely available at http://rna.informatik.uni-freiburg.de.
MEGANTE: A Web-Based System for Integrated Plant Genome Annotation
Numa, Hisataka; Itoh, Takeshi
2014-01-01
The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demands for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon–intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, are also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the Brassicaceae, Fabaceae, Musaceae, Poaceae, Salicaceae, Solanaceae, Rosaceae and Vitaceae families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at https://megante.dna.affrc.go.jp/. PMID:24253915
Bernini, Andrea; Henrici De Angelis, Lucia; Morandi, Edoardo; Spiga, Ottavia; Santucci, Annalisa; Assfalg, Michael; Molinari, Henriette; Pillozzi, Serena; Arcangeli, Annarosa; Niccolai, Neri
2014-03-01
Hotspot delineation on protein surfaces represents a fundamental step for targeting protein-protein interfaces. Disruptors of protein-protein interactions can be designed provided that the sterical features of binding pockets, including the transient ones, can be defined. Molecular Dynamics, MD, simulations have been used as a reliable framework for identifying transient pocket openings on the protein surface. Accessible surface area and intramolecular H-bond involvement of protein backbone amides are proposed as descriptors for characterizing binding pocket occurrence and evolution along MD trajectories. TEMPOL induced paramagnetic perturbations on (1)H-(15)N HSQC signals of protein backbone amides have been analyzed as a fragment-based search for surface hotspots, in order to validate MD predicted pockets. This procedure has been applied to CXCL12, a small chemokine responsible for tumor progression and proliferation. From combined analysis of MD data and paramagnetic profiles, two CXCL12 sites suitable for the binding of small molecules were identified. One of these sites is the already well characterized CXCL12 region involved in the binding to CXCR4 receptor. The other one is a transient pocket predicted by Molecular Dynamics simulations, which could not be observed from static analysis of CXCL12 PDB structures. The present results indicate how TEMPOL, instrumental in identifying this transient pocket, can be a powerful tool to delineate minor conformations which can be highly relevant in dynamic discovery of antitumoral drugs. Copyright © 2013 Elsevier B.V. All rights reserved.
SGO: A fast engine for ab initio atomic structure global optimization by differential evolution
NASA Astrophysics Data System (ADS)
Chen, Zhanghui; Jia, Weile; Jiang, Xiangwei; Li, Shu-Shen; Wang, Lin-Wang
2017-10-01
As the high throughout calculations and material genome approaches become more and more popular in material science, the search for optimal ways to predict atomic global minimum structure is a high research priority. This paper presents a fast method for global search of atomic structures at ab initio level. The structures global optimization (SGO) engine consists of a high-efficiency differential evolution algorithm, accelerated local relaxation methods and a plane-wave density functional theory code running on GPU machines. The purpose is to show what can be achieved by combining the superior algorithms at the different levels of the searching scheme. SGO can search the global-minimum configurations of crystals, two-dimensional materials and quantum clusters without prior symmetry restriction in a relatively short time (half or several hours for systems with less than 25 atoms), thus making such a task a routine calculation. Comparisons with other existing methods such as minima hopping and genetic algorithm are provided. One motivation of our study is to investigate the properties of magnetic systems in different phases. The SGO engine is capable of surveying the local minima surrounding the global minimum, which provides the information for the overall energy landscape of a given system. Using this capability we have found several new configurations for testing systems, explored their energy landscape, and demonstrated that the magnetic moment of metal clusters fluctuates strongly in different local minima.
Kinjo, Akira R.; Nakamura, Haruki
2012-01-01
Comparison and classification of protein structures are fundamental means to understand protein functions. Due to the computational difficulty and the ever-increasing amount of structural data, however, it is in general not feasible to perform exhaustive all-against-all structure comparisons necessary for comprehensive classifications. To efficiently handle such situations, we have previously proposed a method, now called GIRAF. We herein describe further improvements in the GIRAF protein structure search and alignment method. The GIRAF method achieves extremely efficient search of similar structures of ligand binding sites of proteins by exploiting database indexing of structural features of local coordinate frames. In addition, it produces refined atom-wise alignments by iterative applications of the Hungarian method to the bipartite graph defined for a pair of superimposed structures. By combining the refined alignments based on different local coordinate frames, it is made possible to align structures involving domain movements. We provide detailed accounts for the database design, the search and alignment algorithms as well as some benchmark results. PMID:27493524
NASA Astrophysics Data System (ADS)
Aaltonen, T.; Amerio, S.; Amidei, D.; Anastassov, A.; Annovi, A.; Antos, J.; Apollinari, G.; Appel, J. A.; Arisawa, T.; Artikov, A.; Asaadi, J.; Ashmanskas, W.; Auerbach, B.; Aurisano, A.; Azfar, F.; Badgett, W.; Bae, T.; Barbaro-Galtieri, A.; Barnes, V. E.; Barnett, B. A.; Barria, P.; Bartos, P.; Bauce, M.; Bedeschi, F.; Behari, S.; Bellettini, G.; Bellinger, J.; Benjamin, D.; Beretvas, A.; Bhatti, A.; Bland, K. R.; Blumenfeld, B.; Bocci, A.; Bodek, A.; Bortoletto, D.; Boudreau, J.; Boveia, A.; Brigliadori, L.; Bromberg, C.; Brucken, E.; Budagov, J.; Budd, H. S.; Burkett, K.; Busetto, G.; Bussey, P.; Butti, P.; Buzatu, A.; Calamba, A.; Camarda, S.; Campanelli, M.; Canelli, F.; Carls, B.; Carlsmith, D.; Carosi, R.; Carrillo, S.; Casal, B.; Casarsa, M.; Castro, A.; Catastini, P.; Cauz, D.; Cavaliere, V.; Cavalli-Sforza, M.; Cerri, A.; Cerrito, L.; Chen, Y. C.; Chertok, M.; Chiarelli, G.; Chlachidze, G.; Cho, K.; Chokheli, D.; Ciocci, M. A.; Clark, A.; Clarke, C.; Convery, M. E.; Conway, J.; Corbo, M.; Cordelli, M.; Cox, C. A.; Cox, D. J.; Cremonesi, M.; Cruz, D.; Cuevas, J.; Culbertson, R.; d'Ascenzo, N.; Datta, M.; De Barbaro, P.; Demortier, L.; Deninno, M.; d'Errico, M.; Devoto, F.; Di Canto, A.; Di Ruzza, B.; Dittmann, J. R.; D'Onofrio, M.; Donati, S.; Dorigo, M.; Driutti, A.; Ebina, K.; Edgar, R.; Elagin, A.; Erbacher, R.; Errede, S.; Esham, B.; Eusebi, R.; Farrington, S.; Fernández Ramos, J. P.; Field, R.; Flanagan, G.; Forrest, R.; Franklin, M.; Freeman, J. C.; Frisch, H.; Funakoshi, Y.; Garfinkel, A. F.; Garosi, P.; Gerberich, H.; Gerchtein, E.; Giagu, S.; Giakoumopoulou, V.; Gibson, K.; Ginsburg, C. M.; Giokaris, N.; Giromini, P.; Giurgiu, G.; Glagolev, V.; Glenzinski, D.; Gold, M.; Goldin, D.; Golossanov, A.; Gomez, G.; Gomez-Ceballos, G.; Goncharov, M.; González López, O.; Gorelov, I.; Goshaw, A. T.; Goulianos, K.; Gramellini, E.; Grinstein, S.; Grosso-Pilcher, C.; Group, R. C.; Guimaraes da Costa, J.; Hahn, S. R.; Han, J. Y.; Happacher, F.; Hara, K.; Hare, M.; Harr, R. F.; Harrington-Taber, T.; Hatakeyama, K.; Hays, C.; Heinrich, J.; Herndon, M.; Hocker, A.; Hong, Z.; Hopkins, W.; Hou, S.; Hughes, R. E.; Husemann, U.; Hussein, M.; Huston, J.; Introzzi, G.; Iori, M.; Ivanov, A.; James, E.; Jang, D.; Jayatilaka, B.; Jeon, E. J.; Jindariani, S.; Jones, M.; Joo, K. K.; Jun, S. Y.; Junk, T. R.; Kambeitz, M.; Kamon, T.; Karchin, P. E.; Kasmi, A.; Kato, Y.; Ketchum, W.; Keung, J.; Kilminster, B.; Kim, D. H.; Kim, H. S.; Kim, J. E.; Kim, M. J.; Kim, S. B.; Kim, S. H.; Kim, Y. J.; Kim, Y. K.; Kimura, N.; Kirby, M.; Knoepfel, K.; Kondo, K.; Kong, D. J.; Konigsberg, J.; Kotwal, A. V.; Kreps, M.; Kroll, J.; Kruse, M.; Kuhr, T.; Kurata, M.; Laasanen, A. T.; Lammel, S.; Lancaster, M.; Lannon, K.; Latino, G.; Lee, H. S.; Lee, J. S.; Leo, S.; Leone, S.; Lewis, J. D.; Limosani, A.; Lipeles, E.; Lister, A.; Liu, H.; Liu, Q.; Liu, T.; Lockwitz, S.; Loginov, A.; Lucà, A.; Lucchesi, D.; Lueck, J.; Lujan, P.; Lukens, P.; Lungu, G.; Lys, J.; Lysak, R.; Madrak, R.; Maestro, P.; Malik, S.; Manca, G.; Manousakis-Katsikakis, A.; Margaroli, F.; Marino, P.; Martínez, M.; Matera, K.; Mattson, M. E.; Mazzacane, A.; Mazzanti, P.; McNulty, R.; Mehta, A.; Mehtala, P.; Mesropian, C.; Miao, T.; Mietlicki, D.; Mitra, A.; Miyake, H.; Moed, S.; Moggi, N.; Moon, C. S.; Moore, R.; Morello, M. J.; Mukherjee, A.; Muller, Th.; Murat, P.; Mussini, M.; Nachtman, J.; Nagai, Y.; Naganoma, J.; Nakano, I.; Napier, A.; Nett, J.; Neu, C.; Nigmanov, T.; Nodulman, L.; Noh, S. Y.; Norniella, O.; Oakes, L.; Oh, S. H.; Oh, Y. D.; Oksuzian, I.; Okusawa, T.; Orava, R.; Ortolan, L.; Pagliarone, C.; Palencia, E.; Palni, P.; Papadimitriou, V.; Parker, W.; Pauletta, G.; Paulini, M.; Paus, C.; Phillips, T. J.; Piacentino, G.; Pianori, E.; Pilot, J.; Pitts, K.; Plager, C.; Pondrom, L.; Poprocki, S.; Potamianos, K.; Pranko, A.; Prokoshin, F.; Ptohos, F.; Punzi, G.; Ranjan, N.; Redondo Fernández, I.; Renton, P.; Rescigno, M.; Rimondi, F.; Ristori, L.; Robson, A.; Rodriguez, T.; Rolli, S.; Ronzani, M.; Roser, R.; Rosner, J. L.; Ruffini, F.; Ruiz, A.; Russ, J.; Rusu, V.; Sakumoto, W. K.; Sakurai, Y.; Santi, L.; Sato, K.; Saveliev, V.; Savoy-Navarro, A.; Schlabach, P.; Schmidt, E. E.; Schwarz, T.; Scodellaro, L.; Scuri, F.; Seidel, S.; Seiya, Y.; Semenov, A.; Sforza, F.; Shalhout, S. Z.; Shears, T.; Shepard, P. F.; Shimojima, M.; Shochet, M.; Shreyber-Tecker, I.; Simonenko, A.; Sinervo, P.; Sliwa, K.; Smith, J. R.; Snider, F. D.; Song, H.; Sorin, V.; Stancari, M.; Denis, R. St.; Stelzer, B.; Stelzer-Chilton, O.; Stentz, D.; Strologas, J.; Sudo, Y.; Sukhanov, A.; Suslov, I.; Takemasa, K.; Takeuchi, Y.; Tang, J.; Tecchio, M.; Teng, P. K.; Thom, J.; Thomson, E.; Thukral, V.; Toback, D.; Tokar, S.; Tollefson, K.; Tomura, T.; Tonelli, D.; Torre, S.; Torretta, D.; Totaro, P.; Trovato, M.; Ukegawa, F.; Uozumi, S.; Vázquez, F.; Velev, G.; Vellidis, C.; Vernieri, C.; Vidal, M.; Vilar, R.; Vizán, J.; Vogel, M.; Volpi, G.; Wagner, P.; Wallny, R.; Wang, S. M.; Warburton, A.; Waters, D.; Wester, W. C., III; Whiteson, D.; Wicklund, A. B.; Wilbur, S.; Williams, H. H.; Wilson, J. S.; Wilson, P.; Winer, B. L.; Wittich, P.; Wolbers, S.; Wolfe, H.; Wright, T.; Wu, X.; Wu, Z.; Yamamoto, K.; Yamato, D.; Yang, T.; Yang, U. K.; Yang, Y. C.; Yao, W.-M.; Yeh, G. P.; Yi, K.; Yoh, J.; Yorita, K.; Yoshida, T.; Yu, G. B.; Yu, I.; Zanetti, A. M.; Zeng, Y.; Zhou, C.; Zucchelli, S.
2013-08-01
We present the first signature-based search for delayed photons using an exclusive photon plus missing transverse energy final state. Events are reconstructed in a data sample from the CDF II detector corresponding to 6.3fb-1 of integrated luminosity from s=1.96TeV proton-antiproton collisions. Candidate events are selected if they contain a photon with an arrival time in the detector larger than expected from a promptly produced photon. The mean number of events from standard model sources predicted by the data-driven background model based on the photon timing distribution is 286±24. A total of 322 events are observed. A p value of 12% is obtained, showing consistency of the data with standard model predictions.
A high-resolution study of ultra-heavy cosmic-ray nuclei (A0178)
NASA Technical Reports Server (NTRS)
Osullivan, D.; Thompson, A.; Oceallaigh, C.; Domingo, V.; Wenzel, K. P.
1984-01-01
The main objective of the experiment is a detailed study of the charge spectra of ultraheavy cosmic-ray nuclei from zinc (Z = 30) to uranium (Z = 92) and beyond using solid-state track detectors. Special emphasis will be placed on the relative abundances in the region Z or - 65, which is thought to be dominated by r-process nucleosynthesis. Subsidiary objectives include the study of the cosmic-ray transiron spectrum a search for the postulated long-lived superheavy (SH) nuclei (Z or = 110), such as (110) SH294, in the contemporary cosmic radiation. The motivation behind the search for super-heavy nuclei is based on predicted half-lives that are short compared to the age of the Earth but long compared to the age of cosmic rays. The detection of such nuclei would have far-reaching consequences for nuclear structure theory. The sample of ultraheavy nuclei obtained in this experiment will provide unique opportunities for many tests concerning element nucleosynthesis, cosmic-ray acceleration, and cosmic-ray propagation.
Feinstein, Wei P; Brylinski, Michal
2015-01-01
Computational approaches have emerged as an instrumental methodology in modern research. For example, virtual screening by molecular docking is routinely used in computer-aided drug discovery. One of the critical parameters for ligand docking is the size of a search space used to identify low-energy binding poses of drug candidates. Currently available docking packages often come with a default protocol for calculating the box size, however, many of these procedures have not been systematically evaluated. In this study, we investigate how the docking accuracy of AutoDock Vina is affected by the selection of a search space. We propose a new procedure for calculating the optimal docking box size that maximizes the accuracy of binding pose prediction against a non-redundant and representative dataset of 3,659 protein-ligand complexes selected from the Protein Data Bank. Subsequently, we use the Directory of Useful Decoys, Enhanced to demonstrate that the optimized docking box size also yields an improved ranking in virtual screening. Binding pockets in both datasets are derived from the experimental complex structures and, additionally, predicted by eFindSite. A systematic analysis of ligand binding poses generated by AutoDock Vina shows that the highest accuracy is achieved when the dimensions of the search space are 2.9 times larger than the radius of gyration of a docking compound. Subsequent virtual screening benchmarks demonstrate that this optimized docking box size also improves compound ranking. For instance, using predicted ligand binding sites, the average enrichment factor calculated for the top 1 % (10 %) of the screening library is 8.20 (3.28) for the optimized protocol, compared to 7.67 (3.19) for the default procedure. Depending on the evaluation metric, the optimal docking box size gives better ranking in virtual screening for about two-thirds of target proteins. This fully automated procedure can be used to optimize docking protocols in order to improve the ranking accuracy in production virtual screening simulations. Importantly, the optimized search space systematically yields better results than the default method not only for experimental pockets, but also for those predicted from protein structures. A script for calculating the optimal docking box size is freely available at www.brylinski.org/content/docking-box-size. Graphical AbstractWe developed a procedure to optimize the box size in molecular docking calculations. Left panel shows the predicted binding pose of NADP (green sticks) compared to the experimental complex structure of human aldose reductase (blue sticks) using a default protocol. Right panel shows the docking accuracy using an optimized box size.
Mirador: A Simple, Fast Search Interface for Remote Sensing Data
NASA Technical Reports Server (NTRS)
Lynnes, Christopher; Strub, Richard; Seiler, Edward; Joshi, Talak; MacHarrie, Peter
2008-01-01
A major challenge for remote sensing science researchers is searching and acquiring relevant data files for their research projects based on content, space and time constraints. Several structured query (SQ) and hierarchical navigation (HN) search interfaces have been develop ed to satisfy this requirement, yet the dominant search engines in th e general domain are based on free-text search. The Goddard Earth Sci ences Data and Information Services Center has developed a free-text search interface named Mirador that supports space-time queries, inc luding a gazetteer and geophysical event gazetteer. In order to compe nsate for a slightly reduced search precision relative to SQ and HN t echniques, Mirador uses several search optimizations to return result s quickly. The quick response enables a more iterative search strateg y than is available with many SQ and HN techniques.
Milanesi, Luciano; Petrillo, Mauro; Sepe, Leandra; Boccia, Angelo; D'Agostino, Nunzio; Passamano, Myriam; Di Nardo, Salvatore; Tasco, Gianluca; Casadio, Rita; Paolella, Giovanni
2005-01-01
Background Protein kinases are a well defined family of proteins, characterized by the presence of a common kinase catalytic domain and playing a significant role in many important cellular processes, such as proliferation, maintenance of cell shape, apoptosys. In many members of the family, additional non-kinase domains contribute further specialization, resulting in subcellular localization, protein binding and regulation of activity, among others. About 500 genes encode members of the kinase family in the human genome, and although many of them represent well known genes, a larger number of genes code for proteins of more recent identification, or for unknown proteins identified as kinase only after computational studies. Results A systematic in silico study performed on the human genome, led to the identification of 5 genes, on chromosome 1, 11, 13, 15 and 16 respectively, and 1 pseudogene on chromosome X; some of these genes are reported as kinases from NCBI but are absent in other databases, such as KinBase. Comparative analysis of 483 gene regions and subsequent computational analysis, aimed at identifying unannotated exons, indicates that a large number of kinase may code for alternately spliced forms or be incorrectly annotated. An InterProScan automated analysis was perfomed to study domain distribution and combination in the various families. At the same time, other structural features were also added to the annotation process, including the putative presence of transmembrane alpha helices, and the cystein propensity to participate into a disulfide bridge. Conclusion The predicted human kinome was extended by identifiying both additional genes and potential splice variants, resulting in a varied panorama where functionality may be searched at the gene and protein level. Structural analysis of kinase proteins domains as defined in multiple sources together with transmembrane alpha helices and signal peptide prediction provides hints to function assignment. The results of the human kinome analysis are collected in the KinWeb database, available for browsing and searching over the internet, where all results from the comparative analysis and the gene structure annotation are made available, alongside the domain information. Kinases may be searched by domain combinations and the relative genes may be viewed in a graphic browser at various level of magnification up to gene organization on the full chromosome set. PMID:16351747
Keogh, Claire; Wallace, Emma; O’Brien, Kirsty K.; Galvin, Rose; Smith, Susan M.; Lewis, Cliona; Cummins, Anthony; Cousins, Grainne; Dimitrov, Borislav D.; Fahey, Tom
2014-01-01
PURPOSE We describe the methodology used to create a register of clinical prediction rules relevant to primary care. We also summarize the rules included in the register according to various characteristics. METHODS To identify relevant articles, we searched the MEDLINE database (PubMed) for the years 1980 to 2009 and supplemented the results with searches of secondary sources (books on clinical prediction rules) and personal resources (eg, experts in the field). The rules described in relevant articles were classified according to their clinical domain, the stage of development, and the clinical setting in which they were studied. RESULTS Our search identified clinical prediction rules reported between 1965 and 2009. The largest share of rules (37.2%) were retrieved from PubMed. The number of published rules increased substantially over the study decades. We included 745 articles in the register; many contained more than 1 clinical prediction rule study (eg, both a derivation study and a validation study), resulting in 989 individual studies. In all, 434 unique rules had gone through derivation; however, only 54.8% had been validated and merely 2.8% had undergone analysis of their impact on either the process or outcome of clinical care. The rules most commonly pertained to cardiovascular disease, respiratory, and musculoskeletal conditions. They had most often been studied in the primary care or emergency department settings. CONCLUSIONS Many clinical prediction rules have been derived, but only about half have been validated and few have been assessed for clinical impact. This lack of thorough evaluation for many rules makes it difficult to retrieve and identify those that are ready for use at the point of patient care. We plan to develop an international web-based register of clinical prediction rules and computer-based clinical decision support systems. PMID:25024245
Zhao, Ping; Pan, Yuzhuo; Wagner, Christian
2017-01-01
A comprehensive search in literature and published US Food and Drug Administration reviews was conducted to assess whether physiologically based pharmacokinetic (PBPK) modeling could be prospectively used to predict clinical food effect on oral drug absorption. Among the 48 resulted food effect predictions, ∼50% were predicted within 1.25‐fold of observed, and 75% within 2‐fold. Dissolution rate and precipitation time were commonly optimized parameters when PBPK modeling was not able to capture the food effect. The current work presents a knowledgebase for documenting PBPK experience to predict food effect. PMID:29168611
2011-01-01
To efficiently repair DNA, human alkyladenine DNA glycosylase (AAG) must search the million-fold excess of unmodified DNA bases to find a handful of DNA lesions. Such a search can be facilitated by the ability of glycosylases, like AAG, to interact with DNA using two affinities: a lower-affinity interaction in a searching process and a higher-affinity interaction for catalytic repair. Here, we present crystal structures of AAG trapped in two DNA-bound states. The lower-affinity depiction allows us to investigate, for the first time, the conformation of this protein in the absence of a tightly bound DNA adduct. We find that active site residues of AAG involved in binding lesion bases are in a disordered state. Furthermore, two loops that contribute significantly to the positive electrostatic surface of AAG are disordered. Additionally, a higher-affinity state of AAG captured here provides a fortuitous snapshot of how this enzyme interacts with a DNA adduct that resembles a one-base loop. PMID:22148158
CSTutor: A Sketch-Based Tool for Visualizing Data Structures
ERIC Educational Resources Information Center
Buchanan, Sarah; Laviola, Joseph J., Jr.
2014-01-01
We present CSTutor, a sketch-based interface designed to help students understand data structures, specifically Linked Lists, Binary Search Trees, AVL Trees, and Heaps. CSTutor creates an environment that seamlessly combines a user's sketched diagram and code. In each of these data structure modes, the user can naturally sketch a data structure on…
White, Ryen W; Horvitz, Eric
2017-03-01
A statistical model that predicts the appearance of strong evidence of a lung carcinoma diagnosis via analysis of large-scale anonymized logs of web search queries from millions of people across the United States. To evaluate the feasibility of screening patients at risk of lung carcinoma via analysis of signals from online search activity. We identified people who issue special queries that provide strong evidence of a recent diagnosis of lung carcinoma. We then considered patterns of symptoms expressed as searches about concerning symptoms over several months prior to the appearance of the landmark web queries. We built statistical classifiers that predict the future appearance of landmark queries based on the search log signals. This was a retrospective log analysis of the online activity of millions of web searchers seeking health-related information online. Of web searchers who queried for symptoms related to lung carcinoma, some (n = 5443 of 4 813 985) later issued queries that provide strong evidence of recent clinical diagnosis of lung carcinoma and are regarded as positive cases in our analysis. Additional evidence on the reliability of these queries as representing clinical diagnoses is based on the significant increase in follow-on searches for treatments and medications for these searchers and on the correlation between lung carcinoma incidence rates and our log-based statistics. The remaining symptom searchers (n = 4 808 542) are regarded as negative cases. Performance of the statistical model for early detection from online search behavior, for different lead times, different sets of signals, and different cohorts of searchers stratified by potential risk. The statistical classifier predicting the future appearance of landmark web queries based on search log signals identified searchers who later input queries consistent with a lung carcinoma diagnosis, with a true-positive rate ranging from 3% to 57% for false-positive rates ranging from 0.00001 to 0.001, respectively. The methods can be used to identify people at highest risk up to a year in advance of the inferred diagnosis time. The 5 factors associated with the highest relative risk (RR) were evidence of family history (RR = 7.548; 95% CI, 3.937-14.470), age (RR = 3.558; 95% CI, 3.357-3.772), radon (RR = 2.529; 95% CI, 1.137-5.624), primary location (RR = 2.463; 95% CI, 1.364-4.446), and occupation (RR = 1.969; 95% CI, 1.143-3.391). Evidence of smoking (RR = 1.646; 95% CI, 1.032-2.260) was important but not top-ranked, which was due to the difficulty of identifying smoking history from search terms. Pattern recognition based on data drawn from large-scale web search queries holds opportunity for identifying risk factors and frames new directions with early detection of lung carcinoma.
Comparison of PubMed and Google Scholar literature searches.
Anders, Michael E; Evans, Dennis P
2010-05-01
Literature searches are essential to evidence-based respiratory care. To conduct literature searches, respiratory therapists rely on search engines to retrieve information, but there is a dearth of literature on the comparative efficiencies of search engines for researching clinical questions in respiratory care. To compare PubMed and Google Scholar search results for clinical topics in respiratory care to that of a benchmark. We performed literature searches with PubMed and Google Scholar, on 3 clinical topics. In PubMed we used the Clinical Queries search filter. In Google Scholar we used the search filters in the Advanced Scholar Search option. We used the reference list of a related Cochrane Collaboration evidence-based systematic review as the benchmark for each of the search results. We calculated recall (sensitivity) and precision (positive predictive value) with 2 x 2 contingency tables. We compared the results with the chi-square test of independence and Fisher's exact test. PubMed and Google Scholar had similar recall for both overall search results (71% vs 69%) and full-text results (43% vs 51%). PubMed had better precision than Google Scholar for both overall search results (13% vs 0.07%, P < .001) and full-text results (8% vs 0.05%, P < .001). Our results suggest that PubMed searches with the Clinical Queries filter are more precise than with the Advanced Scholar Search in Google Scholar for respiratory care topics. PubMed appears to be more practical to conduct efficient, valid searches for informing evidence-based patient-care protocols, for guiding the care of individual patients, and for educational purposes.
Interaction between scene-based and array-based contextual cueing.
Rosenbaum, Gail M; Jiang, Yuhong V
2013-07-01
Contextual cueing refers to the cueing of spatial attention by repeated spatial context. Previous studies have demonstrated distinctive properties of contextual cueing by background scenes and by an array of search items. Whereas scene-based contextual cueing reflects explicit learning of the scene-target association, array-based contextual cueing is supported primarily by implicit learning. In this study, we investigated the interaction between scene-based and array-based contextual cueing. Participants searched for a target that was predicted by both the background scene and the locations of distractor items. We tested three possible patterns of interaction: (1) The scene and the array could be learned independently, in which case cueing should be expressed even when only one cue was preserved; (2) the scene and array could be learned jointly, in which case cueing should occur only when both cues were preserved; (3) overshadowing might occur, in which case learning of the stronger cue should preclude learning of the weaker cue. In several experiments, we manipulated the nature of the contextual cues present during training and testing. We also tested explicit awareness of scenes, scene-target associations, and arrays. The results supported the overshadowing account: Specifically, scene-based contextual cueing precluded array-based contextual cueing when both were predictive of the location of a search target. We suggest that explicit, endogenous cues dominate over implicit cues in guiding spatial attention.
Draper, John; Enot, David P; Parker, David; Beckmann, Manfred; Snowdon, Stuart; Lin, Wanchang; Zubair, Hassan
2009-01-01
Background Metabolomics experiments using Mass Spectrometry (MS) technology measure the mass to charge ratio (m/z) and intensity of ionised molecules in crude extracts of complex biological samples to generate high dimensional metabolite 'fingerprint' or metabolite 'profile' data. High resolution MS instruments perform routinely with a mass accuracy of < 5 ppm (parts per million) thus providing potentially a direct method for signal putative annotation using databases containing metabolite mass information. Most database interfaces support only simple queries with the default assumption that molecules either gain or lose a single proton when ionised. In reality the annotation process is confounded by the fact that many ionisation products will be not only molecular isotopes but also salt/solvent adducts and neutral loss fragments of original metabolites. This report describes an annotation strategy that will allow searching based on all potential ionisation products predicted to form during electrospray ionisation (ESI). Results Metabolite 'structures' harvested from publicly accessible databases were converted into a common format to generate a comprehensive archive in MZedDB. 'Rules' were derived from chemical information that allowed MZedDB to generate a list of adducts and neutral loss fragments putatively able to form for each structure and calculate, on the fly, the exact molecular weight of every potential ionisation product to provide targets for annotation searches based on accurate mass. We demonstrate that data matrices representing populations of ionisation products generated from different biological matrices contain a large proportion (sometimes > 50%) of molecular isotopes, salt adducts and neutral loss fragments. Correlation analysis of ESI-MS data features confirmed the predicted relationships of m/z signals. An integrated isotope enumerator in MZedDB allowed verification of exact isotopic pattern distributions to corroborate experimental data. Conclusion We conclude that although ultra-high accurate mass instruments provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample improves greatly both the frequency and accuracy of potential annotation 'hits' in ESI-MS data. PMID:19622150
Wei, Guang-Feng
2015-01-01
The restructuring of nanoparticles at the in situ condition is a common but complex phenomenon in nanoscience. Here, we present the first systematic survey on the structure dynamics and its catalytic consequence for hydrogen evolution reaction (HER) on Pt nanoparticles, as represented by a magic number Pt44 octahedron (∼1 nm size). Using a first principles calculation based global structure search method, we stepwise follow the significant nanoparticle restructuring under HER conditions as driven by thermodynamics to expose {100} facets, and reveal the consequent large activity enhancement due to the marked increase of the concentration of the active site, being identified to be apex atoms. The enhanced kinetics is thus a “byproduct” of the thermodynamical restructuring. Based on the results, the best Pt catalyst for HER is predicted to be ultrasmall Pt particles without core atoms, a size below ∼20 atoms. PMID:29560237
A polymer dataset for accelerated property prediction and design.
Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi
2016-03-01
Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a dataset of sufficiently large number of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided.
ERIC Educational Resources Information Center
Hwang, Gwo-Jen; Kuo, Fan-Ray
2015-01-01
Web-based problem-solving, a compound ability of critical thinking, creative thinking, reasoning thinking and information-searching abilities, has been recognised as an important competence for elementary school students. Some researchers have reported the possible correlations between problem-solving competence and information searching ability;…
Program Design for Retrospective Searches on Large Data Bases
ERIC Educational Resources Information Center
Thiel, L. H.; Heaps, H. S.
1972-01-01
Retrospective search of large data bases requires development of special techniques for automatic compression of data and minimization of the number of input-output operations to the computer files. The computer program should require a relatively small amount of internal memory. This paper describes the structure of such a program. (9 references)…
Kadumuri, Rajashekar Varma; Vadrevu, Ramakrishna
2017-10-01
Due to their crucial role in function, folding, and stability, protein loops are being targeted for grafting/designing to create novel or alter existing functionality and improve stability and foldability. With a view to facilitate a thorough analysis and effectual search options for extracting and comparing loops for sequence and structural compatibility, we developed, LoopX a comprehensively compiled library of sequence and conformational features of ∼700,000 loops from protein structures. The database equipped with a graphical user interface is empowered with diverse query tools and search algorithms, with various rendering options to visualize the sequence- and structural-level information along with hydrogen bonding patterns, backbone φ, ψ dihedral angles of both the target and candidate loops. Two new features (i) conservation of the polar/nonpolar environment and (ii) conservation of sequence and conformation of specific residues within the loops have also been incorporated in the search and retrieval of compatible loops for a chosen target loop. Thus, the LoopX server not only serves as a database and visualization tool for sequence and structural analysis of protein loops but also aids in extracting and comparing candidate loops for a given target loop based on user-defined search options.
Vfold: a web server for RNA structure and folding thermodynamics prediction.
Xu, Xiaojun; Zhao, Peinan; Chen, Shi-Jie
2014-01-01
The ever increasing discovery of non-coding RNAs leads to unprecedented demand for the accurate modeling of RNA folding, including the predictions of two-dimensional (base pair) and three-dimensional all-atom structures and folding stabilities. Accurate modeling of RNA structure and stability has far-reaching impact on our understanding of RNA functions in human health and our ability to design RNA-based therapeutic strategies. The Vfold server offers a web interface to predict (a) RNA two-dimensional structure from the nucleotide sequence, (b) three-dimensional structure from the two-dimensional structure and the sequence, and (c) folding thermodynamics (heat capacity melting curve) from the sequence. To predict the two-dimensional structure (base pairs), the server generates an ensemble of structures, including loop structures with the different intra-loop mismatches, and evaluates the free energies using the experimental parameters for the base stacks and the loop entropy parameters given by a coarse-grained RNA folding model (the Vfold model) for the loops. To predict the three-dimensional structure, the server assembles the motif scaffolds using structure templates extracted from the known PDB structures and refines the structure using all-atom energy minimization. The Vfold-based web server provides a user friendly tool for the prediction of RNA structure and stability. The web server and the source codes are freely accessible for public use at "http://rna.physics.missouri.edu".
Extant fold-switching proteins are widespread.
Porter, Lauren L; Looger, Loren L
2018-06-05
A central tenet of biology is that globular proteins have a unique 3D structure under physiological conditions. Recent work has challenged this notion by demonstrating that some proteins switch folds, a process that involves remodeling of secondary structure in response to a few mutations (evolved fold switchers) or cellular stimuli (extant fold switchers). To date, extant fold switchers have been viewed as rare byproducts of evolution, but their frequency has been neither quantified nor estimated. By systematically and exhaustively searching the Protein Data Bank (PDB), we found ∼100 extant fold-switching proteins. Furthermore, we gathered multiple lines of evidence suggesting that these proteins are widespread in nature. Based on these lines of evidence, we hypothesized that the frequency of extant fold-switching proteins may be underrepresented by the structures in the PDB. Thus, we sought to identify other putative extant fold switchers with only one solved conformation. To do this, we identified two characteristic features of our ∼100 extant fold-switching proteins, incorrect secondary structure predictions and likely independent folding cooperativity, and searched the PDB for other proteins with similar features. Reassuringly, this method identified dozens of other proteins in the literature with indication of a structural change but only one solved conformation in the PDB. Thus, we used it to estimate that 0.5-4% of PDB proteins switch folds. These results demonstrate that extant fold-switching proteins are likely more common than the PDB reflects, which has implications for cell biology, genomics, and human health. Copyright © 2018 the Author(s). Published by PNAS.
Approximate matching of structured motifs in DNA sequences.
El-Mabrouk, Nadia; Raffinot, Mathieu; Duchesne, Jean-Eudes; Lajoie, Mathieu; Luc, Nicolas
2005-04-01
Several methods have been developed for identifying more or less complex RNA structures in a genome. All these methods are based on the search for conserved primary and secondary sub-structures. In this paper, we present a simple formal representation of a helix, which is a combination of sequence and folding constraints, as a constrained regular expression. This representation allows us to develop a well-founded algorithm that searches for all approximate matches of a helix in a genome. The algorithm is based on an alignment graph constructed from several copies of a pushdown automaton, arranged one on top of another. This is a first attempt to take advantage of the possibilities of pushdown automata in the context of approximate matching. The worst time complexity is O(krpn), where k is the error threshold, n the size of the genome, p the size of the secondary expression, and r its number of union symbols. We then extend the algorithm to search for pseudo-knots and secondary structures containing an arbitrary number of helices.
Chalfoun, A.D.; Martin, T.E.
2009-01-01
Predation is an important and ubiquitous selective force that can shape habitat preferences of prey species, but tests of alternative mechanistic hypotheses of habitat influences on predation risk are lacking. 2. We studied predation risk at nest sites of a passerine bird and tested two hypotheses based on theories of predator foraging behaviour. The total-foliage hypothesis predicts that predation will decline in areas of greater overall vegetation density by impeding cues for detection by predators. The potential-prey-site hypothesis predicts that predation decreases where predators must search more unoccupied potential nest sites. 3. Both observational data and results from a habitat manipulation provided clear support for the potential-prey-site hypothesis and rejection of the total-foliage hypothesis. Birds chose nest patches containing both greater total foliage and potential nest site density (which were correlated in their abundance) than at random sites, yet only potential nest site density significantly influenced nest predation risk. 4. Our results therefore provided a clear and rare example of adaptive nest site selection that would have been missed had structural complexity or total vegetation density been considered alone. 5. Our results also demonstrated that interactions between predator foraging success and habitat structure can be more complex than simple impedance or occlusion by vegetation. ?? 2008 British Ecological Society.
Performance of Optimized Actuator and Sensor Arrays in an Active Noise Control System
NASA Technical Reports Server (NTRS)
Palumbo, D. L.; Padula, S. L.; Lyle, K. H.; Cline, J. H.; Cabell, R. H.
1996-01-01
Experiments have been conducted in NASA Langley's Acoustics and Dynamics Laboratory to determine the effectiveness of optimized actuator/sensor architectures and controller algorithms for active control of harmonic interior noise. Tests were conducted in a large scale fuselage model - a composite cylinder which simulates a commuter class aircraft fuselage with three sections of trim panel and a floor. Using an optimization technique based on the component transfer functions, combinations of 4 out of 8 piezoceramic actuators and 8 out of 462 microphone locations were evaluated against predicted performance. A combinatorial optimization technique called tabu search was employed to select the optimum transducer arrays. Three test frequencies represent the cases of a strong acoustic and strong structural response, a weak acoustic and strong structural response and a strong acoustic and weak structural response. Noise reduction was obtained using a Time Averaged/Gradient Descent (TAGD) controller. Results indicate that the optimization technique successfully predicted best and worst case performance. An enhancement of the TAGD control algorithm was also evaluated. The principal components of the actuator/sensor transfer functions were used in the PC-TAGD controller. The principal components are shown to be independent of each other while providing control as effective as the standard TAGD.
Protein Information Resource: a community resource for expert annotation of protein data
Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy
2001-01-01
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041
Design of dual action antibiotics as an approach to search for new promising drugs
NASA Astrophysics Data System (ADS)
Tevyashova, A. N.; Olsufyeva, E. N.; Preobrazhenskaya, M. N.
2015-01-01
The review is devoted to the latest achievements in the design of dual action antibiotics — heterodimeric (chimeric) structures based on antibacterial agents of different classes (fluoroquinolones, anthracyclines, oxazolidines, macrolides and so on). Covalent binding can make the pharmacokinetic characteristics of these molecules more predictable and improve the penetration of each component into the cell. Consequently, not only does the drug efficacy increase owing to inhibition of two targets but also the resistance to one or both antibiotics can be overcome. The theoretical grounds of elaboration, design principles and methods for the synthesis of dual action antibiotics are considered. The structures are classified according to the type of covalent spacer (cleavable or not) connecting the moieties of two agents. Dual action antibiotics with a spacer that can be cleaved in a living cell are considered as dual action prodrugs. Data on the biological action of heterodimeric compounds are presented and structure-activity relationships are analyzed. The bibliography includes 225 references.
ERIC Educational Resources Information Center
Cacioppo, John T.; Semin, Gun R.; Berntson, Gary G.
2004-01-01
Scientific realism holds that scientific theories are approximations of universal truths about reality, whereas scientific instrumentalism posits that scientific theories are intellectual structures that provide adequate predictions of what is observed and useful frameworks for answering questions and solving problems in a given domain. These…
Investigations of Crossed Andreev Reflection in Hybrid Superconductor-Ferromagnet Structures
ERIC Educational Resources Information Center
Colci O'Hara, Madalina
2009-01-01
Cooper pair splitting is predicted to occur in hybrid devices where a superconductor is coupled to two ferromagnetic wires placed at a distance less than the superconducting coherence length. This thesis searches for signatures of this process, called crossed Andreev reflection (CAR), in three device geometries. The first devices studied are…
Sachem: a chemical cartridge for high-performance substructure search.
Kratochvíl, Miroslav; Vondrášek, Jiří; Galgonek, Jakub
2018-05-23
Structure search is one of the valuable capabilities of small-molecule databases. Fingerprint-based screening methods are usually employed to enhance the search performance by reducing the number of calls to the verification procedure. In substructure search, fingerprints are designed to capture important structural aspects of the molecule to aid the decision about whether the molecule contains a given substructure. Currently available cartridges typically provide acceptable search performance for processing user queries, but do not scale satisfactorily with dataset size. We present Sachem, a new open-source chemical cartridge that implements two substructure search methods: The first is a performance-oriented reimplementation of substructure indexing based on the OrChem fingerprint, and the second is a novel method that employs newly designed fingerprints stored in inverted indices. We assessed the performance of both methods on small, medium, and large datasets containing 1, 10, and 94 million compounds, respectively. Comparison of Sachem with other freely available cartridges revealed improvements in overall performance, scaling potential and screen-out efficiency. The Sachem cartridge allows efficient substructure searches in databases of all sizes. The sublinear performance scaling of the second method and the ability to efficiently query large amounts of pre-extracted information may together open the door to new applications for substructure searches.
PROXIMAL: a method for Prediction of Xenobiotic Metabolism.
Yousofshahi, Mona; Manteiga, Sara; Wu, Charmian; Lee, Kyongbum; Hassoun, Soha
2015-12-22
Contamination of the environment with bioactive chemicals has emerged as a potential public health risk. These substances that may cause distress or disease in humans can be found in air, water and food supplies. An open question is whether these chemicals transform into potentially more active or toxic derivatives via xenobiotic metabolizing enzymes expressed in the body. We present a new prediction tool, which we call PROXIMAL (Prediction of Xenobiotic Metabolism) for identifying possible transformation products of xenobiotic chemicals in the liver. Using reaction data from DrugBank and KEGG, PROXIMAL builds look-up tables that catalog the sites and types of structural modifications performed by Phase I and Phase II enzymes. Given a compound of interest, PROXIMAL searches for substructures that match the sites cataloged in the look-up tables, applies the corresponding modifications to generate a panel of possible transformation products, and ranks the products based on the activity and abundance of the enzymes involved. PROXIMAL generates transformations that are specific for the chemical of interest by analyzing the chemical's substructures. We evaluate the accuracy of PROXIMAL's predictions through case studies on two environmental chemicals with suspected endocrine disrupting activity, bisphenol A (BPA) and 4-chlorobiphenyl (PCB3). Comparisons with published reports confirm 5 out of 7 and 17 out of 26 of the predicted derivatives for BPA and PCB3, respectively. We also compare biotransformation predictions generated by PROXIMAL with those generated by METEOR and Metaprint2D-react, two other prediction tools. PROXIMAL can predict transformations of chemicals that contain substructures recognizable by human liver enzymes. It also has the ability to rank the predicted metabolites based on the activity and abundance of enzymes involved in xenobiotic transformation.
Planning for robust reserve networks using uncertainty analysis
Moilanen, A.; Runge, M.C.; Elith, Jane; Tyre, A.; Carmel, Y.; Fegraus, E.; Wintle, B.A.; Burgman, M.; Ben-Haim, Y.
2006-01-01
Planning land-use for biodiversity conservation frequently involves computer-assisted reserve selection algorithms. Typically such algorithms operate on matrices of species presence?absence in sites, or on species-specific distributions of model predicted probabilities of occurrence in grid cells. There are practically always errors in input data?erroneous species presence?absence data, structural and parametric uncertainty in predictive habitat models, and lack of correspondence between temporal presence and long-run persistence. Despite these uncertainties, typical reserve selection methods proceed as if there is no uncertainty in the data or models. Having two conservation options of apparently equal biological value, one would prefer the option whose value is relatively insensitive to errors in planning inputs. In this work we show how uncertainty analysis for reserve planning can be implemented within a framework of information-gap decision theory, generating reserve designs that are robust to uncertainty. Consideration of uncertainty involves modifications to the typical objective functions used in reserve selection. Search for robust-optimal reserve structures can still be implemented via typical reserve selection optimization techniques, including stepwise heuristics, integer-programming and stochastic global search.
Scene-aware joint global and local homographic video coding
NASA Astrophysics Data System (ADS)
Peng, Xiulian; Xu, Jizheng; Sullivan, Gary J.
2016-09-01
Perspective motion is commonly represented in video content that is captured and compressed for various applications including cloud gaming, vehicle and aerial monitoring, etc. Existing approaches based on an eight-parameter homography motion model cannot deal with this efficiently, either due to low prediction accuracy or excessive bit rate overhead. In this paper, we consider the camera motion model and scene structure in such video content and propose a joint global and local homography motion coding approach for video with perspective motion. The camera motion is estimated by a computer vision approach, and camera intrinsic and extrinsic parameters are globally coded at the frame level. The scene is modeled as piece-wise planes, and three plane parameters are coded at the block level. Fast gradient-based approaches are employed to search for the plane parameters for each block region. In this way, improved prediction accuracy and low bit costs are achieved. Experimental results based on the HEVC test model show that up to 9.1% bit rate savings can be achieved (with equal PSNR quality) on test video content with perspective motion. Test sequences for the example applications showed a bit rate savings ranging from 3.7 to 9.1%.
A species-specific nucleosomal signature defines a periodic distribution of amino acids in proteins.
Quintales, Luis; Soriano, Ignacio; Vázquez, Enrique; Segurado, Mónica; Antequera, Francisco
2015-04-01
Nucleosomes are the basic structural units of chromatin. Most of the yeast genome is organized in a pattern of positioned nucleosomes that is stably maintained under a wide range of physiological conditions. In this work, we have searched for sequence determinants associated with positioned nucleosomes in four species of fission and budding yeasts. We show that mononucleosomal DNA follows a highly structured base composition pattern, which differs among species despite the high degree of histone conservation. These nucleosomal signatures are present in transcribed and non-transcribed regions across the genome. In the case of open reading frames, they correctly predict the relative distribution of codons on mononucleosomal DNA, and they also determine a periodicity in the average distribution of amino acids along the proteins. These results establish a direct and species-specific connection between the position of each codon around the histone octamer and protein composition.
Prediction and analysis of structure, stability and unfolding of thermolysin-like proteases
NASA Astrophysics Data System (ADS)
Vriend, Gert; Eijsink, Vincent
1993-08-01
Bacillus neutral proteases (NPs) form a group of well-characterized homologous enzymes, that exhibit large differences in thermostability. The three-dimensional (3D) structures of several of these enzymes have been modelled on the basis of the crystal structures of the NPs of B. thermoproteolyticus (thermolysin) and B. cercus. Several new techniques have been developed to improve the model-building procedures. Also a model-building by mutagenesis' strategy was used, in which mutants were designed just to shed light on parts of the structures that were particularly hard to model. The NP models have been used for the prediction of site-directed mutations aimed at improving the thermostability of the enzymes. Predictions were made using several novel computational techniques, such as position-specific rotamer searching, packing quality analysis and property-profile database searches. Many stabilizing mutations were predicted and produced: improvement of hydrogen bonding, exclusion of buried water molecules, capping helices, improvement of hydrophobic interactions and entropic stabilization have been applied successfully. At elevated temperatures NPs are irreversibly inactivated as a result of autolysis. It has been shown that this denaturation process is independent of the protease activity and concentration and that the inactivation follows first-order kinetics. From this it has been conjectured that local unfolding of (surface) loops, which renders the protein susceptible to autolysis, is the rate-limiting step. Despite the particular nature of the thermal denaturation process, normal rules for protein stability can be applied to NPs. However, rather than stabilizing the whole protein against global unfolding, only a small region has to be protected against local unfolding. In contrast to proteins in general, mutational effects in proteases are not additive and their magnitude is strongly dependent on the location of the mutation. Mutations that alter the stability of the NP by a large amount are located in a relatively weak region (or more precisely, they affect a local unfolding pathway with a relatively low free energy of activation). One weak region, that is supposedly important in the early steps of NP unfolding, has been determined in the NP of B. stearothermophilus. After eliminating this weakest link a drastic increase in thermostability was observed and the search for the second-weakest link, or the second-lowest energy local unfolding pathway is now in progress. Hopefully, this approach can be used to unravel the entire early phase of unfolding.
Simplified Protein Models: Predicting Folding Pathways and Structure Using Amino Acid Sequences
NASA Astrophysics Data System (ADS)
Adhikari, Aashish N.; Freed, Karl F.; Sosnick, Tobin R.
2013-07-01
We demonstrate the ability of simultaneously determining a protein’s folding pathway and structure using a properly formulated model without prior knowledge of the native structure. Our model employs a natural coordinate system for describing proteins and a search strategy inspired by the observation that real proteins fold in a sequential fashion by incrementally stabilizing nativelike substructures or “foldons.” Comparable folding pathways and structures are obtained for the twelve proteins recently studied using atomistic molecular dynamics simulations [K. Lindorff-Larsen, S. Piana, R. O. Dror, D. E. Shaw, Science 334, 517 (2011)], with our calculations running several orders of magnitude faster. We find that nativelike propensities in the unfolded state do not necessarily determine the order of structure formation, a departure from a major conclusion of the molecular dynamics study. Instead, our results support a more expansive view wherein intrinsic local structural propensities may be enhanced or overridden in the folding process by environmental context. The success of our search strategy validates it as an expedient mechanism for folding both in silico and in vivo.
Structure-Based Characterization of Multiprotein Complexes
Wiederstein, Markus; Gruber, Markus; Frank, Karl; Melo, Francisco; Sippl, Manfred J.
2014-01-01
Summary Multiprotein complexes govern virtually all cellular processes. Their 3D structures provide important clues to their biological roles, especially through structural correlations among protein molecules and complexes. The detection of such correlations generally requires comprehensive searches in databases of known protein structures by means of appropriate structure-matching techniques. Here, we present a high-speed structure search engine capable of instantly matching large protein oligomers against the complete and up-to-date database of biologically functional assemblies of protein molecules. We use this tool to reveal unseen structural correlations on the level of protein quaternary structure and demonstrate its general usefulness for efficiently exploring complex structural relationships among known protein assemblies. PMID:24954616
Mining protein database using machine learning techniques.
Camargo, Renata da Silva; Niranjan, Mahesan
2008-08-25
With a large amount of information relating to proteins accumulating in databases widely available online, it is of interest to apply machine learning techniques that, by extracting underlying statistical regularities in the data, make predictions about the functional and evolutionary characteristics of unseen proteins. Such predictions can help in achieving a reduction in the space over which experiment designers need to search in order to improve our understanding of the biochemical properties. Previously it has been suggested that an integration of features computable by comparing a pair of proteins can be achieved by an artificial neural network, hence predicting the degree to which they may be evolutionary related and homologous.
We compiled two datasets of pairs of proteins, each pair being characterised by seven distinct features. We performed an exhaustive search through all possible combinations of features, for the problem of separating remote homologous from analogous pairs, we note that significant performance gain was obtained by the inclusion of sequence and structure information. We find that the use of a linear classifier was enough to discriminate a protein pair at the family level. However, at the superfamily level, to detect remote homologous pairs was a relatively harder problem. We find that the use of nonlinear classifiers achieve significantly higher accuracies.
In this paper, we compare three different pattern classification methods on two problems formulated as detecting evolutionary and functional relationships between pairs of proteins, and from extensive cross validation and feature selection based studies quantify the average limits and uncertainties with which such predictions may be made. Feature selection points to a \\"knowledge gap\\" in currently available functional annotations. We demonstrate how the scheme may be employed in a framework to associate an individual protein with an existing family of evolutionarily related proteins.
Liu, Qiaoxia; Zhou, Binbin; Wang, Xinliang; Ke, Yanxiong; Jin, Yu; Yin, Lihui; Liang, Xinmiao
2012-12-01
A search library about benzylisoquinoline alkaloids was established based on preparation of alkaloid fractions from Rhizoma coptidis, Cortex phellodendri, and Rhizoma corydalis. In this work, two alkaloid fractions from each herbal medicine were first prepared based on selective separation on the "click" binaphthyl column. And then these alkaloid fractions were analyzed on C18 column by liquid chromatography coupled with tandem mass spectrometry. Many structure-related compounds were included in these alkaloids fractions, which led to easy separation and good MS response in further work. Therefore, a search library of 52 benzylisoquinoline alkaloids was established, which included eight aporphine, 19 tetrahydroprotoberberine, two protopine, two benzyltetrahydroisoquinoline, and 21 protoberberine alkaloids. The information of the search library contained compound names, structures, retention times, accurate masses, fragmentation pathways of benzylisoquionline alkaloids, and their sources from three herbal medicines. Using such a library, the alkaloids, especially those trace and unknown components in some herbal medicine could be accurately and quickly identified. In addition, the distribution of benzylisoquinoline alkaloids in the herbal medicines could be also summarized by searching the source samples in the library. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Is Internet search better than structured instruction for web-based health education?
Finkelstein, Joseph; Bedra, McKenzie
2013-01-01
Internet provides access to vast amounts of comprehensive information regarding any health-related subject. Patients increasingly use this information for health education using a search engine to identify education materials. An alternative approach of health education via Internet is based on utilizing a verified web site which provides structured interactive education guided by adult learning theories. Comparison of these two approaches in older patients was not performed systematically. The aim of this study was to compare the efficacy of a web-based computer-assisted education (CO-ED) system versus searching the Internet for learning about hypertension. Sixty hypertensive older adults (age 45+) were randomized into control or intervention groups. The control patients spent 30 to 40 minutes searching the Internet using a search engine for information about hypertension. The intervention patients spent 30 to 40 minutes using the CO-ED system, which provided computer-assisted instruction about major hypertension topics. Analysis of pre- and post- knowledge scores indicated a significant improvement among CO-ED users (14.6%) as opposed to Internet users (2%). Additionally, patients using the CO-ED program rated their learning experience more positively than those using the Internet.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whitley, L. Darrell; Howe, Adele E.; Watson, Jean-Paul
2004-09-01
Tabu search is one of the most effective heuristics for locating high-quality solutions to a diverse array of NP-hard combinatorial optimization problems. Despite the widespread success of tabu search, researchers have a poor understanding of many key theoretical aspects of this algorithm, including models of the high-level run-time dynamics and identification of those search space features that influence problem difficulty. We consider these questions in the context of the job-shop scheduling problem (JSP), a domain where tabu search algorithms have been shown to be remarkably effective. Previously, we demonstrated that the mean distance between random local optima and the nearestmore » optimal solution is highly correlated with problem difficulty for a well-known tabu search algorithm for the JSP introduced by Taillard. In this paper, we discuss various shortcomings of this measure and develop a new model of problem difficulty that corrects these deficiencies. We show that Taillard's algorithm can be modeled with high fidelity as a simple variant of a straightforward random walk. The random walk model accounts for nearly all of the variability in the cost required to locate both optimal and sub-optimal solutions to random JSPs, and provides an explanation for differences in the difficulty of random versus structured JSPs. Finally, we discuss and empirically substantiate two novel predictions regarding tabu search algorithm behavior. First, the method for constructing the initial solution is highly unlikely to impact the performance of tabu search. Second, tabu tenure should be selected to be as small as possible while simultaneously avoiding search stagnation; values larger than necessary lead to significant degradations in performance.« less
Rossa, Carlos; Sloboda, Ron; Usmani, Nawaid; Tavakoli, Mahdi
2016-07-01
This paper proposes a method to predict the deflection of a flexible needle inserted into soft tissue based on the observation of deflection at a single point along the needle shaft. We model the needle-tissue as a discretized structure composed of several virtual, weightless, rigid links connected by virtual helical springs whose stiffness coefficient is found using a pattern search algorithm that only requires the force applied at the needle tip during insertion and the needle deflection measured at an arbitrary insertion depth. Needle tip deflections can then be predicted for different insertion depths. Verification of the proposed method in synthetic and biological tissue shows a deflection estimation error of [Formula: see text]2 mm for images acquired at 35 % or more of the maximum insertion depth, and decreases to 1 mm for images acquired closer to the final insertion depth. We also demonstrate the utility of the model for prostate brachytherapy, where in vivo needle deflection measurements obtained during early stages of insertion are used to predict the needle deflection further along the insertion process. The method can predict needle deflection based on the observation of deflection at a single point. The ultrasound probe can be maintained at the same position during insertion of the needle, which avoids complications of tissue deformation caused by the motion of the ultrasound probe.
Perspective: n-type oxide thermoelectrics via visual search strategies
Xing, Guangzong; Sun, Jifeng; Ong, Khuong P.; ...
2016-02-12
We discuss and present search strategies for finding new thermoelectric compositions based on first principles electronic structure and transport calculations. We illustrate them by application to a search for potential n-type oxide thermoelectric materials. This includes a screen based on visualization of electronic energy isosurfaces. Lastly, we report compounds that show potential as thermoelectric materials along with detailed properties, including SrTiO 3, which is a known thermoelectric, and appropriately doped KNbO 3 and rutile TiO 2.
Perspective: n-type oxide thermoelectrics via visual search strategies
NASA Astrophysics Data System (ADS)
Xing, Guangzong; Sun, Jifeng; Ong, Khuong P.; Fan, Xiaofeng; Zheng, Weitao; Singh, David J.
2016-05-01
We discuss and present search strategies for finding new thermoelectric compositions based on first principles electronic structure and transport calculations. We illustrate them by application to a search for potential n-type oxide thermoelectric materials. This includes a screen based on visualization of electronic energy isosurfaces. We report compounds that show potential as thermoelectric materials along with detailed properties, including SrTiO3, which is a known thermoelectric, and appropriately doped KNbO3 and rutile TiO2.
The DIMA web resource--exploring the protein domain network.
Pagel, Philipp; Oesterheld, Matthias; Stümpflen, Volker; Frishman, Dmitrij
2006-04-15
Conserved domains represent essential building blocks of most known proteins. Owing to their role as modular components carrying out specific functions they form a network based both on functional relations and direct physical interactions. We have previously shown that domain interaction networks provide substantially novel information with respect to networks built on full-length protein chains. In this work we present a comprehensive web resource for exploring the Domain Interaction MAp (DIMA), interactively. The tool aims at integration of multiple data sources and prediction techniques, two of which have been implemented so far: domain phylogenetic profiling and experimentally demonstrated domain contacts from known three-dimensional structures. A powerful yet simple user interface enables the user to compute, visualize, navigate and download domain networks based on specific search criteria. http://mips.gsf.de/genre/proj/dima
The effect of data structures on INGRES performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Creighton, J.R.
1987-01-01
Computer experiments were conducted to determine the effect of using Heap, ISAM, Hash and B-tree data structures for INGRES relations. Average times for retrieve, append and update were determined for searches by unique key and non-key data. The experiments were conducted on relations of approximately 1000 tuples of 332 byte width. Multiple operations were performed, where appropriate, to obtain average times. Simple models of the data structures are presented and shown to be consistent with experimental results. The models can be used to predict performance, and to select the appropriate data structure for various applications.
NASA Astrophysics Data System (ADS)
Kim, Eunae; Jang, Soonmin; Pak, Youngshang
2009-11-01
We performed an all-atom ab initio native structure prediction of 1FME, which is one of the computationally challenging mixed fold ββα miniproteins, by combining a novel conformational search algorithm (multiplexed Q-replica exchange molecular dynamics scheme) with a well-balanced all-atom force field employing a generalized Born implicit solvation model (param99MOD5/GBSA). The nativelike structure of 1FME was identified from the lowest free energy minimum state and in excellent agreement with the NMR structure. Based on the interpretation of the free energy landscape, the structural properties as well as the folding behaviors of 1FME were compared with other ββα miniproteins (1FSD, 1PSV, and BBA5) that we have previously studied with the same force field. Our simulation showed that the 28-residue ββα miniproteins (1FME, 1FSD, and 1PSV) share a common feature of the free energy topography and exhibit the three local minimum states on each computed free energy map, but the 23-residue miniprotein (BBA5) follows a downhill folding with a single minimum state. Also, the structure and stability changes resulting from the two point mutation (Gln1→Glu1 and Ile7→Tyr7) of 1FSD were investigated in details for direct comparison with the experiment. The comparison shows that upon mutation, the experimentally observed turn type switch from an irregular turn (1FSD) to type I' turn (1FME) was well reproduced with the present simulation.
NASA Astrophysics Data System (ADS)
Duan, Yuhua; Zhang, Bo; Sorescu, Dan C.; Johnson, J. Karl; Majzoub, Eric H.; Luebke, David R.
2012-08-01
The structural, electronic, phonon dispersion and thermodynamic properties of MHCO3 (M = Li, Na, K) solids were investigated using density functional theory. The calculated bulk properties for both their ambient and the high-pressure phases are in good agreement with available experimental measurements. Solid phase LiHCO3 has not yet been observed experimentally. We have predicted several possible crystal structures for LiHCO3 using crystallographic database searching and prototype electrostatic ground state modeling. Our total energy and phonon free energy (FPH) calculations predict that LiHCO3 will be stable under suitable conditions of temperature and partial pressures of CO2 and H2O. Our calculations indicate that the {{HCO}}_{3}^{-} groups in LiHCO3 and NaHCO3 form an infinite chain structure through O⋯H⋯O hydrogen bonds. In contrast, the {{HCO}}_{3}^{-} anions form dimers, ({{HCO}}_{3}^{-})_{2}, connected through double hydrogen bonds in all phases of KHCO3. Based on density functional perturbation theory, the Born effective charge tensor of each atom type was obtained for all phases of the bicarbonates. Their phonon dispersions with the longitudinal optical-transverse optical splitting were also investigated. Based on lattice phonon dynamics study, the infrared spectra and the thermodynamic properties of these bicarbonates were obtained. Over the temperature range 0-900 K, the FPH and the entropies (S) of MHCO3 (M =Li, Na, K) systems vary as FPH(LiHCO3) > FPH(NaHCO3) > FPH(KHCO3) and S(KHCO3) > S(NaHCO3) > S(LiHCO3), respectively, in agreement with the available experimental data. Analysis of the predicted thermodynamics of the CO2 capture reactions indicates that the carbonate/bicarbonate transition reactions for Na and K could be used for CO2 capture technology, in agreement with experiments.
A concept of volume rendering guided search process to analyze medical data set.
Zhou, Jianlong; Xiao, Chun; Wang, Zhiyan; Takatsuka, Masahiro
2008-03-01
This paper firstly presents an approach of parallel coordinates based parameter control panel (PCP). The PCP is used to control parameters of focal region-based volume rendering (FRVR) during data analysis. It uses a parallel coordinates style interface. Different rendering parameters represented with nodes on each axis, and renditions based on related parameters are connected using polylines to show dependencies between renditions and parameters. Based on the PCP, a concept of volume rendering guided search process is proposed. The search pipeline is divided into four phases. Different parameters of FRVR are recorded and modulated in the PCP during search phases. The concept shows that volume visualization could play the role of guiding a search process in the rendition space to help users to efficiently find local structures of interest. The usability of the proposed approach is evaluated to show its effectiveness.
The burden of typhoid fever in low- and middle-income countries: A meta-regression approach.
Antillón, Marina; Warren, Joshua L; Crawford, Forrest W; Weinberger, Daniel M; Kürüm, Esra; Pak, Gi Deok; Marks, Florian; Pitzer, Virginia E
2017-02-01
Upcoming vaccination efforts against typhoid fever require an assessment of the baseline burden of disease in countries at risk. There are no typhoid incidence data from most low- and middle-income countries (LMICs), so model-based estimates offer insights for decision-makers in the absence of readily available data. We developed a mixed-effects model fit to data from 32 population-based studies of typhoid incidence in 22 locations in 14 countries. We tested the contribution of economic and environmental indices for predicting typhoid incidence using a stochastic search variable selection algorithm. We performed out-of-sample validation to assess the predictive performance of the model. We estimated that 17.8 million cases of typhoid fever occur each year in LMICs (95% credible interval: 6.9-48.4 million). Central Africa was predicted to experience the highest incidence of typhoid, followed by select countries in Central, South, and Southeast Asia. Incidence typically peaked in the 2-4 year old age group. Models incorporating widely available economic and environmental indicators were found to describe incidence better than null models. Recent estimates of typhoid burden may under-estimate the number of cases and magnitude of uncertainty in typhoid incidence. Our analysis permits prediction of overall as well as age-specific incidence of typhoid fever in LMICs, and incorporates uncertainty around the model structure and estimates of the predictors. Future studies are needed to further validate and refine model predictions and better understand year-to-year variation in cases.
Folksonomical P2P File Sharing Networks Using Vectorized KANSEI Information as Search Tags
NASA Astrophysics Data System (ADS)
Ohnishi, Kei; Yoshida, Kaori; Oie, Yuji
We present the concept of folksonomical peer-to-peer (P2P) file sharing networks that allow participants (peers) to freely assign structured search tags to files. These networks are similar to folksonomies in the present Web from the point of view that users assign search tags to information distributed over a network. As a concrete example, we consider an unstructured P2P network using vectorized Kansei (human sensitivity) information as structured search tags for file search. Vectorized Kansei information as search tags indicates what participants feel about their files and is assigned by the participant to each of their files. A search query also has the same form of search tags and indicates what participants want to feel about files that they will eventually obtain. A method that enables file search using vectorized Kansei information is the Kansei query-forwarding method, which probabilistically propagates a search query to peers that are likely to hold more files having search tags that are similar to the query. The similarity between the search query and the search tags is measured in terms of their dot product. The simulation experiments examine if the Kansei query-forwarding method can provide equal search performance for all peers in a network in which only the Kansei information and the tendency with respect to file collection are different among all of the peers. The simulation results show that the Kansei query forwarding method and a random-walk-based query forwarding method, for comparison, work effectively in different situations and are complementary. Furthermore, the Kansei query forwarding method is shown, through simulations, to be superior to or equal to the random-walk based one in terms of search speed.
A hardware-oriented concurrent TZ search algorithm for High-Efficiency Video Coding
NASA Astrophysics Data System (ADS)
Doan, Nghia; Kim, Tae Sung; Rhee, Chae Eun; Lee, Hyuk-Jae
2017-12-01
High-Efficiency Video Coding (HEVC) is the latest video coding standard, in which the compression performance is double that of its predecessor, the H.264/AVC standard, while the video quality remains unchanged. In HEVC, the test zone (TZ) search algorithm is widely used for integer motion estimation because it effectively searches the good-quality motion vector with a relatively small amount of computation. However, the complex computation structure of the TZ search algorithm makes it difficult to implement it in the hardware. This paper proposes a new integer motion estimation algorithm which is designed for hardware execution by modifying the conventional TZ search to allow parallel motion estimations of all prediction unit (PU) partitions. The algorithm consists of the three phases of zonal, raster, and refinement searches. At the beginning of each phase, the algorithm obtains the search points required by the original TZ search for all PU partitions in a coding unit (CU). Then, all redundant search points are removed prior to the estimation of the motion costs, and the best search points are then selected for all PUs. Compared to the conventional TZ search algorithm, experimental results show that the proposed algorithm significantly decreases the Bjøntegaard Delta bitrate (BD-BR) by 0.84%, and it also reduces the computational complexity by 54.54%.
The ecology of parasites of freshwater fishes: the search for patterns.
Kennedy, C R
2009-10-01
Developments in the study of the ecology of helminth parasites of freshwater fishes over the last half century are reviewed. Most research has of necessity been field based and has involved the search for patterns in population and community dynamics that are repeatable in space and time. Mathematical models predict that under certain conditions host and parasite populations can attain equilibrial levels through operation of regulatory factors. Such factors have been identified in several host-parasite systems and some parasite populations have been shown to persist over long time-periods. However, there is no convincing evidence that fish parasite populations are stable and regulated since in all cases alternative explanations are equally acceptable and it appears that they are non-equilibrial systems. It has proved particularly difficult to detect replicable patterns in parasite communities. Inter-specific competition, evidenced by functional and numerical responses, has been detected in several communities but its occurrence is erratic and its significance unclear. Some studies have failed to find any nested patterns in parasite community structure and richness, whereas others have identified such patterns although they are seldom constant over space and time. Departures from randomness appear to be the exception and then only temporary. It appears that parasite communities are non-equilibrial, stochastic assemblages rather than structured and organized.
Schomburg, Ida; Chang, Antje; Placzek, Sandra; Söhngen, Carola; Rother, Michael; Lang, Maren; Munaretto, Cornelia; Ulas, Susanne; Stelzer, Michael; Grote, Andreas; Scheer, Maurice; Schomburg, Dietmar
2013-01-01
The BRENDA (BRaunschweig ENzyme DAtabase) enzyme portal (http://www.brenda-enzymes.org) is the main information system of functional biochemical and molecular enzyme data and provides access to seven interconnected databases. BRENDA contains 2.7 million manually annotated data on enzyme occurrence, function, kinetics and molecular properties. Each entry is connected to a reference and the source organism. Enzyme ligands are stored with their structures and can be accessed via their names, synonyms or via a structure search. FRENDA (Full Reference ENzyme DAta) and AMENDA (Automatic Mining of ENzyme DAta) are based on text mining methods and represent a complete survey of PubMed abstracts with information on enzymes in different organisms, tissues or organelles. The supplemental database DRENDA provides more than 910 000 new EC number-disease relations in more than 510 000 references from automatic search and a classification of enzyme-disease-related information. KENDA (Kinetic ENzyme DAta), a new amendment extracts and displays kinetic values from PubMed abstracts. The integration of the EnzymeDetector offers an automatic comparison, evaluation and prediction of enzyme function annotations for prokaryotic genomes. The biochemical reaction database BKM-react contains non-redundant enzyme-catalysed and spontaneous reactions and was developed to facilitate and accelerate the construction of biochemical models.
The SUPERFAMILY database in 2004: additions and improvements.
Madera, Martin; Vogel, Christine; Kummerfeld, Sarah K; Chothia, Cyrus; Gough, Julian
2004-01-01
The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.
Precise predictions for V+jets dark matter backgrounds
NASA Astrophysics Data System (ADS)
Lindert, J. M.; Pozzorini, S.; Boughezal, R.; Campbell, J. M.; Denner, A.; Dittmaier, S.; Gehrmann-De Ridder, A.; Gehrmann, T.; Glover, N.; Huss, A.; Kallweit, S.; Maierhöfer, P.; Mangano, M. L.; Morgan, T. A.; Mück, A.; Petriello, F.; Salam, G. P.; Schönherr, M.; Williams, C.
2017-12-01
High-energy jets recoiling against missing transverse energy (MET) are powerful probes of dark matter at the LHC. Searches based on large MET signatures require a precise control of the Z(ν {\\bar{ν }})+ jet background in the signal region. This can be achieved by taking accurate data in control regions dominated by Z(ℓ ^+ℓ ^-)+ jet, W(ℓ ν )+ jet and γ + jet production, and extrapolating to the Z(ν {\\bar{ν }})+ jet background by means of precise theoretical predictions. In this context, recent advances in perturbative calculations open the door to significant sensitivity improvements in dark matter searches. In this spirit, we present a combination of state-of-the-art calculations for all relevant V+ jets processes, including throughout NNLO QCD corrections and NLO electroweak corrections supplemented by Sudakov logarithms at two loops. Predictions at parton level are provided together with detailed recommendations for their usage in experimental analyses based on the reweighting of Monte Carlo samples. Particular attention is devoted to the estimate of theoretical uncertainties in the framework of dark matter searches, where subtle aspects such as correlations across different V+ jet processes play a key role. The anticipated theoretical uncertainty in the Z(ν {\\bar{ν }})+ jet background is at the few percent level up to the TeV range.
Risk Prediction Models in Psychiatry: Toward a New Frontier for the Prevention of Mental Illnesses.
Bernardini, Francesco; Attademo, Luigi; Cleary, Sean D; Luther, Charles; Shim, Ruth S; Quartesan, Roberto; Compton, Michael T
2017-05-01
We conducted a systematic, qualitative review of risk prediction models designed and tested for depression, bipolar disorder, generalized anxiety disorder, posttraumatic stress disorder, and psychotic disorders. Our aim was to understand the current state of research on risk prediction models for these 5 disorders and thus future directions as our field moves toward embracing prediction and prevention. Systematic searches of the entire MEDLINE electronic database were conducted independently by 2 of the authors (from 1960 through 2013) in July 2014 using defined search criteria. Search terms included risk prediction, predictive model, or prediction model combined with depression, bipolar, manic depressive, generalized anxiety, posttraumatic, PTSD, schizophrenia, or psychosis. We identified 268 articles based on the search terms and 3 criteria: published in English, provided empirical data (as opposed to review articles), and presented results pertaining to developing or validating a risk prediction model in which the outcome was the diagnosis of 1 of the 5 aforementioned mental illnesses. We selected 43 original research reports as a final set of articles to be qualitatively reviewed. The 2 independent reviewers abstracted 3 types of data (sample characteristics, variables included in the model, and reported model statistics) and reached consensus regarding any discrepant abstracted information. Twelve reports described models developed for prediction of major depressive disorder, 1 for bipolar disorder, 2 for generalized anxiety disorder, 4 for posttraumatic stress disorder, and 24 for psychotic disorders. Most studies reported on sensitivity, specificity, positive predictive value, negative predictive value, and area under the (receiver operating characteristic) curve. Recent studies demonstrate the feasibility of developing risk prediction models for psychiatric disorders (especially psychotic disorders). The field must now advance by (1) conducting more large-scale, longitudinal studies pertaining to depression, bipolar disorder, anxiety disorders, and other psychiatric illnesses; (2) replicating and carrying out external validations of proposed models; (3) further testing potential selective and indicated preventive interventions; and (4) evaluating effectiveness of such interventions in the context of risk stratification using risk prediction models. © Copyright 2017 Physicians Postgraduate Press, Inc.
Riecher-Rössler, A; Aston, J; Ventura, J; Merlo, M; Borgwardt, S; Gschwandtner, U; Stieglitz, R-D
2008-04-01
Early detection of psychosis is of growing clinical importance. So far there is, however, no screening instrument for detecting individuals with beginning psychosis in the atypical early stages of the disease with sufficient validity. We have therefore developed the Basel Screening Instrument for Psychosis (BSIP) and tested its feasibility, interrater-reliability and validity. Aim of this paper is to describe the development and structure of the instrument, as well as to report the results of the studies on reliability and validity. The instrument was developed based on a comprehensive search of literature on the most important risk factors and early signs of schizophrenic psychoses. The interraterreliability study was conducted on 24 psychiatric cases. Validity was tested based on 206 individuals referred to our early detection clinic from 3/1/2000 until 2/28/2003. We identified seven categories of relevance for early detection of psychosis and used them to construct a semistructured interview. Interrater-reliability for high risk individuals was high (Kappa .87). Predictive validity was comparable to other, more comprehensive instruments: 16 (32 %) of 50 individuals classified as being at risk for psychosis by the BSIP have in fact developed frank psychosis within an follow-up period of two to five years. The BSIP is the first screening instrument for the early detection of psychosis which has been validated based on transition to psychosis. The BSIP is easy to use by experienced psychiatrists and has a very good interrater-reliability and predictive validity.
Contextual remapping in visual search after predictable target-location changes.
Conci, Markus; Sun, Luning; Müller, Hermann J
2011-07-01
Invariant spatial context can facilitate visual search. For instance, detection of a target is faster if it is presented within a repeatedly encountered, as compared to a novel, layout of nontargets, demonstrating a role of contextual learning for attentional guidance ('contextual cueing'). Here, we investigated how context-based learning adapts to target location (and identity) changes. Three experiments were performed in which, in an initial learning phase, observers learned to associate a given context with a given target location. A subsequent test phase then introduced identity and/or location changes to the target. The results showed that contextual cueing could not compensate for target changes that were not 'predictable' (i.e. learnable). However, for predictable changes, contextual cueing remained effective even immediately after the change. These findings demonstrate that contextual cueing is adaptive to predictable target location changes. Under these conditions, learned contextual associations can be effectively 'remapped' to accommodate new task requirements.