structure prediction algorithms: Topics by Science.gov

Sample records for structure prediction algorithms

A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.

PubMed

Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei

2017-10-01

The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.
3D Protein structure prediction with genetic tabu search algorithm

PubMed Central

2010-01-01

Background Protein structure prediction (PSP) has important applications in different fields, such as drug design, disease prediction, and so on. In protein structure prediction, there are two important issues. The first one is the design of the structure model and the second one is the design of the optimization technology. Because of the complexity of the realistic protein structure, the structure model adopted in this paper is a simplified model, which is called off-lattice AB model. After the structure model is assumed, optimization technology is needed for searching the best conformation of a protein sequence based on the assumed structure model. However, PSP is an NP-hard problem even if the simplest model is assumed. Thus, many algorithms have been developed to solve the global optimization problem. In this paper, a hybrid algorithm, which combines genetic algorithm (GA) and tabu search (TS) algorithm, is developed to complete this task. Results In order to develop an efficient optimization algorithm, several improved strategies are developed for the proposed genetic tabu search algorithm. The combined use of these strategies can improve the efficiency of the algorithm. In these strategies, tabu search introduced into the crossover and mutation operators can improve the local search capability, the adoption of variable population size strategy can maintain the diversity of the population, and the ranking selection strategy can improve the possibility of an individual with low energy value entering into next generation. Experiments are performed with Fibonacci sequences and real protein sequences. Experimental results show that the lowest energy obtained by the proposed GATS algorithm is lower than that obtained by previous methods. Conclusions The hybrid algorithm has the advantages from both genetic algorithm and tabu search algorithm. It makes use of the advantage of multiple search points in genetic algorithm, and can overcome poor hill-climbing capability in the conventional genetic algorithm by using the flexible memory functions of TS. Compared with some previous algorithms, GATS algorithm has better performance in global optimization and can predict 3D protein structure more effectively. PMID:20522256
An O(n(5)) algorithm for MFE prediction of kissing hairpins and 4-chains in nucleic acids.

PubMed

Chen, Ho-Lin; Condon, Anne; Jabbari, Hosna

2009-06-01

Efficient methods for prediction of minimum free energy (MFE) nucleic secondary structures are widely used, both to better understand structure and function of biological RNAs and to design novel nano-structures. Here, we present a new algorithm for MFE secondary structure prediction, which significantly expands the class of structures that can be handled in O(n(5)) time. Our algorithm can handle H-type pseudoknotted structures, kissing hairpins, and chains of four overlapping stems, as well as nested substructures of these types.
RNA secondary structure prediction with pseudoknots: Contribution of algorithm versus energy model.

PubMed

Jabbari, Hosna; Wark, Ian; Montemagno, Carlo

2018-01-01

RNA is a biopolymer with various applications inside the cell and in biotechnology. Structure of an RNA molecule mainly determines its function and is essential to guide nanostructure design. Since experimental structure determination is time-consuming and expensive, accurate computational prediction of RNA structure is of great importance. Prediction of RNA secondary structure is relatively simpler than its tertiary structure and provides information about its tertiary structure, therefore, RNA secondary structure prediction has received attention in the past decades. Numerous methods with different folding approaches have been developed for RNA secondary structure prediction. While methods for prediction of RNA pseudoknot-free structure (structures with no crossing base pairs) have greatly improved in terms of their accuracy, methods for prediction of RNA pseudoknotted secondary structure (structures with crossing base pairs) still have room for improvement. A long-standing question for improving the prediction accuracy of RNA pseudoknotted secondary structure is whether to focus on the prediction algorithm or the underlying energy model, as there is a trade-off on computational cost of the prediction algorithm versus the generality of the method. The aim of this work is to argue when comparing different methods for RNA pseudoknotted structure prediction, the combination of algorithm and energy model should be considered and a method should not be considered superior or inferior to others if they do not use the same scoring model. We demonstrate that while the folding approach is important in structure prediction, it is not the only important factor in prediction accuracy of a given method as the underlying energy model is also as of great value. Therefore we encourage researchers to pay particular attention in comparing methods with different energy models.
Development of an Evolutionary Algorithm for the ab Initio Discovery of Two-Dimensional Materials

NASA Astrophysics Data System (ADS)

Revard, Benjamin Charles

Crystal structure prediction is an important first step on the path toward computational materials design. Increasingly robust methods have become available in recent years for computing many materials properties, but because properties are largely a function of crystal structure, the structure must be known before these methods can be brought to bear. In addition, structure prediction is particularly useful for identifying low-energy structures of subperiodic materials, such as two-dimensional (2D) materials, which may adopt unexpected structures that differ from those of the corresponding bulk phases. Evolutionary algorithms, which are heuristics for global optimization inspired by biological evolution, have proven to be a fruitful approach for tackling the problem of crystal structure prediction. This thesis describes the development of an improved evolutionary algorithm for structure prediction and several applications of the algorithm to predict the structures of novel low-energy 2D materials. The first part of this thesis contains an overview of evolutionary algorithms for crystal structure prediction and presents our implementation, including details of extending the algorithm to search for clusters, wires, and 2D materials, improvements to efficiency when running in parallel, improved composition space sampling, and the ability to search for partial phase diagrams. We then present several applications of the evolutionary algorithm to 2D systems, including InP, the C-Si and Sn-S phase diagrams, and several group-IV dioxides. This thesis makes use of the Cornell graduate school's "papers" option. Chapters 1 and 3 correspond to the first-author publications of Refs. [131] and [132], respectively, and chapter 2 will soon be submitted as a first-author publication. The material in chapter 4 is taken from Ref. [144], in which I share joint first-authorship. In this case I have included only my own contributions.
High-speed prediction of crystal structures for organic molecules

NASA Astrophysics Data System (ADS)

Obata, Shigeaki; Goto, Hitoshi

2015-02-01

We developed a master-worker type parallel algorithm for allocating tasks of crystal structure optimizations to distributed compute nodes, in order to improve a performance of simulations for crystal structure predictions. The performance experiments were demonstrated on TUT-ADSIM supercomputer system (HITACHI HA8000-tc/HT210). The experimental results show that our parallel algorithm could achieve speed-ups of 214 and 179 times using 256 processor cores on crystal structure optimizations in predictions of crystal structures for 3-aza-bicyclo(3.3.1)nonane-2,4-dione and 2-diazo-3,5-cyclohexadiene-1-one, respectively. We expect that this parallel algorithm is always possible to reduce computational costs of any crystal structure predictions.
A statistical analysis of RNA folding algorithms through thermodynamic parameter perturbation.

PubMed

Layton, D M; Bundschuh, R

2005-01-01

Computational RNA secondary structure prediction is rather well established. However, such prediction algorithms always depend on a large number of experimentally measured parameters. Here, we study how sensitive structure prediction algorithms are to changes in these parameters. We found already that for changes corresponding to the actual experimental error to which these parameters have been determined, 30% of the structure are falsely predicted whereas the ground state structure is preserved under parameter perturbation in only 5% of all the cases. We establish that base-pairing probabilities calculated in a thermal ensemble are viable although not a perfect measure for the reliability of the prediction of individual structure elements. Here, a new measure of stability using parameter perturbation is proposed, and its limitations are discussed.
An improved stochastic fractal search algorithm for 3D protein structure prediction.

PubMed

Zhou, Changjun; Sun, Chuan; Wang, Bin; Wang, Xiaojun

2018-05-03

Protein structure prediction (PSP) is a significant area for biological information research, disease treatment, and drug development and so on. In this paper, three-dimensional structures of proteins are predicted based on the known amino acid sequences, and the structure prediction problem is transformed into a typical NP problem by an AB off-lattice model. This work applies a novel improved Stochastic Fractal Search algorithm (ISFS) to solve the problem. The Stochastic Fractal Search algorithm (SFS) is an effective evolutionary algorithm that performs well in exploring the search space but falls into local minimums sometimes. In order to avoid the weakness, Lvy flight and internal feedback information are introduced in ISFS. In the experimental process, simulations are conducted by ISFS algorithm on Fibonacci sequences and real peptide sequences. Experimental results prove that the ISFS performs more efficiently and robust in terms of finding the global minimum and avoiding getting stuck in local minimums.
Protein Structure Prediction with Evolutionary Algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hart, W.E.; Krasnogor, N.; Pelta, D.A.

1999-02-08

Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms that have been applied to a simple protein structure prediction problem. Our analysis considers the impact of several algorithmic factors for this problem: the confirmational representation, the energy formulation and the way in which infeasible conformations are penalized, Further we empirically evaluated the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for both GAs as well as other heuristic methods for solving PSP on the HP model.
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.

PubMed

Ni, Qianwu; Chen, Lei

2017-01-01

Correct prediction of protein structural class is beneficial to investigation on protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, based on various features, it is still a great challenge to select proper classification algorithm and extract essential features to participate in classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physiochemical features were adopted to represent features and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features in the list one by one. For each feature set, thirtyeight algorithms were executed on a dataset, in which proteins were represented by features in the set. The predicted classes yielded by these algorithms and true class of each protein were collected to construct a dataset, which were analyzed by mRMR method, yielding an algorithm list. From the algorithm list, the algorithm was taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using single algorithm and other models that only adopt feature selection procedure or algorithm selection procedure. The feature selection procedure or algorithm selection procedure are really helpful for building an ensemble prediction model that can yield a better performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Improved hybrid optimization algorithm for 3D protein structure prediction.

PubMed

Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

2014-07-01

A new improved hybrid optimization algorithm - PGATS algorithm, which is based on toy off-lattice model, is presented for dealing with three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. Otherwise, we also take some different improved strategies. The factor of stochastic disturbance is joined in the particle swarm optimization to improve the search ability; the operations of crossover and mutation that are in the genetic algorithm are changed to a kind of random liner method; at last tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, the protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is an NP-hard problem, but the problem can be attributed to a global optimization problem of multi-extremum and multi-parameters. This is the theoretical principle of the hybrid optimization algorithm that is proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcoming of a single algorithm, giving full play to the advantage of each algorithm. In the current universal standard sequences, Fibonacci sequences and real protein sequences are certified. Experiments show that the proposed new method outperforms single algorithms on the accuracy of calculating the protein sequence energy value, which is proved to be an effective way to predict the structure of proteins.
Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics

PubMed Central

Reeder, Jens; Giegerich, Robert

2004-01-01

Background The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, where the O(n6)time and O(n4) space algorithm by Rivas and Eddy is currently the best available program. Results We introduce the class of canonical simple recursive pseudoknots and present an algorithm that requires O(n4) time and O(n2) space to predict the energetically optimal structure of an RNA sequence, possible containing such pseudoknots. Evaluation against a large collection of known pseudoknotted structures shows the adequacy of the canonization approach and our algorithm. Conclusions RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm. PMID:15294028
RNA design using simulated SHAPE data.

PubMed

Lotfi, Mohadeseh; Zare-Mirakabad, Fatemeh; Montaseri, Soheila

2018-05-03

It has long been established that in addition to being involved in protein translation, RNA plays essential roles in numerous other cellular processes, including gene regulation and DNA replication. Such roles are known to be dictated by higher-order structures of RNA molecules. It is therefore of prime importance to find an RNA sequence that can fold to acquire a particular function that is desirable for use in pharmaceuticals and basic research. The challenge of finding an RNA sequence for a given structure is known as the RNA design problem. Although there are several algorithms to solve this problem, they mainly consider hard constraints, such as minimum free energy, to evaluate the predicted sequences. Recently, SHAPE data has emerged as a new soft constraint for RNA secondary structure prediction. To take advantage of this new experimental constraint, we report here a new method for accurate design of RNA sequences based on their secondary structures using SHAPE data as pseudo-free energy. We then compare our algorithm with four others: INFO-RNA, ERD, MODENA and RNAifold 2.0. Our algorithm precisely predicts 26 out of 29 new sequences for the structures extracted from the Rfam dataset, while the other four algorithms predict no more than 22 out of 29. The proposed algorithm is comparable to the above algorithms on RNA-SSD datasets, where they can predict up to 33 appropriate sequences for RNA secondary structures out of 34.
Learning to Predict Combinatorial Structures

NASA Astrophysics Data System (ADS)

Vembu, Shankar

2009-12-01

The major challenge in designing a discriminative learning algorithm for predicting structured data is to address the computational issues arising from the exponential size of the output space. Existing algorithms make different assumptions to ensure efficient, polynomial time estimation of model parameters. For several combinatorial structures, including cycles, partially ordered sets, permutations and other graph classes, these assumptions do not hold. In this thesis, we address the problem of designing learning algorithms for predicting combinatorial structures by introducing two new assumptions: (i) The first assumption is that a particular counting problem can be solved efficiently. The consequence is a generalisation of the classical ridge regression for structured prediction. (ii) The second assumption is that a particular sampling problem can be solved efficiently. The consequence is a new technique for designing and analysing probabilistic structured prediction models. These results can be applied to solve several complex learning problems including but not limited to multi-label classification, multi-category hierarchical classification, and label ranking.
XtalOpt version r9: An open-source evolutionary algorithm for crystal structure prediction

DOE PAGES

Falls, Zackary; Lonie, David C.; Avery, Patrick; ...

2015-10-23

This is a new version of XtalOpt, an evolutionary algorithm for crystal structure prediction available for download from the CPC library or the XtalOpt website, http://xtalopt.github.io. XtalOpt is published under the Gnu Public License (GPL), which is an open source license that is recognized by the Open Source Initiative. We have detailed the new version incorporates many bug-fixes and new features here and predict the crystal structure of a system from its stoichiometry alone, using evolutionary algorithms.
Firefly Algorithm for Structural Search.

PubMed

Avendaño-Franco, Guillermo; Romero, Aldo H

2016-07-12

The problem of computational structure prediction of materials is approached using the firefly (FF) algorithm. Starting from the chemical composition and optionally using prior knowledge of similar structures, the FF method is able to predict not only known stable structures but also a variety of novel competitive metastable structures. This article focuses on the strengths and limitations of the algorithm as a multimodal global searcher. The algorithm has been implemented in software package PyChemia ( https://github.com/MaterialsDiscovery/PyChemia ), an open source python library for materials analysis. We present applications of the method to van der Waals clusters and crystal structures. The FF method is shown to be competitive when compared to other population-based global searchers.
Accelerated probabilistic inference of RNA structure evolution

PubMed Central

Holmes, Ian

2005-01-01

Background Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered. Results We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database. Conclusion A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License. PMID:15790387
A parallel strategy for predicting the secondary structure of polycistronic microRNAs.

PubMed

Han, Dianwei; Tang, Guiliang; Zhang, Jun

2013-01-01

The biogenesis of a functional microRNA is largely dependent on the secondary structure of the microRNA precursor (pre-miRNA). Recently, it has been shown that microRNAs are present in the genome as the form of polycistronic transcriptional units in plants and animals. It will be important to design efficient computational methods to predict such structures for microRNA discovery and its applications in gene silencing. In this paper, we propose a parallel algorithm based on the master-slave architecture to predict the secondary structure from an input sequence. We conducted some experiments to verify the effectiveness of our parallel algorithm. The experimental results show that our algorithm is able to produce the optimal secondary structure of polycistronic microRNAs.
CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications.

PubMed

Lei, Guoqing; Dou, Yong; Wan, Wen; Xia, Fei; Li, Rongchun; Ma, Meng; Zou, Dan

2012-01-01

Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular methods of free energy minimization for RNA secondary structure prediction. Thus far, few studies have been reported on the acceleration of the Zuker algorithm on general-purpose processors or on extra accelerators such as Field Programmable Gate-Array (FPGA) and Graphics Processing Units (GPU). To the best of our knowledge, no implementation combines both CPU and extra accelerators, such as GPUs, to accelerate the Zuker algorithm applications. In this paper, a CPU-GPU hybrid computing system that accelerates Zuker algorithm applications for RNA secondary structure prediction is proposed. The computing tasks are allocated between CPU and GPU for parallel cooperate execution. Performance differences between the CPU and the GPU in the task-allocation scheme are considered to obtain workload balance. To improve the hybrid system performance, the Zuker algorithm is optimally implemented with special methods for CPU and GPU architecture. Speedup of 15.93× over optimized multi-core SIMD CPU implementation and performance advantage of 16% over optimized GPU implementation are shown in the experimental results. More than 14% of the sequences are executed on CPU in the hybrid system. The system combining CPU and GPU to accelerate the Zuker algorithm is proven to be promising and can be applied to other bioinformatics applications.
Sparse RNA folding revisited: space-efficient minimum free energy structure prediction.

PubMed

Will, Sebastian; Jabbari, Hosna

2016-01-01

RNA secondary structure prediction by energy minimization is the central computational tool for the analysis of structural non-coding RNAs and their interactions. Sparsification has been successfully applied to improve the time efficiency of various structure prediction algorithms while guaranteeing the same result; however, for many such folding problems, space efficiency is of even greater concern, particularly for long RNA sequences. So far, space-efficient sparsified RNA folding with fold reconstruction was solved only for simple base-pair-based pseudo-energy models. Here, we revisit the problem of space-efficient free energy minimization. Whereas the space-efficient minimization of the free energy has been sketched before, the reconstruction of the optimum structure has not even been discussed. We show that this reconstruction is not possible in trivial extension of the method for simple energy models. Then, we present the time- and space-efficient sparsified free energy minimization algorithm SparseMFEFold that guarantees MFE structure prediction. In particular, this novel algorithm provides efficient fold reconstruction based on dynamically garbage-collected trace arrows. The complexity of our algorithm depends on two parameters, the number of candidates Z and the number of trace arrows T; both are bounded by [Formula: see text], but are typically much smaller. The time complexity of RNA folding is reduced from [Formula: see text] to [Formula: see text]; the space complexity, from [Formula: see text] to [Formula: see text]. Our empirical results show more than 80 % space savings over RNAfold [Vienna RNA package] on the long RNAs from the RNA STRAND database (≥2500 bases). The presented technique is intentionally generalizable to complex prediction algorithms; due to their high space demands, algorithms like pseudoknot prediction and RNA-RNA-interaction prediction are expected to profit even stronger than "standard" MFE folding. SparseMFEFold is free software, available at http://www.bioinf.uni-leipzig.de/~will/Software/SparseMFEFold.

RNA-SSPT: RNA Secondary Structure Prediction Tools.

PubMed

Ahmad, Freed; Mahboob, Shahid; Gulzar, Tahsin; Din, Salah U; Hanif, Tanzeela; Ahmad, Hifza; Afzal, Muhammad

2013-01-01

The prediction of RNA structure is useful for understanding evolution for both in silico and in vitro studies. Physical methods like NMR studies to predict RNA secondary structure are expensive and difficult. Computational RNA secondary structure prediction is easier. Comparative sequence analysis provides the best solution. But secondary structure prediction of a single RNA sequence is challenging. RNA-SSPT is a tool that computationally predicts secondary structure of a single RNA sequence. Most of the RNA secondary structure prediction tools do not allow pseudoknots in the structure or are unable to locate them. Nussinov dynamic programming algorithm has been implemented in RNA-SSPT. The current studies shows only energetically most favorable secondary structure is required and the algorithm modification is also available that produces base pairs to lower the total free energy of the secondary structure. For visualization of RNA secondary structure, NAVIEW in C language is used and modified in C# for tool requirement. RNA-SSPT is built in C# using Dot Net 2.0 in Microsoft Visual Studio 2005 Professional edition. The accuracy of RNA-SSPT is tested in terms of Sensitivity and Positive Predicted Value. It is a tool which serves both secondary structure prediction and secondary structure visualization purposes.
RNA-SSPT: RNA Secondary Structure Prediction Tools

PubMed Central

Ahmad, Freed; Mahboob, Shahid; Gulzar, Tahsin; din, Salah U; Hanif, Tanzeela; Ahmad, Hifza; Afzal, Muhammad

2013-01-01

The prediction of RNA structure is useful for understanding evolution for both in silico and in vitro studies. Physical methods like NMR studies to predict RNA secondary structure are expensive and difficult. Computational RNA secondary structure prediction is easier. Comparative sequence analysis provides the best solution. But secondary structure prediction of a single RNA sequence is challenging. RNA-SSPT is a tool that computationally predicts secondary structure of a single RNA sequence. Most of the RNA secondary structure prediction tools do not allow pseudoknots in the structure or are unable to locate them. Nussinov dynamic programming algorithm has been implemented in RNA-SSPT. The current studies shows only energetically most favorable secondary structure is required and the algorithm modification is also available that produces base pairs to lower the total free energy of the secondary structure. For visualization of RNA secondary structure, NAVIEW in C language is used and modified in C# for tool requirement. RNA-SSPT is built in C# using Dot Net 2.0 in Microsoft Visual Studio 2005 Professional edition. The accuracy of RNA-SSPT is tested in terms of Sensitivity and Positive Predicted Value. It is a tool which serves both secondary structure prediction and secondary structure visualization purposes. PMID:24250115
Beam-column joint shear prediction using hybridized deep learning neural network with genetic algorithm

NASA Astrophysics Data System (ADS)

Mundher Yaseen, Zaher; Abdulmohsin Afan, Haitham; Tran, Minh-Tung

2018-04-01

Scientifically evidenced that beam-column joints are a critical point in the reinforced concrete (RC) structure under the fluctuation loads effects. In this novel hybrid data-intelligence model developed to predict the joint shear behavior of exterior beam-column structure frame. The hybrid data-intelligence model is called genetic algorithm integrated with deep learning neural network model (GA-DLNN). The genetic algorithm is used as prior modelling phase for the input approximation whereas the DLNN predictive model is used for the prediction phase. To demonstrate this structural problem, experimental data is collected from the literature that defined the dimensional and specimens’ properties. The attained findings evidenced the efficitveness of the hybrid GA-DLNN in modelling beam-column joint shear problem. In addition, the accurate prediction achived with less input variables owing to the feasibility of the evolutionary phase.
CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications

PubMed Central

2012-01-01

Background Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular methods of free energy minimization for RNA secondary structure prediction. Thus far, few studies have been reported on the acceleration of the Zuker algorithm on general-purpose processors or on extra accelerators such as Field Programmable Gate-Array (FPGA) and Graphics Processing Units (GPU). To the best of our knowledge, no implementation combines both CPU and extra accelerators, such as GPUs, to accelerate the Zuker algorithm applications. Results In this paper, a CPU-GPU hybrid computing system that accelerates Zuker algorithm applications for RNA secondary structure prediction is proposed. The computing tasks are allocated between CPU and GPU for parallel cooperate execution. Performance differences between the CPU and the GPU in the task-allocation scheme are considered to obtain workload balance. To improve the hybrid system performance, the Zuker algorithm is optimally implemented with special methods for CPU and GPU architecture. Conclusions Speedup of 15.93× over optimized multi-core SIMD CPU implementation and performance advantage of 16% over optimized GPU implementation are shown in the experimental results. More than 14% of the sequences are executed on CPU in the hybrid system. The system combining CPU and GPU to accelerate the Zuker algorithm is proven to be promising and can be applied to other bioinformatics applications. PMID:22369626
De novo protein structure prediction by dynamic fragment assembly and conformational space annealing.

PubMed

Lee, Juyong; Lee, Jinhyuk; Sasaki, Takeshi N; Sasai, Masaki; Seok, Chaok; Lee, Jooyoung

2011-08-01

Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of a protein structure and an efficient conformational sampling method for successful protein modeling. In this article, we present an ab initio structure prediction method which combines a recently suggested novel way of fragment assembly, dynamic fragment assembly (DFA) and conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed based on short- and long-range structural restraint information from a fragment library. Here, DFA is represented by the full-atom model by CHARMM with the addition of the empirical potential of DFIRE. The relative contributions between various energy terms are optimized using linear programming. The conformational sampling was carried out with CSA algorithm, which can find low energy conformations more efficiently than simulated annealing used in the existing DFA study. The newly introduced DFA energy function and CSA sampling algorithm are implemented into CHARMM. Test results on 30 small single-domain proteins and 13 template-free modeling targets of the 8th Critical Assessment of protein Structure Prediction show that the current method provides comparable and complementary prediction results to existing top methods. Copyright © 2011 Wiley-Liss, Inc.
Predicting patchy particle crystals: variable box shape simulations and evolutionary algorithms.

PubMed

Bianchi, Emanuela; Doppelbauer, Günther; Filion, Laura; Dijkstra, Marjolein; Kahl, Gerhard

2012-06-07

We consider several patchy particle models that have been proposed in literature and we investigate their candidate crystal structures in a systematic way. We compare two different algorithms for predicting crystal structures: (i) an approach based on Monte Carlo simulations in the isobaric-isothermal ensemble and (ii) an optimization technique based on ideas of evolutionary algorithms. We show that the two methods are equally successful and provide consistent results on crystalline phases of patchy particle systems.
Revealing how network structure affects accuracy of link prediction

NASA Astrophysics Data System (ADS)

Yang, Jin-Xuan; Zhang, Xiao-Dong

2017-08-01

Link prediction plays an important role in network reconstruction and network evolution. The network structure affects the accuracy of link prediction, which is an interesting problem. In this paper we use common neighbors and the Gini coefficient to reveal the relation between them, which can provide a good reference for the choice of a suitable link prediction algorithm according to the network structure. Moreover, the statistical analysis reveals correlation between the common neighbors index, Gini coefficient index and other indices to describe the network structure, such as Laplacian eigenvalues, clustering coefficient, degree heterogeneity, and assortativity of network. Furthermore, a new method to predict missing links is proposed. The experimental results show that the proposed algorithm yields better prediction accuracy and robustness to the network structure than existing currently used methods for a variety of real-world networks.
Cyclic coordinate descent: A robotics algorithm for protein loop closure.

PubMed

Canutescu, Adrian A; Dunbrack, Roland L

2003-05-01

In protein structure prediction, it is often the case that a protein segment must be adjusted to connect two fixed segments. This occurs during loop structure prediction in homology modeling as well as in ab initio structure prediction. Several algorithms for this purpose are based on the inverse Jacobian of the distance constraints with respect to dihedral angle degrees of freedom. These algorithms are sometimes unstable and fail to converge. We present an algorithm developed originally for inverse kinematics applications in robotics. In robotics, an end effector in the form of a robot hand must reach for an object in space by altering adjustable joint angles and arm lengths. In loop prediction, dihedral angles must be adjusted to move the C-terminal residue of a segment to superimpose on a fixed anchor residue in the protein structure. The algorithm, referred to as cyclic coordinate descent or CCD, involves adjusting one dihedral angle at a time to minimize the sum of the squared distances between three backbone atoms of the moving C-terminal anchor and the corresponding atoms in the fixed C-terminal anchor. The result is an equation in one variable for the proposed change in each dihedral. The algorithm proceeds iteratively through all of the adjustable dihedral angles from the N-terminal to the C-terminal end of the loop. CCD is suitable as a component of loop prediction methods that generate large numbers of trial structures. It succeeds in closing loops in a large test set 99.79% of the time, and fails occasionally only for short, highly extended loops. It is very fast, closing loops of length 8 in 0.037 sec on average.
Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

PubMed

Chira, Camelia; Horvath, Dragos; Dumitrescu, D

2011-07-30

Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optimum. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact to the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
Robust prediction of consensus secondary structures using averaged base pairing probability matrices.

PubMed

Kiryu, Hisanori; Kin, Taishin; Asai, Kiyoshi

2007-02-15

Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. Supplementary data are available at Bioinformatics online.
Crystal-structure prediction via the Floppy-Box Monte Carlo algorithm: Method and application to hard (non)convex particles

NASA Astrophysics Data System (ADS)

de Graaf, Joost; Filion, Laura; Marechal, Matthieu; van Roij, René; Dijkstra, Marjolein

2012-12-01

In this paper, we describe the way to set up the floppy-box Monte Carlo (FBMC) method [L. Filion, M. Marechal, B. van Oorschot, D. Pelt, F. Smallenburg, and M. Dijkstra, Phys. Rev. Lett. 103, 188302 (2009), 10.1103/PhysRevLett.103.188302] to predict crystal-structure candidates for colloidal particles. The algorithm is explained in detail to ensure that it can be straightforwardly implemented on the basis of this text. The handling of hard-particle interactions in the FBMC algorithm is given special attention, as (soft) short-range and semi-long-range interactions can be treated in an analogous way. We also discuss two types of algorithms for checking for overlaps between polyhedra, the method of separating axes and a triangular-tessellation based technique. These can be combined with the FBMC method to enable crystal-structure prediction for systems composed of highly shape-anisotropic particles. Moreover, we present the results for the dense crystal structures predicted using the FBMC method for 159 (non)convex faceted particles, on which the findings in [J. de Graaf, R. van Roij, and M. Dijkstra, Phys. Rev. Lett. 107, 155501 (2011), 10.1103/PhysRevLett.107.155501] were based. Finally, we comment on the process of crystal-structure prediction itself and the choices that can be made in these simulations.
A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm

PubMed Central

Chen, Jui-Le; Yang, Chu-Sing

2013-01-01

The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although the computer science technologies can be used to reduce the costs of the pharmaceutical research, the computation time of the structure-based protein-ligand docking prediction is still unsatisfied until now. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate the docking prediction algorithm. The proposed algorithm works by leveraging two high-performance operators: (1) the novel migration (information exchange) operator is designed specially for cloud-based environments to reduce the computation time; (2) the efficient operator is aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864
Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity.

PubMed

Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D

2014-03-25

A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints.A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model's behaviour for the specific query. Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. Interpretation was revealed that links closely with understood mechanisms for Ames mutagenicity. This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.
Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms

PubMed Central

Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei

2016-01-01

Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851
A genetic algorithm approach in interface and surface structure optimization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Jian

The thesis is divided into two parts. In the first part a global optimization method is developed for the interface and surface structures optimization. Two prototype systems are chosen to be studied. One is Si[001] symmetric tilted grain boundaries and the other is Ag/Au induced Si(111) surface. It is found that Genetic Algorithm is very efficient in finding lowest energy structures in both cases. Not only existing structures in the experiments can be reproduced, but also many new structures can be predicted using Genetic Algorithm. Thus it is shown that Genetic Algorithm is a extremely powerful tool for the materialmore » structures predictions. The second part of the thesis is devoted to the explanation of an experimental observation of thermal radiation from three-dimensional tungsten photonic crystal structures. The experimental results seems astounding and confusing, yet the theoretical models in the paper revealed the physics insight behind the phenomena and can well reproduced the experimental results.« less
Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs

PubMed Central

2017-01-01

Prediction of RNA tertiary structure from sequence is an important problem, but generating accurate structure models for even short sequences remains difficult. Predictions of RNA tertiary structure tend to be least accurate in loop regions, where non-canonical pairs are important for determining the details of structure. Non-canonical pairs can be predicted using a knowledge-based model of structure that scores nucleotide cyclic motifs, or NCMs. In this work, a partition function algorithm is introduced that allows the estimation of base pairing probabilities for both canonical and non-canonical interactions. Pairs that are predicted to be probable are more likely to be found in the true structure than pairs of lower probability. Pair probability estimates can be further improved by predicting the structure conserved across multiple homologous sequences using the TurboFold algorithm. These pairing probabilities, used in concert with prior knowledge of the canonical secondary structure, allow accurate inference of non-canonical pairs, an important step towards accurate prediction of the full tertiary structure. Software to predict non-canonical base pairs and pairing probabilities is now provided as part of the RNAstructure software package. PMID:29107980
Crystal Structure Predictions Using Adaptive Genetic Algorithm and Motif Search methods

NASA Astrophysics Data System (ADS)

Ho, K. M.; Wang, C. Z.; Zhao, X.; Wu, S.; Lyu, X.; Zhu, Z.; Nguyen, M. C.; Umemoto, K.; Wentzcovitch, R. M. M.

2017-12-01

Material informatics is a new initiative which has attracted a lot of attention in recent scientific research. The basic strategy is to construct comprehensive data sets and use machine learning to solve a wide variety of problems in material design and discovery. In pursuit of this goal, a key element is the quality and completeness of the databases used. Recent advance in the development of crystal structure prediction algorithms has made it a complementary and more efficient approach to explore the structure/phase space in materials using computers. In this talk, we discuss the importance of the structural motifs and motif-networks in crystal structure predictions. Correspondingly, powerful methods are developed to improve the sampling of the low-energy structure landscape.
TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers.

PubMed

Cao, Han; Ng, Marcus C K; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W I

2017-09-01

[Formula: see text]-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult, therefore computational methods to accurately and efficiently predict the structures are in great demand. We present an improved structure prediction method TMDIM based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are introduction of the packing type classification, the multiple-condition decoy filtering, and the cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD <2.0 Å) and 78% of the cases are better predicted than the two other methods compared. Our method provides an alternative for modeling TM bitopic dimers of unknown structures for further computational studies. TMDIM is freely available on the web at https://cbbio.cis.umac.mo/TMDIM . Website is implemented in PHP, MySQL and Apache, with all major browsers supported.
TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers

NASA Astrophysics Data System (ADS)

Cao, Han; Ng, Marcus C. K.; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W. I.

2017-09-01

α-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult, therefore computational methods to accurately and efficiently predict the structures are in great demand. We present an improved structure prediction method TMDIM based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are introduction of the packing type classification, the multiple-condition decoy filtering, and the cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD <2.0 Å) and 78% of the cases are better predicted than the two other methods compared. Our method provides an alternative for modeling TM bitopic dimers of unknown structures for further computational studies. TMDIM is freely available on the web at https://cbbio.cis.umac.mo/TMDIM. Website is implemented in PHP, MySQL and Apache, with all major browsers supported.
GRID: a high-resolution protein structure refinement algorithm.

PubMed

Chitsaz, Mohsen; Mayo, Stephen L

2013-03-05

The energy-based refinement of protein structures generated by fold prediction algorithms to atomic-level accuracy remains a major challenge in structural biology. Energy-based refinement is mainly dependent on two components: (1) sufficiently accurate force fields, and (2) efficient conformational space search algorithms. Focusing on the latter, we developed a high-resolution refinement algorithm called GRID. It takes a three-dimensional protein structure as input and, using an all-atom force field, attempts to improve the energy of the structure by systematically perturbing backbone dihedrals and side-chain rotamer conformations. We compare GRID to Backrub, a stochastic algorithm that has been shown to predict a significant fraction of the conformational changes that occur with point mutations. We applied GRID and Backrub to 10 high-resolution (≤ 2.8 Å) crystal structures from the Protein Data Bank and measured the energy improvements obtained and the computation times required to achieve them. GRID resulted in energy improvements that were significantly better than those attained by Backrub while expending about the same amount of computational resources. GRID resulted in relaxed structures that had slightly higher backbone RMSDs compared to Backrub relative to the starting crystal structures. The average RMSD was 0.25 ± 0.02 Å for GRID versus 0.14 ± 0.04 Å for Backrub. These relatively minor deviations indicate that both algorithms generate structures that retain their original topologies, as expected given the nature of the algorithms. Copyright © 2012 Wiley Periodicals, Inc.

Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO).

PubMed

Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier

2017-08-01

A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type and organism) from over 1.3million GEO records. We examined the quality of well supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better in predicting class values than the majority vote classifier. We found that the performance of the algorithms is related to the dimensionality of the GEO elements. The average performance of all algorithm increases due of the decreasing of dimensionality of the unique values of these elements (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective augmentation of metadata quality, which are geared towards making data easier to find and reuse. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Refined Genetic Algorithms for Polypeptide Structure Prediction.

DTIC Science & Technology

1996-12-01

16 I I I. Algorithm Analysis, Design , and Implemen tation : : : : : : : : : : : : : : : : : : : : : : : : : 18 3.1 Analysis...21 3.2 Algorithm Design and Implemen tation : : : : : : : : : : : : : : : : : : : : : : : : : 22 3.2.1...26 IV. Exp erimen t Design
Multiobjective evolutionary algorithm with many tables for purely ab initio protein structure prediction.

PubMed

Brasil, Christiane Regina Soares; Delbem, Alexandre Claudio Botazzo; da Silva, Fernando Luís Barroso

2013-07-30

This article focuses on the development of an approach for ab initio protein structure prediction (PSP) without using any earlier knowledge from similar protein structures, as fragment-based statistics or inference of secondary structures. Such an approach is called purely ab initio prediction. The article shows that well-designed multiobjective evolutionary algorithms can predict relevant protein structures in a purely ab initio way. One challenge for purely ab initio PSP is the prediction of structures with β-sheets. To work with such proteins, this research has also developed procedures to efficiently estimate hydrogen bond and solvation contribution energies. Considering van der Waals, electrostatic, hydrogen bond, and solvation contribution energies, the PSP is a problem with four energetic terms to be minimized. Each interaction energy term can be considered an objective of an optimization method. Combinatorial problems with four objectives have been considered too complex for the available multiobjective optimization (MOO) methods. The proposed approach, called "Multiobjective evolutionary algorithms with many tables" (MEAMT), can efficiently deal with four objectives through the combination thereof, performing a more adequate sampling of the objective space. Therefore, this method can better map the promising regions in this space, predicting structures in a purely ab initio way. In other words, MEAMT is an efficient optimization method for MOO, which explores simultaneously the search space as well as the objective space. MEAMT can predict structures with one or two domains with RMSDs comparable to values obtained by recently developed ab initio methods (GAPFCG , I-PAES, and Quark) that use different levels of earlier knowledge. Copyright © 2013 Wiley Periodicals, Inc.
Ab-initio conformational epitope structure prediction using genetic algorithm and SVM for vaccine design.

PubMed

Moghram, Basem Ameen; Nabil, Emad; Badr, Amr

2018-01-01

T-cell epitope structure identification is a significant challenging immunoinformatic problem within epitope-based vaccine design. Epitopes or antigenic peptides are a set of amino acids that bind with the Major Histocompatibility Complex (MHC) molecules. The aim of this process is presented by Antigen Presenting Cells to be inspected by T-cells. MHC-molecule-binding epitopes are responsible for triggering the immune response to antigens. The epitope's three-dimensional (3D) molecular structure (i.e., tertiary structure) reflects its proper function. Therefore, the identification of MHC class-II epitopes structure is a significant step towards epitope-based vaccine design and understanding of the immune system. In this paper, we propose a new technique using a Genetic Algorithm for Predicting the Epitope Structure (GAPES), to predict the structure of MHC class-II epitopes based on their sequence. The proposed Elitist-based genetic algorithm for predicting the epitope's tertiary structure is based on Ab-Initio Empirical Conformational Energy Program for Peptides (ECEPP) Force Field Model. The developed secondary structure prediction technique relies on Ramachandran Plot. We used two alignment algorithms: the ROSS alignment and TM-Score alignment. We applied four different alignment approaches to calculate the similarity scores of the dataset under test. We utilized the support vector machine (SVM) classifier as an evaluation of the prediction performance. The prediction accuracy and the Area Under Receiver Operating Characteristic (ROC) Curve (AUC) were calculated as measures of performance. The calculations are performed on twelve similarity-reduced datasets of the Immune Epitope Data Base (IEDB) and a large dataset of peptide-binding affinities to HLA-DRB1*0101. The results showed that GAPES was reliable and very accurate. We achieved an average prediction accuracy of 93.50% and an average AUC of 0.974 in the IEDB dataset. Also, we achieved an accuracy of 95.125% and an AUC of 0.987 on the HLA-DRB1*0101 allele of the Wang benchmark dataset. The results indicate that the proposed prediction technique "GAPES" is a promising technique that will help researchers and scientists to predict the protein structure and it will assist them in the intelligent design of new epitope-based vaccines. Copyright © 2017 Elsevier B.V. All rights reserved.
Ensemble-based prediction of RNA secondary structures.

PubMed

Aghaeepour, Nima; Hoos, Holger H

2013-04-24

Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that as been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.
Predicting the random drift of MEMS gyroscope based on K-means clustering and OLS RBF Neural Network

NASA Astrophysics Data System (ADS)

Wang, Zhen-yu; Zhang, Li-jie

2017-10-01

Measure error of the sensor can be effectively compensated with prediction. Aiming at large random drift error of MEMS(Micro Electro Mechanical System))gyroscope, an improved learning algorithm of Radial Basis Function(RBF) Neural Network(NN) based on K-means clustering and Orthogonal Least-Squares (OLS) is proposed in this paper. The algorithm selects the typical samples as the initial cluster centers of RBF NN firstly, candidates centers with K-means algorithm secondly, and optimizes the candidate centers with OLS algorithm thirdly, which makes the network structure simpler and makes the prediction performance better. Experimental results show that the proposed K-means clustering OLS learning algorithm can predict the random drift of MEMS gyroscope effectively, the prediction error of which is 9.8019e-007°/s and the prediction time of which is 2.4169e-006s
Thermodynamic heuristics with case-based reasoning: combined insights for RNA pseudoknot secondary structure.

PubMed

Al-Khatib, Ra'ed M; Rashid, Nur'Aini Abdul; Abdullah, Rosni

2011-08-01

The secondary structure of RNA pseudoknots has been extensively inferred and scrutinized by computational approaches. Experimental methods for determining RNA structure are time consuming and tedious; therefore, predictive computational approaches are required. Predicting the most accurate and energy-stable pseudoknot RNA secondary structure has been proven to be an NP-hard problem. In this paper, a new RNA folding approach, termed MSeeker, is presented; it includes KnotSeeker (a heuristic method) and Mfold (a thermodynamic algorithm). The global optimization of this thermodynamic heuristic approach was further enhanced by using a case-based reasoning technique as a local optimization method. MSeeker is a proposed algorithm for predicting RNA pseudoknot structure from individual sequences, especially long ones. This research demonstrates that MSeeker improves the sensitivity and specificity of existing RNA pseudoknot structure predictions. The performance and structural results from this proposed method were evaluated against seven other state-of-the-art pseudoknot prediction methods. The MSeeker method had better sensitivity than the DotKnot, FlexStem, HotKnots, pknotsRG, ILM, NUPACK and pknotsRE methods, with 79% of the predicted pseudoknot base-pairs being correct.
Predicting Protein Structure Using Parallel Genetic Algorithms.

DTIC Science & Technology

1994-12-01

Molecular dynamics attempts to simulate the protein folding process. However, the time steps required for this simulation are on the order of one...harmonics. These two factors have limited molecular dynamics simulations to less than a few nanoseconds (10-9 sec), even on today’s fastest supercomputers...By " Predicting rotein Structure D istribticfiar.. ................ Using Parallel Genetic Algorithms ,Avaiu " ’ •"... Dist THESIS I IGeorge H
A stochastic multiple imputation algorithm for missing covariate data in tree-structured survival analysis.

PubMed

Wallace, Meredith L; Anderson, Stewart J; Mazumdar, Sati

2010-12-20

Missing covariate data present a challenge to tree-structured methodology due to the fact that a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we suggest a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it to other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when complex data and larger percentages of missing covariate observations are present. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults. Copyright © 2010 John Wiley & Sons, Ltd.
Analysis of energy-based algorithms for RNA secondary structure prediction

PubMed Central

2012-01-01

Background RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. Results We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Conclusions Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets. PMID:22296803
Analysis of energy-based algorithms for RNA secondary structure prediction.

PubMed

Hajiaghayi, Monir; Condon, Anne; Hoos, Holger H

2012-02-01

RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.
Fast prediction of RNA-RNA interaction using heuristic algorithm.

PubMed

Montaseri, Soheila

2015-01-01

Interaction between two RNA molecules plays a crucial role in many medical and biological processes such as gene expression regulation. In this process, an RNA molecule prohibits the translation of another RNA molecule by establishing stable interactions with it. Some algorithms have been formed to predict the structure of the RNA-RNA interaction. High computational time is a common challenge in most of the presented algorithms. In this context, a heuristic method is introduced to accurately predict the interaction between two RNAs based on minimum free energy (MFE). This algorithm uses a few dot matrices for finding the secondary structure of each RNA and binding sites between two RNAs. Furthermore, a parallel version of this method is presented. We describe the algorithm's concurrency and parallelism for a multicore chip. The proposed algorithm has been performed on some datasets including CopA-CopT, R1inv-R2inv, Tar-Tar*, DIS-DIS, and IncRNA54-RepZ in Escherichia coli bacteria. The method has high validity and efficiency, and it is run in low computational time in comparison to other approaches.
The Design and Implementation of a Read Prediction Buffer

DTIC Science & Technology

1992-12-01

City, State, and ZIP Code) 7b ADDRESS (City, State. and ZIP Code) 8a. NAME OF FUNDING /SPONSORING 8b. OFFICE SYMBOL 9 PROCUREMENT INSTRUMENT... 9 E. THESIS STRUCTURE.. . .... ............... 9 II. READ PREDICTION ALGORITHM AND BUFFER DESIGN 10 A. THE READ PREDICTION ALGORITHM...29 Figure 9 . Basic Multiplexer Cell .... .......... .. 30 Figure 10. Block Diagram Simulation Labels ......... 38 viii I. INTRODUCTION A
SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

PubMed

Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

2015-01-01

Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
Critical Features of Fragment Libraries for Protein Structure Prediction

PubMed Central

dos Santos, Karina Baptista

2017-01-01

The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction. PMID:28085928
Critical Features of Fragment Libraries for Protein Structure Prediction.

PubMed

Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

2017-01-01

The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.
Data-directed RNA secondary structure prediction using probabilistic modeling

PubMed Central

Deng, Fei; Ledda, Mirko; Vaziri, Sana; Aviran, Sharon

2016-01-01

Structure dictates the function of many RNAs, but secondary RNA structure analysis is either labor intensive and costly or relies on computational predictions that are often inaccurate. These limitations are alleviated by integration of structure probing data into prediction algorithms. However, existing algorithms are optimized for a specific type of probing data. Recently, new chemistries combined with advances in sequencing have facilitated structure probing at unprecedented scale and sensitivity. These novel technologies and anticipated wealth of data highlight a need for algorithms that readily accommodate more complex and diverse input sources. We implemented and investigated a recently outlined probabilistic framework for RNA secondary structure prediction and extended it to accommodate further refinement of structural information. This framework utilizes direct likelihood-based calculations of pseudo-energy terms per considered structural context and can readily accommodate diverse data types and complex data dependencies. We use real data in conjunction with simulations to evaluate performances of several implementations and to show that proper integration of structural contexts can lead to improvements. Our tests also reveal discrepancies between real data and simulations, which we show can be alleviated by refined modeling. We then propose statistical preprocessing approaches to standardize data interpretation and integration into such a generic framework. We further systematically quantify the information content of data subsets, demonstrating that high reactivities are major drivers of SHAPE-directed predictions and that better understanding of less informative reactivities is key to further improvements. Finally, we provide evidence for the adaptive capability of our framework using mock probe simulations. PMID:27251549
RNA secondary structure prediction using soft computing.

PubMed

Ray, Shubhra Sankar; Pal, Sankar K

2013-01-01

Prediction of RNA structure is invaluable in creating new drugs and understanding genetic diseases. Several deterministic algorithms and soft computing-based techniques have been developed for more than a decade to determine the structure from a known RNA sequence. Soft computing gained importance with the need to get approximate solutions for RNA sequences by considering the issues related with kinetic effects, cotranscriptional folding, and estimation of certain energy parameters. A brief description of some of the soft computing-based techniques, developed for RNA secondary structure prediction, is presented along with their relevance. The basic concepts of RNA and its different structural elements like helix, bulge, hairpin loop, internal loop, and multiloop are described. These are followed by different methodologies, employing genetic algorithms, artificial neural networks, and fuzzy logic. The role of various metaheuristics, like simulated annealing, particle swarm optimization, ant colony optimization, and tabu search is also discussed. A relative comparison among different techniques, in predicting 12 known RNA secondary structures, is presented, as an example. Future challenging issues are then mentioned.
Small angle X-ray scattering and cross-linking for data assisted protein structure prediction in CASP 12 with prospects for improved accuracy.

PubMed

Ogorzalek, Tadeusz L; Hura, Greg L; Belsom, Adam; Burnett, Kathryn H; Kryshtafovych, Andriy; Tainer, John A; Rappsilber, Juri; Tsutakawa, Susan E; Fidelis, Krzysztof

2018-03-01

Experimental data offers empowering constraints for structure prediction. These constraints can be used to filter equivalently scored models or more powerfully within optimization functions toward prediction. In CASP12, Small Angle X-ray Scattering (SAXS) and Cross-Linking Mass Spectrometry (CLMS) data, measured on an exemplary set of novel fold targets, were provided to the CASP community of protein structure predictors. As solution-based techniques, SAXS and CLMS can efficiently measure states of the full-length sequence in its native solution conformation and assembly. However, this experimental data did not substantially improve prediction accuracy judged by fits to crystallographic models. One issue, beyond intrinsic limitations of the algorithms, was a disconnect between crystal structures and solution-based measurements. Our analyses show that many targets had substantial percentages of disordered regions (up to 40%) or were multimeric or both. Thus, solution measurements of flexibility and assembly support variations that may confound prediction algorithms trained on crystallographic data and expecting globular fully-folded monomeric proteins. Here, we consider the CLMS and SAXS data collected, the information in these solution measurements, and the challenges in incorporating them into computational prediction. As improvement opportunities were only partly realized in CASP12, we provide guidance on how data from the full-length biological unit and the solution state can better aid prediction of the folded monomer or subunit. We furthermore describe strategic integrations of solution measurements with computational prediction programs with the aim of substantially improving foundational knowledge and the accuracy of computational algorithms for biologically-relevant structure predictions for proteins in solution. © 2018 Wiley Periodicals, Inc.
Establishing a Dynamic Self-Adaptation Learning Algorithm of the BP Neural Network and Its Applications

NASA Astrophysics Data System (ADS)

Li, Xiaofeng; Xiang, Suying; Zhu, Pengfei; Wu, Min

2015-12-01

In order to avoid the inherent deficiencies of the traditional BP neural network, such as slow convergence speed, that easily leading to local minima, poor generalization ability and difficulty in determining the network structure, the dynamic self-adaptive learning algorithm of the BP neural network is put forward to improve the function of the BP neural network. The new algorithm combines the merit of principal component analysis, particle swarm optimization, correlation analysis and self-adaptive model, hence can effectively solve the problems of selecting structural parameters, initial connection weights and thresholds and learning rates of the BP neural network. This new algorithm not only reduces the human intervention, optimizes the topological structures of BP neural networks and improves the network generalization ability, but also accelerates the convergence speed of a network, avoids trapping into local minima, and enhances network adaptation ability and prediction ability. The dynamic self-adaptive learning algorithm of the BP neural network is used to forecast the total retail sale of consumer goods of Sichuan Province, China. Empirical results indicate that the new algorithm is superior to the traditional BP network algorithm in predicting accuracy and time consumption, which shows the feasibility and effectiveness of the new algorithm.

Predicting protein structures with a multiplayer online game.

PubMed

Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran; Players, Foldit

2010-08-05

People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.
Automatic measurement of voice onset time using discriminative structured prediction.

PubMed

Sonderegger, Morgan; Keshet, Joseph

2012-12-01

A discriminative large-margin algorithm for automatic measurement of voice onset time (VOT) is described, considered as a case of predicting structured output from speech. Manually labeled data are used to train a function that takes as input a speech segment of an arbitrary length containing a voiceless stop, and outputs its VOT. The function is explicitly trained to minimize the difference between predicted and manually measured VOT; it operates on a set of acoustic feature functions designed based on spectral and temporal cues used by human VOT annotators. The algorithm is applied to initial voiceless stops from four corpora, representing different types of speech. Using several evaluation methods, the algorithm's performance is near human intertranscriber reliability, and compares favorably with previous work. Furthermore, the algorithm's performance is minimally affected by training and testing on different corpora, and remains essentially constant as the amount of training data is reduced to 50-250 manually labeled examples, demonstrating the method's practical applicability to new datasets.
Predicting missing links and identifying spurious links via likelihood analysis

NASA Astrophysics Data System (ADS)

Pan, Liming; Zhou, Tao; Lü, Linyuan; Hu, Chin-Kun

2016-03-01

Real network data is often incomplete and noisy, where link prediction algorithms and spurious link identification algorithms can be applied. Thus far, it lacks a general method to transform network organizing mechanisms to link prediction algorithms. Here we use an algorithmic framework where a network’s probability is calculated according to a predefined structural Hamiltonian that takes into account the network organizing principles, and a non-observed link is scored by the conditional probability of adding the link to the observed network. Extensive numerical simulations show that the proposed algorithm has remarkably higher accuracy than the state-of-the-art methods in uncovering missing links and identifying spurious links in many complex biological and social networks. Such method also finds applications in exploring the underlying network evolutionary mechanisms.
Predicting missing links and identifying spurious links via likelihood analysis

PubMed Central

Pan, Liming; Zhou, Tao; Lü, Linyuan; Hu, Chin-Kun

2016-01-01

Real network data is often incomplete and noisy, where link prediction algorithms and spurious link identification algorithms can be applied. Thus far, it lacks a general method to transform network organizing mechanisms to link prediction algorithms. Here we use an algorithmic framework where a network’s probability is calculated according to a predefined structural Hamiltonian that takes into account the network organizing principles, and a non-observed link is scored by the conditional probability of adding the link to the observed network. Extensive numerical simulations show that the proposed algorithm has remarkably higher accuracy than the state-of-the-art methods in uncovering missing links and identifying spurious links in many complex biological and social networks. Such method also finds applications in exploring the underlying network evolutionary mechanisms. PMID:26961965
Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based structure refinement

PubMed Central

Xu, Dong; Zhang, Jian; Roy, Ambrish; Zhang, Yang

2011-01-01

I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In CASP9 experiments, two new algorithms, QUARK and FG-MD, were added to the I-TASSER pipeline for improving the structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found useful as a reference structure for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from the PDB structures to guide molecular dynamics simulation and improve the local structure of predicted model, including hydrogen-bonding networks, torsion angles and steric clashes. Despite considerable progress in both the template-based and template-free structure modeling, significant improvements on protein target classification, domain parsing, model selection, and ab initio folding of beta-proteins are still needed to further improve the I-TASSER pipeline. PMID:22069036
Multicore and GPU algorithms for Nussinov RNA folding

PubMed Central

2014-01-01

Background One segment of a RNA sequence might be paired with another segment of the same RNA sequence due to the force of hydrogen bonds. This two-dimensional structure is called the RNA sequence's secondary structure. Several algorithms have been proposed to predict an RNA sequence's secondary structure. These algorithms are referred to as RNA folding algorithms. Results We develop cache efficient, multicore, and GPU algorithms for RNA folding using Nussinov's algorithm. Conclusions Our cache efficient algorithm provides a speedup between 1.6 and 3.0 relative to a naive straightforward single core code. The multicore version of the cache efficient single core algorithm provides a speedup, relative to the naive single core algorithm, between 7.5 and 14.0 on a 6 core hyperthreaded CPU. Our GPU algorithm for the NVIDIA C2050 is up to 1582 times as fast as the naive single core algorithm and between 5.1 and 11.2 times as fast as the fastest previously known GPU algorithm for Nussinov RNA folding. PMID:25082539
I-TASSER: fully automated protein structure prediction in CASP8.

PubMed

Zhang, Yang

2009-01-01

The I-TASSER algorithm for 3D protein structure prediction was tested in CASP8, with the procedure fully automated in both the Server and Human sections. The quality of the server models is close to that of human ones but the human predictions incorporate more diverse templates from other servers which improve the human predictions in some of the distant homology targets. For the first time, the sequence-based contact predictions from machine learning techniques are found helpful for both template-based modeling (TBM) and template-free modeling (FM). In TBM, although the accuracy of the sequence based contact predictions is on average lower than that from template-based ones, the novel contacts in the sequence-based predictions, which are complementary to the threading templates in the weakly or unaligned regions, are important to improve the global and local packing in these regions. Moreover, the newly developed atomic structural refinement algorithm was tested in CASP8 and found to improve the hydrogen-bonding networks and the overall TM-score, which is mainly due to its ability of removing steric clashes so that the models can be generated from cluster centroids. Nevertheless, one of the major issues of the I-TASSER pipeline is the model selection where the best models could not be appropriately recognized when the correct templates are detected only by the minority of the threading algorithms. There are also problems related with domain-splitting and mirror image recognition which mainly influences the performance of I-TASSER modeling in the FM-based structure predictions. Copyright 2009 Wiley-Liss, Inc.
Community detection in complex networks using link prediction

NASA Astrophysics Data System (ADS)

Cheng, Hui-Min; Ning, Yi-Zi; Yin, Zhao; Yan, Chao; Liu, Xin; Zhang, Zhong-Yuan

2018-01-01

Community detection and link prediction are both of great significance in network analysis, which provide very valuable insights into topological structures of the network from different perspectives. In this paper, we propose a novel community detection algorithm with inclusion of link prediction, motivated by the question whether link prediction can be devoted to improving the accuracy of community partition. For link prediction, we propose two novel indices to compute the similarity between each pair of nodes, one of which aims to add missing links, and the other tries to remove spurious edges. Extensive experiments are conducted on benchmark data sets, and the results of our proposed algorithm are compared with two classes of baselines. In conclusion, our proposed algorithm is competitive, revealing that link prediction does improve the precision of community detection.
A test to evaluate the earthquake prediction algorithm, M8

USGS Publications Warehouse

Healy, John H.; Kossobokov, Vladimir G.; Dewey, James W.

1992-01-01

A test of the algorithm M8 is described. The test is constructed to meet four rules, which we propose to be applicable to the test of any method for earthquake prediction: 1. An earthquake prediction technique should be presented as a well documented, logical algorithm that can be used by investigators without restrictions. 2. The algorithm should be coded in a common programming language and implementable on widely available computer systems. 3. A test of the earthquake prediction technique should involve future predictions with a black box version of the algorithm in which potentially adjustable parameters are fixed in advance. The source of the input data must be defined and ambiguities in these data must be resolved automatically by the algorithm. 4. At least one reasonable null hypothesis should be stated in advance of testing the earthquake prediction method, and it should be stated how this null hypothesis will be used to estimate the statistical significance of the earthquake predictions. The M8 algorithm has successfully predicted several destructive earthquakes, in the sense that the earthquakes occurred inside regions with linear dimensions from 384 to 854 km that the algorithm had identified as being in times of increased probability for strong earthquakes. In addition, M8 has successfully "post predicted" high percentages of strong earthquakes in regions to which it has been applied in retroactive studies. The statistical significance of previous predictions has not been established, however, and post-prediction studies in general are notoriously subject to success-enhancement through hindsight. Nor has it been determined how much more precise an M8 prediction might be than forecasts and probability-of-occurrence estimates made by other techniques. We view our test of M8 both as a means to better determine the effectiveness of M8 and as an experimental structure within which to make observations that might lead to improvements in the algorithm or conceivably lead to a radically different approach to earthquake prediction.
“Skin-Core-Skin” Structure of Polymer Crystallization Investigated by Multiscale Simulation

PubMed Central

Ruan, Chunlei

2018-01-01

“Skin-core-skin” structure is a typical crystal morphology in injection products. Previous numerical works have rarely focused on crystal evolution; rather, they have mostly been based on the prediction of temperature distribution or crystallization kinetics. The aim of this work was to achieve the “skin-core-skin” structure and investigate the role of external flow and temperature fields on crystal morphology. Therefore, the multiscale algorithm was extended to the simulation of polymer crystallization in a pipe flow. The multiscale algorithm contains two parts: a collocated finite volume method at the macroscopic level and a morphological Monte Carlo method at the microscopic level. The SIMPLE (semi-implicit method for pressure linked equations) algorithm was used to calculate the polymeric model at the macroscopic level, while the Monte Carlo method with stochastic birth-growth process of spherulites and shish-kebabs was used at the microscopic level. Results show that our algorithm is valid to predict “skin-core-skin” structure, and the initial melt temperature and the maximum velocity of melt at the inlet mainly affects the morphology of shish-kebabs. PMID:29659516
A systematic investigation of computation models for predicting Adverse Drug Reactions (ADRs).

PubMed

Kuang, Qifan; Wang, MinQi; Li, Rong; Dong, YongCheng; Li, Yizhou; Li, Menglong

2014-01-01

Early and accurate identification of adverse drug reactions (ADRs) is critically important for drug development and clinical safety. Computer-aided prediction of ADRs has attracted increasing attention in recent years, and many computational models have been proposed. However, because of the lack of systematic analysis and comparison of the different computational models, there remain limitations in designing more effective algorithms and selecting more useful features. There is therefore an urgent need to review and analyze previous computation models to obtain general conclusions that can provide useful guidance to construct more effective computational models to predict ADRs. In the current study, the main work is to compare and analyze the performance of existing computational methods to predict ADRs, by implementing and evaluating additional algorithms that have been earlier used for predicting drug targets. Our results indicated that topological and intrinsic features were complementary to an extent and the Jaccard coefficient had an important and general effect on the prediction of drug-ADR associations. By comparing the structure of each algorithm, final formulas of these algorithms were all converted to linear model in form, based on this finding we propose a new algorithm called the general weighted profile method and it yielded the best overall performance among the algorithms investigated in this paper. Several meaningful conclusions and useful findings regarding the prediction of ADRs are provided for selecting optimal features and algorithms.
Reducing the worst case running times of a family of RNA and CFG problems, using Valiant's approach.

PubMed

Zakov, Shay; Tsur, Dekel; Ziv-Ukelson, Michal

2011-08-18

RNA secondary structure prediction is a mainstream bioinformatic domain, and is key to computational analysis of functional RNA. In more than 30 years, much research has been devoted to defining different variants of RNA structure prediction problems, and to developing techniques for improving prediction quality. Nevertheless, most of the algorithms in this field follow a similar dynamic programming approach as that presented by Nussinov and Jacobson in the late 70's, which typically yields cubic worst case running time algorithms. Recently, some algorithmic approaches were applied to improve the complexity of these algorithms, motivated by new discoveries in the RNA domain and by the need to efficiently analyze the increasing amount of accumulated genome-wide data. We study Valiant's classical algorithm for Context Free Grammar recognition in sub-cubic time, and extract features that are common to problems on which Valiant's approach can be applied. Based on this, we describe several problem templates, and formulate generic algorithms that use Valiant's technique and can be applied to all problems which abide by these templates, including many problems within the world of RNA Secondary Structures and Context Free Grammars. The algorithms presented in this paper improve the theoretical asymptotic worst case running time bounds for a large family of important problems. It is also possible that the suggested techniques could be applied to yield a practical speedup for these problems. For some of the problems (such as computing the RNA partition function and base-pair binding probabilities), the presented techniques are the only ones which are currently known for reducing the asymptotic running time bounds of the standard algorithms.
Reducing the worst case running times of a family of RNA and CFG problems, using Valiant's approach

PubMed Central

2011-01-01

Background RNA secondary structure prediction is a mainstream bioinformatic domain, and is key to computational analysis of functional RNA. In more than 30 years, much research has been devoted to defining different variants of RNA structure prediction problems, and to developing techniques for improving prediction quality. Nevertheless, most of the algorithms in this field follow a similar dynamic programming approach as that presented by Nussinov and Jacobson in the late 70's, which typically yields cubic worst case running time algorithms. Recently, some algorithmic approaches were applied to improve the complexity of these algorithms, motivated by new discoveries in the RNA domain and by the need to efficiently analyze the increasing amount of accumulated genome-wide data. Results We study Valiant's classical algorithm for Context Free Grammar recognition in sub-cubic time, and extract features that are common to problems on which Valiant's approach can be applied. Based on this, we describe several problem templates, and formulate generic algorithms that use Valiant's technique and can be applied to all problems which abide by these templates, including many problems within the world of RNA Secondary Structures and Context Free Grammars. Conclusions The algorithms presented in this paper improve the theoretical asymptotic worst case running time bounds for a large family of important problems. It is also possible that the suggested techniques could be applied to yield a practical speedup for these problems. For some of the problems (such as computing the RNA partition function and base-pair binding probabilities), the presented techniques are the only ones which are currently known for reducing the asymptotic running time bounds of the standard algorithms. PMID:21851589
Support Vector Machines Trained with Evolutionary Algorithms Employing Kernel Adatron for Large Scale Classification of Protein Structures.

PubMed

Arana-Daniel, Nancy; Gallegos, Alberto A; López-Franco, Carlos; Alanís, Alma Y; Morales, Jacob; López-Franco, Adriana

2016-01-01

With the increasing power of computers, the amount of data that can be processed in small periods of time has grown exponentially, as has the importance of classifying large-scale data efficiently. Support vector machines have shown good results classifying large amounts of high-dimensional data, such as data generated by protein structure prediction, spam recognition, medical diagnosis, optical character recognition and text classification, etc. Most state of the art approaches for large-scale learning use traditional optimization methods, such as quadratic programming or gradient descent, which makes the use of evolutionary algorithms for training support vector machines an area to be explored. The present paper proposes an approach that is simple to implement based on evolutionary algorithms and Kernel-Adatron for solving large-scale classification problems, focusing on protein structure prediction. The functional properties of proteins depend upon their three-dimensional structures. Knowing the structures of proteins is crucial for biology and can lead to improvements in areas such as medicine, agriculture and biofuels.
A material political economy: Automated Trading Desk and price prediction in high-frequency trading.

PubMed

MacKenzie, Donald

2017-04-01

This article contains the first detailed historical study of one of the new high-frequency trading (HFT) firms that have transformed many of the world's financial markets. The study, of Automated Trading Desk (ATD), one of the earliest and most important such firms, focuses on how ATD's algorithms predicted share price changes. The article argues that political-economic struggles are integral to the existence of some of the 'pockets' of predictable structure in the otherwise random movements of prices, to the availability of the data that allow algorithms to identify these pockets, and to the capacity of algorithms to use these predictions to trade profitably. The article also examines the role of HFT algorithms such as ATD's in the epochal, fiercely contested shift in US share trading from 'fixed-role' markets towards 'all-to-all' markets.
Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

PubMed

Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E

2015-01-01

Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".
Protein Tertiary Structure Prediction Based on Main Chain Angle Using a Hybrid Bees Colony Optimization Algorithm

NASA Astrophysics Data System (ADS)

Mahmood, Zakaria N.; Mahmuddin, Massudi; Mahmood, Mohammed Nooraldeen

Encoding proteins of amino acid sequence to predict classified into their respective families and subfamilies is important research area. However for a given protein, knowing the exact action whether hormonal, enzymatic, transmembranal or nuclear receptors does not depend solely on amino acid sequence but on the way the amino acid thread folds as well. This study provides a prototype system that able to predict a protein tertiary structure. Several methods are used to develop and evaluate the system to produce better accuracy in protein 3D structure prediction. The Bees Optimization algorithm which inspired from the honey bees food foraging method, is used in the searching phase. In this study, the experiment is conducted on short sequence proteins that have been used by the previous researches using well-known tools. The proposed approach shows a promising result.
Predicting surface fuel models and fuel metrics using lidar and CIR imagery in a dense mixed conifer forest

Treesearch

Marek K. Jakubowksi; Qinghua Guo; Brandon Collins; Scott Stephens; Maggi Kelly

2013-01-01

We compared the ability of several classification and regression algorithms to predict forest stand structure metrics and standard surface fuel models. Our study area spans a dense, topographically complex Sierra Nevada mixed-conifer forest. We used clustering, regression trees, and support vector machine algorithms to analyze high density (average 9 pulses/m
Bi-objective integer programming for RNA secondary structure prediction with pseudoknots.

PubMed

Legendre, Audrey; Angel, Eric; Tahi, Fariza

2018-01-15

RNA structure prediction is an important field in bioinformatics, and numerous methods and tools have been proposed. Pseudoknots are specific motifs of RNA secondary structures that are difficult to predict. Almost all existing methods are based on a single model and return one solution, often missing the real structure. An alternative approach would be to combine different models and return a (small) set of solutions, maximizing its quality and diversity in order to increase the probability that it contains the real structure. We propose here an original method for predicting RNA secondary structures with pseudoknots, based on integer programming. We developed a generic bi-objective integer programming algorithm allowing to return optimal and sub-optimal solutions optimizing simultaneously two models. This algorithm was then applied to the combination of two known models of RNA secondary structure prediction, namely MEA and MFE. The resulting tool, called BiokoP, is compared with the other methods in the literature. The results show that the best solution (structure with the highest F 1 -score) is, in most cases, given by BiokoP. Moreover, the results of BiokoP are homogeneous, regardless of the pseudoknot type or the presence or not of pseudoknots. Indeed, the F 1 -scores are always higher than 70% for any number of solutions returned. The results obtained by BiokoP show that combining the MEA and the MFE models, as well as returning several optimal and several sub-optimal solutions, allow to improve the prediction of secondary structures. One perspective of our work is to combine better mono-criterion models, in particular to combine a model based on the comparative approach with the MEA and the MFE models. This leads to develop in the future a new multi-objective algorithm to combine more than two models. BiokoP is available on the EvryRNA platform: https://EvryRNA.ibisc.univ-evry.fr .
Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots.

PubMed

Hajdin, Christine E; Bellaousov, Stanislav; Huggins, Wayne; Leonard, Christopher W; Mathews, David H; Weeks, Kevin M

2013-04-02

A pseudoknot forms in an RNA when nucleotides in a loop pair with a region outside the helices that close the loop. Pseudoknots occur relatively rarely in RNA but are highly overrepresented in functionally critical motifs in large catalytic RNAs, in riboswitches, and in regulatory elements of viruses. Pseudoknots are usually excluded from RNA structure prediction algorithms. When included, these pairings are difficult to model accurately, especially in large RNAs, because allowing this structure dramatically increases the number of possible incorrect folds and because it is difficult to search the fold space for an optimal structure. We have developed a concise secondary structure modeling approach that combines SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) experimental chemical probing information and a simple, but robust, energy model for the entropic cost of single pseudoknot formation. Structures are predicted with iterative refinement, using a dynamic programming algorithm. This melded experimental and thermodynamic energy function predicted the secondary structures and the pseudoknots for a set of 21 challenging RNAs of known structure ranging in size from 34 to 530 nt. On average, 93% of known base pairs were predicted, and all pseudoknots in well-folded RNAs were identified.

A Systematic Investigation of Computation Models for Predicting Adverse Drug Reactions (ADRs)

PubMed Central

Kuang, Qifan; Wang, MinQi; Li, Rong; Dong, YongCheng; Li, Yizhou; Li, Menglong

2014-01-01

Background Early and accurate identification of adverse drug reactions (ADRs) is critically important for drug development and clinical safety. Computer-aided prediction of ADRs has attracted increasing attention in recent years, and many computational models have been proposed. However, because of the lack of systematic analysis and comparison of the different computational models, there remain limitations in designing more effective algorithms and selecting more useful features. There is therefore an urgent need to review and analyze previous computation models to obtain general conclusions that can provide useful guidance to construct more effective computational models to predict ADRs. Principal Findings In the current study, the main work is to compare and analyze the performance of existing computational methods to predict ADRs, by implementing and evaluating additional algorithms that have been earlier used for predicting drug targets. Our results indicated that topological and intrinsic features were complementary to an extent and the Jaccard coefficient had an important and general effect on the prediction of drug-ADR associations. By comparing the structure of each algorithm, final formulas of these algorithms were all converted to linear model in form, based on this finding we propose a new algorithm called the general weighted profile method and it yielded the best overall performance among the algorithms investigated in this paper. Conclusion Several meaningful conclusions and useful findings regarding the prediction of ADRs are provided for selecting optimal features and algorithms. PMID:25180585
Foraging on the potential energy surface: a swarm intelligence-based optimizer for molecular geometry.

PubMed

Wehmeyer, Christoph; Falk von Rudorff, Guido; Wolf, Sebastian; Kabbe, Gabriel; Schärf, Daniel; Kühne, Thomas D; Sebastiani, Daniel

2012-11-21

We present a stochastic, swarm intelligence-based optimization algorithm for the prediction of global minima on potential energy surfaces of molecular cluster structures. Our optimization approach is a modification of the artificial bee colony (ABC) algorithm which is inspired by the foraging behavior of honey bees. We apply our modified ABC algorithm to the problem of global geometry optimization of molecular cluster structures and show its performance for clusters with 2-57 particles and different interatomic interaction potentials.
Foraging on the potential energy surface: A swarm intelligence-based optimizer for molecular geometry

NASA Astrophysics Data System (ADS)

Wehmeyer, Christoph; Falk von Rudorff, Guido; Wolf, Sebastian; Kabbe, Gabriel; Schärf, Daniel; Kühne, Thomas D.; Sebastiani, Daniel

2012-11-01

We present a stochastic, swarm intelligence-based optimization algorithm for the prediction of global minima on potential energy surfaces of molecular cluster structures. Our optimization approach is a modification of the artificial bee colony (ABC) algorithm which is inspired by the foraging behavior of honey bees. We apply our modified ABC algorithm to the problem of global geometry optimization of molecular cluster structures and show its performance for clusters with 2-57 particles and different interatomic interaction potentials.
Toward link predictability of complex networks

PubMed Central

Lü, Linyuan; Pan, Liming; Zhou, Tao; Zhang, Yi-Cheng; Stanley, H. Eugene

2015-01-01

The organization of real networks usually embodies both regularities and irregularities, and, in principle, the former can be modeled. The extent to which the formation of a network can be explained coincides with our ability to predict missing links. To understand network organization, we should be able to estimate link predictability. We assume that the regularity of a network is reflected in the consistency of structural features before and after a random removal of a small set of links. Based on the perturbation of the adjacency matrix, we propose a universal structural consistency index that is free of prior knowledge of network organization. Extensive experiments on disparate real-world networks demonstrate that (i) structural consistency is a good estimation of link predictability and (ii) a derivative algorithm outperforms state-of-the-art link prediction methods in both accuracy and robustness. This analysis has further applications in evaluating link prediction algorithms and monitoring sudden changes in evolving network mechanisms. It will provide unique fundamental insights into the above-mentioned academic research fields, and will foster the development of advanced information filtering technologies of interest to information technology practitioners. PMID:25659742
A unified algorithm for predicting partition coefficients for PBPK modeling of drugs and environmental chemicals

DOE Office of Scientific and Technical Information (OSTI.GOV)

Peyret, Thomas; Poulin, Patrick; Krishnan, Kannan, E-mail: kannan.krishnan@umontreal.ca

The algorithms in the literature focusing to predict tissue:blood PC (P{sub tb}) for environmental chemicals and tissue:plasma PC based on total (K{sub p}) or unbound concentration (K{sub pu}) for drugs differ in their consideration of binding to hemoglobin, plasma proteins and charged phospholipids. The objective of the present study was to develop a unified algorithm such that P{sub tb}, K{sub p} and K{sub pu} for both drugs and environmental chemicals could be predicted. The development of the unified algorithm was accomplished by integrating all mechanistic algorithms previously published to compute the PCs. Furthermore, the algorithm was structured in such amore » way as to facilitate predictions of the distribution of organic compounds at the macro (i.e. whole tissue) and micro (i.e. cells and fluids) levels. The resulting unified algorithm was applied to compute the rat P{sub tb}, K{sub p} or K{sub pu} of muscle (n = 174), liver (n = 139) and adipose tissue (n = 141) for acidic, neutral, zwitterionic and basic drugs as well as ketones, acetate esters, alcohols, aliphatic hydrocarbons, aromatic hydrocarbons and ethers. The unified algorithm reproduced adequately the values predicted previously by the published algorithms for a total of 142 drugs and chemicals. The sensitivity analysis demonstrated the relative importance of the various compound properties reflective of specific mechanistic determinants relevant to prediction of PC values of drugs and environmental chemicals. Overall, the present unified algorithm uniquely facilitates the computation of macro and micro level PCs for developing organ and cellular-level PBPK models for both chemicals and drugs.« less
Protein docking prediction using predicted protein-protein interface.

PubMed

Li, Bin; Kihara, Daisuke

2012-01-10

Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pair wise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.
BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.

PubMed

Kaur, Harpreet; Raghava, G P S

2002-03-01

beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structures in proteins and comprise on average, 25% of the residues. In the past numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows to predict different types of beta-TURNS e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/
Improved Diagnostic Validity of the ADOS Revised Algorithms: A Replication Study in an Independent Sample

ERIC Educational Resources Information Center

Oosterling, Iris; Roos, Sascha; de Bildt, Annelies; Rommelse, Nanda; de Jonge, Maretha; Visser, Janne; Lappenschaar, Martijn; Swinkels, Sophie; van der Gaag, Rutger Jan; Buitelaar, Jan

2010-01-01

Recently, Gotham et al. ("2007") proposed revised algorithms for the Autism Diagnostic Observation Schedule (ADOS) with improved diagnostic validity. The aim of the current study was to replicate predictive validity, factor structure, and correlations with age and verbal and nonverbal IQ of the ADOS revised algorithms for Modules 1 and 2…
The SIST-M: Predictive validity of a brief structured Clinical Dementia Rating interview

PubMed Central

Okereke, Olivia I.; Pantoja-Galicia, Norberto; Copeland, Maura; Hyman, Bradley T.; Wanggaard, Taylor; Albert, Marilyn S.; Betensky, Rebecca A.; Blacker, Deborah

2011-01-01

Background We previously established reliability and cross-sectional validity of the SIST-M (Structured Interview and Scoring Tool–Massachusetts Alzheimer's Disease Research Center), a shortened version of an instrument shown to predict progression to Alzheimer disease (AD), even among persons with very mild cognitive impairment (vMCI). Objective To test predictive validity of the SIST-M. Methods Participants were 342 community-dwelling, non-demented older adults in a longitudinal study. Baseline Clinical Dementia Rating (CDR) ratings were determined by either: 1) clinician interviews or 2) a previously developed computer algorithm based on 60 questions (of a possible 131) extracted from clinician interviews. We developed age+gender+education-adjusted Cox proportional hazards models using CDR-sum-of-boxes (CDR-SB) as the predictor, where CDR-SB was determined by either clinician interview or algorithm; models were run for the full sample (n=342) and among those jointly classified as vMCI using clinician- and algorithm-based CDR ratings (n=156). We directly compared predictive accuracy using time-dependent Receiver Operating Characteristic (ROC) curves. Results AD hazard ratios (HRs) were similar for clinician-based and algorithm-based CDR-SB: for a 1-point increment in CDR-SB, respective HRs (95% CI)=3.1 (2.5,3.9) and 2.8 (2.2,3.5); among those with vMCI, respective HRs (95% CI) were 2.2 (1.6,3.2) and 2.1 (1.5,3.0). Similarly high predictive accuracy was achieved: the concordance probability (weighted average of the area-under-the-ROC curves) over follow-up was 0.78 vs. 0.76 using clinician-based vs. algorithm-based CDR-SB. Conclusion CDR scores based on items from this shortened interview had high predictive ability for AD – comparable to that using a lengthy clinical interview. PMID:21986342
Automated and fast building of three-dimensional RNA structures.

PubMed

Zhao, Yunjie; Huang, Yangyu; Gong, Zhou; Wang, Yanjie; Man, Jianfen; Xiao, Yi

2012-01-01

Building tertiary structures of non-coding RNA is required to understand their functions and design new molecules. Current algorithms of RNA tertiary structure prediction give satisfactory accuracy only for small size and simple topology and many of them need manual manipulation. Here, we present an automated and fast program, 3dRNA, for RNA tertiary structure prediction with reasonable accuracy for RNAs of larger size and complex topology.
Semiempirical prediction of protein folds

NASA Astrophysics Data System (ADS)

Fernández, Ariel; Colubri, Andrés; Appignanesi, Gustavo

2001-08-01

We introduce a semiempirical approach to predict ab initio expeditious pathways and native backbone geometries of proteins that fold under in vitro renaturation conditions. The algorithm is engineered to incorporate a discrete codification of local steric hindrances that constrain the movements of the peptide backbone throughout the folding process. Thus, the torsional state of the chain is assumed to be conditioned by the fact that hopping from one basin of attraction to another in the Ramachandran map (local potential energy surface) of each residue is energetically more costly than the search for a specific (Φ, Ψ) torsional state within a single basin. A combinatorial procedure is introduced to evaluate coarsely defined torsional states of the chain defined ``modulo basins'' and translate them into meaningful patterns of long range interactions. Thus, an algorithm for structure prediction is designed based on the fact that local contributions to the potential energy may be subsumed into time-evolving conformational constraints defining sets of restricted backbone geometries whereupon the patterns of nonbonded interactions are constructed. The predictive power of the algorithm is assessed by (a) computing ab initio folding pathways for mammalian ubiquitin that ultimately yield a stable structural pattern reproducing all of its native features, (b) determining the nucleating event that triggers the hydrophobic collapse of the chain, and (c) comparing coarse predictions of the stable folds of moderately large proteins (N~100) with structural information extracted from the protein data bank.
Cascade generalized predictive control strategy for boiler drum level.

PubMed

Xu, Min; Li, Shaoyuan; Cai, Wenjian

2005-07-01

This paper proposes a cascade model predictive control scheme for boiler drum level control. By employing generalized predictive control structures for both inner and outer loops, measured and unmeasured disturbances can be effectively rejected, and drum level at constant load is maintained. In addition, nonminimum phase characteristic and system constraints in both loops can be handled effectively by generalized predictive control algorithms. Simulation results are provided to show that cascade generalized predictive control results in better performance than that of well tuned cascade proportional integral differential controllers. The algorithm has also been implemented to control a 75-MW boiler plant, and the results show an improvement over conventional control schemes.
FPGA accelerator for protein secondary structure prediction based on the GOR algorithm

PubMed Central

2011-01-01

Background Protein is an important molecule that performs a wide range of functions in biological systems. Recently, the protein folding attracts much more attention since the function of protein can be generally derived from its molecular structure. The GOR algorithm is one of the most successful computational methods and has been widely used as an efficient analysis tool to predict secondary structure from protein sequence. However, the execution time is still intolerable with the steep growth in protein database. Recently, FPGA chips have emerged as one promising application accelerator to accelerate bioinformatics algorithms by exploiting fine-grained custom design. Results In this paper, we propose a complete fine-grained parallel hardware implementation on FPGA to accelerate the GOR-IV package for 2D protein structure prediction. To improve computing efficiency, we partition the parameter table into small segments and access them in parallel. We aggressively exploit data reuse schemes to minimize the need for loading data from external memory. The whole computation structure is carefully pipelined to overlap the sequence loading, computing and back-writing operations as much as possible. We implemented a complete GOR desktop system based on an FPGA chip XC5VLX330. Conclusions The experimental results show a speedup factor of more than 430x over the original GOR-IV version and 110x speedup over the optimized version with multi-thread SIMD implementation running on a PC platform with AMD Phenom 9650 Quad CPU for 2D protein structure prediction. However, the power consumption is only about 30% of that of current general-propose CPUs. PMID:21342582
Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development.

PubMed

Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander

2009-11-01

Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
A novel Multi-Agent Ada-Boost algorithm for predicting protein structural class with the information of protein secondary structure.

PubMed

Fan, Ming; Zheng, Bin; Li, Lihua

2015-10-01

Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although a lot of efforts have been made, it still remains a challenging problem for prediction of protein structural class solely from protein sequences. The feature extraction and classification of proteins are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. In protein feature extraction, we proposed a scheme by calculating the word frequency and word position from sequences of amino acid, reduced amino acid, and secondary structure. For an accurate classification of the structural class of protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of Multi-Agent system into Ada-Boost algorithm. Extensive experiments were taken to test and compare the proposed method using four benchmark datasets in low homology. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better compared with the existing methods. The source code and dataset are available on request.
Dinucleotide controlled null models for comparative RNA gene prediction.

PubMed

Gesell, Tanja; Washietl, Stefan

2008-05-27

Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.
Knowledge-based prediction of protein backbone conformation using a structural alphabet.

PubMed

Vetrivel, Iyanar; Mahajan, Swapnil; Tyagi, Manoj; Hoffmann, Lionel; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; de Brevern, Alexandre G; Cadet, Frédéric; Offmann, Bernard

2017-01-01

Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven to be very useful in various aspects of protein structure analyses and predictions. One such library, Protein Blocks, is composed of 16 standard 5-residues long structural prototypes. This form of analyzing proteins involves drafting its structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of protein blocks is the general objective of this work. A new approach, PB-kPRED is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm that does not rely on any secondary structure predictions and/or sequence alignment profiles, to scan this database and predict most probable backbone conformations for the protein local structures. Though PB-kPRED uses the structural information from homologues in preference, if available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at 30% sequence identity cut-off. We have shown that the kPRED method was able to achieve mean accuracies ranging from 40.8% to 66.3% depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed. This score estimates very well the accuracy of the algorithm (R2 of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.
NEP: web server for epitope prediction based on antibody neutralization of viral strains with diverse sequences

PubMed Central

Chuang, Gwo-Yu; Liou, David; Kwong, Peter D.; Georgiev, Ivelin S.

2014-01-01

Delineation of the antigenic site, or epitope, recognized by an antibody can provide clues about functional vulnerabilities and resistance mechanisms, and can therefore guide antibody optimization and epitope-based vaccine design. Previously, we developed an algorithm for antibody-epitope prediction based on antibody neutralization of viral strains with diverse sequences and validated the algorithm on a set of broadly neutralizing HIV-1 antibodies. Here we describe the implementation of this algorithm, NEP (Neutralization-based Epitope Prediction), as a web-based server. The users must supply as input: (i) an alignment of antigen sequences of diverse viral strains; (ii) neutralization data for the antibody of interest against the same set of antigen sequences; and (iii) (optional) a structure of the unbound antigen, for enhanced prediction accuracy. The prediction results can be downloaded or viewed interactively on the antigen structure (if supplied) from the web browser using a JSmol applet. Since neutralization experiments are typically performed as one of the first steps in the characterization of an antibody to determine its breadth and potency, the NEP server can be used to predict antibody-epitope information at no additional experimental costs. NEP can be accessed on the internet at http://exon.niaid.nih.gov/nep. PMID:24782517
RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells.

PubMed

Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch

2017-06-06

An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptors selection, model development and predictions for test set samples using applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RNASAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cells libraries highlighting interesting dependencies of PV properties on MO compositions.
Novel Near-Lossless Compression Algorithm for Medical Sequence Images with Adaptive Block-Based Spatial Prediction.

PubMed

Song, Xiaoying; Huang, Qijun; Chang, Sheng; He, Jin; Wang, Hao

2016-12-01

To address the low compression efficiency of lossless compression and the low image quality of general near-lossless compression, a novel near-lossless compression algorithm based on adaptive spatial prediction is proposed for medical sequence images for possible diagnostic use in this paper. The proposed method employs adaptive block size-based spatial prediction to predict blocks directly in the spatial domain and Lossless Hadamard Transform before quantization to improve the quality of reconstructed images. The block-based prediction breaks the pixel neighborhood constraint and takes full advantage of the local spatial correlations found in medical images. The adaptive block size guarantees a more rational division of images and the improved use of the local structure. The results indicate that the proposed algorithm can efficiently compress medical images and produces a better peak signal-to-noise ratio (PSNR) under the same pre-defined distortion than other near-lossless methods.

Knotty: Efficient and Accurate Prediction of Complex RNA Pseudoknot Structures.

PubMed

Jabbari, Hosna; Wark, Ian; Montemagno, Carlo; Will, Sebastian

2018-06-01

The computational prediction of RNA secondary structure by free energy minimization has become an important tool in RNA research. However in practice, energy minimization is mostly limited to pseudoknot-free structures or rather simple pseudoknots, not covering many biologically important structures such as kissing hairpins. Algorithms capable of predicting sufficiently complex pseudoknots (for sequences of length n) used to have extreme complexities, e.g. Pknots (Rivas and Eddy, 1999) has O(n6) time and O(n4) space complexity. The algorithm CCJ (Chen et al., 2009) dramatically improves the asymptotic run time for predicting complex pseudoknots (handling almost all relevant pseudoknots, while being slightly less general than Pknots), but this came at the cost of large constant factors in space and time, which strongly limited its practical application (∼200 bases already require 256GB space). We present a CCJ-type algorithm, Knotty, that handles the same comprehensive pseudoknot class of structures as CCJ with improved space complexity of Θ(n3 + Z)-due to the applied technique of sparsification, the number of "candidates", Z, appears to grow significantly slower than n4 on our benchmark set (which include pseudoknotted RNAs up to 400 nucleotides). In terms of run time over this benchmark, Knotty clearly outperforms Pknots and the original CCJ implementation, CCJ 1.0; Knotty's space consumption fundamentally improves over CCJ 1.0, being on a par with the space-economic Pknots. By comparing to CCJ 2.0, our unsparsified Knotty variant, we demonstrate the isolated effect of sparsification. Moreover, Knotty employs the state-of-the-art energy model of "HotKnots DP09", which results in superior prediction accuracy over Pknots. Our software is available at https://github.com/HosnaJabbari/Knotty. will@tbi.unvie.ac.at. Supplementary data are available at Bioinformatics online.
Structural alignment of protein descriptors - a combinatorial model.

PubMed

Antczak, Maciej; Kasprzak, Marta; Lukasiak, Piotr; Blazewicz, Jacek

2016-09-17

Structural alignment of proteins is one of the most challenging problems in molecular biology. The tertiary structure of a protein strictly correlates with its function and computationally predicted structures are nowadays a main premise for understanding the latter. However, computationally derived 3D models often exhibit deviations from the native structure. A way to confirm a model is a comparison with other structures. The structural alignment of a pair of proteins can be defined with the use of a concept of protein descriptors. The protein descriptors are local substructures of protein molecules, which allow us to divide the original problem into a set of subproblems and, consequently, to propose a more efficient algorithmic solution. In the literature, one can find many applications of the descriptors concept that prove its usefulness for insight into protein 3D structures, but the proposed approaches are presented rather from the biological perspective than from the computational or algorithmic point of view. Efficient algorithms for identification and structural comparison of descriptors can become crucial components of methods for structural quality assessment as well as tertiary structure prediction. In this paper, we propose a new combinatorial model and new polynomial-time algorithms for the structural alignment of descriptors. The model is based on the maximum-size assignment problem, which we define here and prove that it can be solved in polynomial time. We demonstrate suitability of this approach by comparison with an exact backtracking algorithm. Besides a simplification coming from the combinatorial modeling, both on the conceptual and complexity level, we gain with this approach high quality of obtained results, in terms of 3D alignment accuracy and processing efficiency. All the proposed algorithms were developed and integrated in a computationally efficient tool descs-standalone, which allows the user to identify and structurally compare descriptors of biological molecules, such as proteins and RNAs. Both PDB (Protein Data Bank) and mmCIF (macromolecular Crystallographic Information File) formats are supported. The proposed tool is available as an open source project stored on GitHub ( https://github.com/mantczak/descs-standalone ).
Parallel protein secondary structure prediction based on neural networks.

PubMed

Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi

2004-01-01

Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers of protein secondary structure prediction are implemented on Denoeux belief neural network (DBNN) architecture. Hydrophobicity matrix, orthogonal matrix, BLOSUM62 and PSSM (position specific scoring matrix) are experimented separately as the encoding schemes for DBNN. The experimental results contribute to the design of new encoding schemes. New binary classifier for Helix versus not Helix ( approximately H) for DBNN produces prediction accuracy of 87% when PSSM is used for the input profile. The performance of DBNN binary classifier is comparable to other best prediction methods. The good test results for binary classifiers open a new approach for protein structure prediction with neural networks. Due to the time consuming task of training the neural networks, Pthread and OpenMP are employed to parallelize DBNN in the hyperthreading enabled Intel architecture. Speedup for 16 Pthreads is 4.9 and speedup for 16 OpenMP threads is 4 in the 4 processors shared memory architecture. Both speedup performance of OpenMP and Pthread is superior to that of other research. With the new parallel training algorithm, thousands of amino acids can be processed in reasonable amount of time. Our research also shows that hyperthreading technology for Intel architecture is efficient for parallel biological algorithms.
Versatile and efficient pore network extraction method using marker-based watershed segmentation

NASA Astrophysics Data System (ADS)

Gostick, Jeff T.

2017-08-01

Obtaining structural information from tomographic images of porous materials is a critical component of porous media research. Extracting pore networks is particularly valuable since it enables pore network modeling simulations which can be useful for a host of tasks from predicting transport properties to simulating performance of entire devices. This work reports an efficient algorithm for extracting networks using only standard image analysis techniques. The algorithm was applied to several standard porous materials ranging from sandstone to fibrous mats, and in all cases agreed very well with established or known values for pore and throat sizes, capillary pressure curves, and permeability. In the case of sandstone, the present algorithm was compared to the network obtained using the current state-of-the-art algorithm, and very good agreement was achieved. Most importantly, the network extracted from an image of fibrous media correctly predicted the anisotropic permeability tensor, demonstrating the critical ability to detect key structural features. The highly efficient algorithm allows extraction on fairly large images of 5003 voxels in just over 200 s. The ability for one algorithm to match materials as varied as sandstone with 20% porosity and fibrous media with 75% porosity is a significant advancement. The source code for this algorithm is provided.
ProSelection: A Novel Algorithm to Select Proper Protein Structure Subsets for in Silico Target Identification and Drug Discovery Research.

PubMed

Wang, Nanyi; Wang, Lirong; Xie, Xiang-Qun

2017-11-27

Molecular docking is widely applied to computer-aided drug design and has become relatively mature in the recent decades. Application of docking in modeling varies from single lead compound optimization to large-scale virtual screening. The performance of molecular docking is highly dependent on the protein structures selected. It is especially challenging for large-scale target prediction research when multiple structures are available for a single target. Therefore, we have established ProSelection, a docking preferred-protein selection algorithm, in order to generate the proper structure subset(s). By the ProSelection algorithm, protein structures of "weak selectors" are filtered out whereas structures of "strong selectors" are kept. Specifically, the structure which has a good statistical performance of distinguishing active ligands from inactive ligands is defined as a strong selector. In this study, 249 protein structures of 14 autophagy-related targets are investigated. Surflex-dock was used as the docking engine to distinguish active and inactive compounds against these protein structures. Both t test and Mann-Whitney U test were used to distinguish the strong from the weak selectors based on the normality of the docking score distribution. The suggested docking score threshold for active ligands (SDA) was generated for each strong selector structure according to the receiver operating characteristic (ROC) curve. The performance of ProSelection was further validated by predicting the potential off-targets of 43 U.S. Federal Drug Administration approved small molecule antineoplastic drugs. Overall, ProSelection will accelerate the computational work in protein structure selection and could be a useful tool for molecular docking, target prediction, and protein-chemical database establishment research.
Automated Phase Segmentation for Large-Scale X-ray Diffraction Data Using a Graph-Based Phase Segmentation (GPhase) Algorithm.

PubMed

Xiong, Zheng; He, Yinyan; Hattrick-Simpers, Jason R; Hu, Jianjun

2017-03-13

The creation of composition-processing-structure relationships currently represents a key bottleneck for data analysis for high-throughput experimental (HTE) material studies. Here we propose an automated phase diagram attribution algorithm for HTE data analysis that uses a graph-based segmentation algorithm and Delaunay tessellation to create a crystal phase diagram from high throughput libraries of X-ray diffraction (XRD) patterns. We also propose the sample-pair based objective evaluation measures for the phase diagram prediction problem. Our approach was validated using 278 diffraction patterns from a Fe-Ga-Pd composition spread sample with a prediction precision of 0.934 and a Matthews Correlation Coefficient score of 0.823. The algorithm was then applied to the open Ni-Mn-Al thin-film composition spread sample to obtain the first predicted phase diagram mapping for that sample.
Performance comparison of the Prophecy (forecasting) Algorithm in FFT form for unseen feature and time-series prediction

NASA Astrophysics Data System (ADS)

Jaenisch, Holger; Handley, James

2013-06-01

We introduce a generalized numerical prediction and forecasting algorithm. We have previously published it for malware byte sequence feature prediction and generalized distribution modeling for disparate test article analysis. We show how non-trivial non-periodic extrapolation of a numerical sequence (forecast and backcast) from the starting data is possible. Our ancestor-progeny prediction can yield new options for evolutionary programming. Our equations enable analytical integrals and derivatives to any order. Interpolation is controllable from smooth continuous to fractal structure estimation. We show how our generalized trigonometric polynomial can be derived using a Fourier transform.
Improved Model for Predicting the Free Energy Contribution of Dinucleotide Bulges to RNA Duplex Stability.

PubMed

Tomcho, Jeremy C; Tillman, Magdalena R; Znosko, Brent M

2015-09-01

Predicting the secondary structure of RNA is an intermediate in predicting RNA three-dimensional structure. Commonly, determining RNA secondary structure from sequence uses free energy minimization and nearest neighbor parameters. Current algorithms utilize a sequence-independent model to predict free energy contributions of dinucleotide bulges. To determine if a sequence-dependent model would be more accurate, short RNA duplexes containing dinucleotide bulges with different sequences and nearest neighbor combinations were optically melted to derive thermodynamic parameters. These data suggested energy contributions of dinucleotide bulges were sequence-dependent, and a sequence-dependent model was derived. This model assigns free energy penalties based on the identity of nucleotides in the bulge (3.06 kcal/mol for two purines, 2.93 kcal/mol for two pyrimidines, 2.71 kcal/mol for 5'-purine-pyrimidine-3', and 2.41 kcal/mol for 5'-pyrimidine-purine-3'). The predictive model also includes a 0.45 kcal/mol penalty for an A-U pair adjacent to the bulge and a -0.28 kcal/mol bonus for a G-U pair adjacent to the bulge. The new sequence-dependent model results in predicted values within, on average, 0.17 kcal/mol of experimental values, a significant improvement over the sequence-independent model. This model and new experimental values can be incorporated into algorithms that predict RNA stability and secondary structure from sequence.
FlexStem: improving predictions of RNA secondary structures with pseudoknots by reducing the search space.

PubMed

Chen, Xiang; He, Si-Min; Bu, Dongbo; Zhang, Fa; Wang, Zhiyong; Chen, Runsheng; Gao, Wen

2008-09-15

RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is proved to be NP-hard. Due to kinetic reasons the real RNA secondary structure often has local instead of global minimum free energy. This implies that we may improve the performance of RNA secondary structure prediction by taking kinetics into account and minimize free energy in a local area. we propose a novel algorithm named FlexStem to predict RNA secondary structures with pseudoknots. Still based on MFE criterion, FlexStem adopts comprehensive energy models that allow complex pseudoknots. Unlike classical thermodynamic methods, our approach aims to simulate the RNA folding process by successive addition of maximal stems, reducing the search space while maintaining or even improving the prediction accuracy. This reduced space is constructed by our maximal stem strategy and stem-adding rule induced from elaborate statistical experiments on real RNA secondary structures. The strategy and the rule also reflect the folding characteristic of RNA from a new angle and help compensate for the deficiency of merely relying on MFE in RNA structure prediction. We validate FlexStem by applying it to tRNAs, 5SrRNAs and a large number of pseudoknotted structures and compare it with the well-known algorithms such as RNAfold, PKNOTS, PknotsRG, HotKnots and ILM according to their overall sensitivities and specificities, as well as positive and negative controls on pseudoknots. The results show that FlexStem significantly increases the prediction accuracy through its local search strategy. Software is available at http://pfind.ict.ac.cn/FlexStem/. Supplementary data are available at Bioinformatics online.
Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier

NASA Astrophysics Data System (ADS)

Wang, Leilei; Cheng, Jinyong

2018-03-01

Protein secondary structure prediction is belong to bioinformatics,and it's important in research area. In this paper, we propose a new prediction way of protein using bayes classifier and autoEncoder network. Our experiments show some algorithms including the construction of the model, the classification of parameters and so on. The data set is a typical CB513 data set for protein. In terms of accuracy, the method is the cross validation based on the 3-fold. Then we can get the Q3 accuracy. Paper results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.
A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

PubMed Central

Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A

2006-01-01

The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943
Kalman/Map filtering-aided fast normalized cross correlation-based Wi-Fi fingerprinting location sensing.

PubMed

Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin

2013-11-13

A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results.
Kalman/Map Filtering-Aided Fast Normalized Cross Correlation-Based Wi-Fi Fingerprinting Location Sensing

PubMed Central

Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin

2013-01-01

A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results. PMID:24233027
Identify High-Quality Protein Structural Models by Enhanced K-Means.

PubMed

Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang

2017-01-01

Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K -means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K -means clustering ( SK -means), whereas the other employs squared distance to optimize the initial centroids ( K -means++). Our results showed that SK -means and K -means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K -means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK -means and K -means++ demonstrated substantial improvements relative to results from SPICKER and classical K -means.
Identify High-Quality Protein Structural Models by Enhanced K-Means

PubMed Central

Li, Haiou; Chen, Cheng; Lv, Qiang; Wu, Chuang

2017-01-01

Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means. PMID:28421198
Comparative Analysis of Soft Computing Models in Prediction of Bending Rigidity of Cotton Woven Fabrics

NASA Astrophysics Data System (ADS)

Guruprasad, R.; Behera, B. K.

2015-10-01

Quantitative prediction of fabric mechanical properties is an essential requirement for design engineering of textile and apparel products. In this work, the possibility of prediction of bending rigidity of cotton woven fabrics has been explored with the application of Artificial Neural Network (ANN) and two hybrid methodologies, namely Neuro-genetic modeling and Adaptive Neuro-Fuzzy Inference System (ANFIS) modeling. For this purpose, a set of cotton woven grey fabrics was desized, scoured and relaxed. The fabrics were then conditioned and tested for bending properties. With the database thus created, a neural network model was first developed using back propagation as the learning algorithm. The second model was developed by applying a hybrid learning strategy, in which genetic algorithm was first used as a learning algorithm to optimize the number of neurons and connection weights of the neural network. The Genetic algorithm optimized network structure was further allowed to learn using back propagation algorithm. In the third model, an ANFIS modeling approach was attempted to map the input-output data. The prediction performances of the models were compared and a sensitivity analysis was reported. The results show that the prediction by neuro-genetic and ANFIS models were better in comparison with that of back propagation neural network model.
System identification of an unmanned quadcopter system using MRAN neural

NASA Astrophysics Data System (ADS)

Pairan, M. F.; Shamsudin, S. S.

2017-12-01

This project presents the performance analysis of the radial basis function neural network (RBF) trained with Minimal Resource Allocating Network (MRAN) algorithm for real-time identification of quadcopter. MRAN’s performance is compared with the RBF with Constant Trace algorithm for 2500 input-output pair data sampling. MRAN utilizes adding and pruning hidden neuron strategy to obtain optimum RBF structure, increase prediction accuracy and reduce training time. The results indicate that MRAN algorithm produces fast training time and more accurate prediction compared with standard RBF. The model proposed in this paper is capable of identifying and modelling a nonlinear representation of the quadcopter flight dynamics.
Artificial Intelligence in Prediction of Secondary Protein Structure Using CB513 Database

PubMed Central

Avdagic, Zikrija; Purisevic, Elvir; Omanovic, Samir; Coralic, Zlatan

2009-01-01

In this paper we describe CB513 a non-redundant dataset, suitable for development of algorithms for prediction of secondary protein structure. A program was made in Borland Delphi for transforming data from our dataset to make it suitable for learning of neural network for prediction of secondary protein structure implemented in MATLAB Neural-Network Toolbox. Learning (training and testing) of neural network is researched with different sizes of windows, different number of neurons in the hidden layer and different number of training epochs, while using dataset CB513. PMID:21347158
RNAstructure: software for RNA secondary structure prediction and analysis.

PubMed

Reuter, Jessica S; Mathews, David H

2010-03-15

To understand an RNA sequence's mechanism of action, the structure must be known. Furthermore, target RNA structure is an important consideration in the design of small interfering RNAs and antisense DNA oligonucleotides. RNA secondary structure prediction, using thermodynamics, can be used to develop hypotheses about the structure of an RNA sequence. RNAstructure is a software package for RNA secondary structure prediction and analysis. It uses thermodynamics and utilizes the most recent set of nearest neighbor parameters from the Turner group. It includes methods for secondary structure prediction (using several algorithms), prediction of base pair probabilities, bimolecular structure prediction, and prediction of a structure common to two sequences. This contribution describes new extensions to the package, including a library of C++ classes for incorporation into other programs, a user-friendly graphical user interface written in JAVA, and new Unix-style text interfaces. The original graphical user interface for Microsoft Windows is still maintained. The extensions to RNAstructure serve to make RNA secondary structure prediction user-friendly. The package is available for download from the Mathews lab homepage at http://rna.urmc.rochester.edu/RNAstructure.html.
Spatial Data Structures for Robotic Vehicle Route Planning

DTIC Science & Technology

1988-12-01

goal will be realized in an intelligent Spatial Data Structure Development System (SDSDS) intended for use by Terrain Analysis applications...from the user the details of representation and to permit the infrastructure itself to decide which representations will be most efficient or effective ...to intelligently predict performance of algorithmic sequences and thereby optimize the application (within the accuracy of the prediction models). The

Decision tree methods: applications for classification and prediction.

PubMed

Song, Yan-Yan; Lu, Ying

2015-04-25

Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. Using the training dataset to build a decision tree model and a validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms used to develop decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
Tandem Repeats in Proteins: Prediction Algorithms and Biological Role.

PubMed

Pellegrini, Marco

2015-01-01

Tandem repetitions in protein sequence and structure is a fascinating subject of research which has been a focus of study since the late 1990s. In this survey, we give an overview on the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR either from protein sequence or structure data is challenging due to inherent high (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on the investigations of the biological role of PTR.
A multi-populations multi-strategies differential evolution algorithm for structural optimization of metal nanoclusters

NASA Astrophysics Data System (ADS)

Fan, Tian-E.; Shao, Gui-Fang; Ji, Qing-Shuang; Zheng, Ji-Wen; Liu, Tun-dong; Wen, Yu-Hua

2016-11-01

Theoretically, the determination of the structure of a cluster is to search the global minimum on its potential energy surface. The global minimization problem is often nondeterministic-polynomial-time (NP) hard and the number of local minima grows exponentially with the cluster size. In this article, a multi-populations multi-strategies differential evolution algorithm has been proposed to search the globally stable structure of Fe and Cr nanoclusters. The algorithm combines a multi-populations differential evolution with an elite pool scheme to keep the diversity of the solutions and avoid prematurely trapping into local optima. Moreover, multi-strategies such as growing method in initialization and three differential strategies in mutation are introduced to improve the convergence speed and lower the computational cost. The accuracy and effectiveness of our algorithm have been verified by comparing the results of Fe clusters with Cambridge Cluster Database. Meanwhile, the performance of our algorithm has been analyzed by comparing the convergence rate and energy evaluations with the classical DE algorithm. The multi-populations, multi-strategies mutation and growing method in initialization in our algorithm have been considered respectively. Furthermore, the structural growth pattern of Cr clusters has been predicted by this algorithm. The results show that the lowest-energy structure of Cr clusters contains many icosahedra, and the number of the icosahedral rings rises with increasing size.
XTALOPT version r11: An open-source evolutionary algorithm for crystal structure prediction

NASA Astrophysics Data System (ADS)

Avery, Patrick; Falls, Zackary; Zurek, Eva

2018-01-01

Version 11 of XTALOPT, an evolutionary algorithm for crystal structure prediction, has now been made available for download from the CPC library or the XTALOPT website, http://xtalopt.github.io. Whereas the previous versions of XTALOPT were published under the Gnu Public License (GPL), the current version is made available under the 3-Clause BSD License, which is an open source license that is recognized by the Open Source Initiative. Importantly, the new version can be executed via a command line interface (i.e., it does not require the use of a Graphical User Interface). Moreover, the new version is written as a stand-alone program, rather than an extension to AVOGADRO.
Topology of membrane proteins-predictions, limitations and variations.

PubMed

Tsirigos, Konstantinos D; Govindarajan, Sudha; Bassot, Claudio; Västermark, Åke; Lamb, John; Shu, Nanjiang; Elofsson, Arne

2017-10-26

Transmembrane proteins perform a variety of important biological functions necessary for the survival and growth of the cells. Membrane proteins are built up by transmembrane segments that span the lipid bilayer. The segments can either be in the form of hydrophobic alpha-helices or beta-sheets which create a barrel. A fundamental aspect of the structure of transmembrane proteins is the membrane topology, that is, the number of transmembrane segments, their position in the protein sequence and their orientation in the membrane. Along these lines, many predictive algorithms for the prediction of the topology of alpha-helical and beta-barrel transmembrane proteins exist. The newest algorithms obtain an accuracy close to 80% both for alpha-helical and beta-barrel transmembrane proteins. However, lately it has been shown that the simplified picture presented when describing a protein family by its topology is limited. To demonstrate this, we highlight examples where the topology is either not conserved in a protein superfamily or where the structure cannot be described solely by the topology of a protein. The prediction of these non-standard features from sequence alone was not successful until the recent revolutionary progress in 3D-structure prediction of proteins. Copyright © 2017 Elsevier Ltd. All rights reserved.
Real-time prediction and gating of respiratory motion using an extended Kalman filter and Gaussian process regression

NASA Astrophysics Data System (ADS)

Bukhari, W.; Hong, S.-M.

2015-01-01

Motion-adaptive radiotherapy aims to deliver a conformal dose to the target tumour with minimal normal tissue exposure by compensating for tumour motion in real time. The prediction as well as the gating of respiratory motion have received much attention over the last two decades for reducing the targeting error of the treatment beam due to respiratory motion. In this article, we present a real-time algorithm for predicting and gating respiratory motion that utilizes a model-based and a model-free Bayesian framework by combining them in a cascade structure. The algorithm, named EKF-GPR+, implements a gating function without pre-specifying a particular region of the patient’s breathing cycle. The algorithm first employs an extended Kalman filter (LCM-EKF) to predict the respiratory motion and then uses a model-free Gaussian process regression (GPR) to correct the error of the LCM-EKF prediction. The GPR is a non-parametric Bayesian algorithm that yields predictive variance under Gaussian assumptions. The EKF-GPR+ algorithm utilizes the predictive variance from the GPR component to capture the uncertainty in the LCM-EKF prediction error and systematically identify breathing points with a higher probability of large prediction error in advance. This identification allows us to pause the treatment beam over such instances. EKF-GPR+ implements the gating function by using simple calculations based on the predictive variance with no additional detection mechanism. A sparse approximation of the GPR algorithm is employed to realize EKF-GPR+ in real time. Extensive numerical experiments are performed based on a large database of 304 respiratory motion traces to evaluate EKF-GPR+. The experimental results show that the EKF-GPR+ algorithm effectively reduces the prediction error in a root-mean-square (RMS) sense by employing the gating function, albeit at the cost of a reduced duty cycle. As an example, EKF-GPR+ reduces the patient-wise RMS error to 37%, 39% and 42% in percent ratios relative to no prediction for a duty cycle of 80% at lookahead lengths of 192 ms, 384 ms and 576 ms, respectively. The experiments also confirm that EKF-GPR+ controls the duty cycle with reasonable accuracy.
Prediction of pKa Values for Neutral and Basic Drugs based on Hybrid Artificial Intelligence Methods.

PubMed

Li, Mengshan; Zhang, Huaijing; Chen, Bingsheng; Wu, Yan; Guan, Lixin

2018-03-05

The pKa value of drugs is an important parameter in drug design and pharmacology. In this paper, an improved particle swarm optimization (PSO) algorithm was proposed based on the population entropy diversity. In the improved algorithm, when the population entropy was higher than the set maximum threshold, the convergence strategy was adopted; when the population entropy was lower than the set minimum threshold the divergence strategy was adopted; when the population entropy was between the maximum and minimum threshold, the self-adaptive adjustment strategy was maintained. The improved PSO algorithm was applied in the training of radial basis function artificial neural network (RBF ANN) model and the selection of molecular descriptors. A quantitative structure-activity relationship model based on RBF ANN trained by the improved PSO algorithm was proposed to predict the pKa values of 74 kinds of neutral and basic drugs and then validated by another database containing 20 molecules. The validation results showed that the model had a good prediction performance. The absolute average relative error, root mean square error, and squared correlation coefficient were 0.3105, 0.0411, and 0.9685, respectively. The model can be used as a reference for exploring other quantitative structure-activity relationships.
Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks.

PubMed

Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S

2017-01-01

Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.
A Grammatical Approach to RNA-RNA Interaction Prediction

NASA Astrophysics Data System (ADS)

Kato, Yuki; Akutsu, Tatsuya; Seki, Hiroyuki

2007-11-01

Much attention has been paid to two interacting RNA molecules involved in post-transcriptional control of gene expression. Although there have been a few studies on RNA-RNA interaction prediction based on dynamic programming algorithm, no grammar-based approach has been proposed. The purpose of this paper is to provide a new modeling for RNA-RNA interaction based on multiple context-free grammar (MCFG). We present a polynomial time parsing algorithm for finding the most likely derivation tree for the stochastic version of MCFG, which is applicable to RNA joint secondary structure prediction including kissing hairpin loops. Also, elementary tests on RNA-RNA interaction prediction have shown that the proposed method is comparable to Alkan et al.'s method.
Predicting termination of atrial fibrillation based on the structure and quantification of the recurrence plot.

PubMed

Sun, Rongrong; Wang, Yuanyuan

2008-11-01

Predicting the spontaneous termination of the atrial fibrillation (AF) leads to not only better understanding of mechanisms of the arrhythmia but also the improved treatment of the sustained AF. A novel method is proposed to characterize the AF based on structure and the quantification of the recurrence plot (RP) to predict the termination of the AF. The RP of the electrocardiogram (ECG) signal is firstly obtained and eleven features are extracted to characterize its three basic patterns. Then the sequential forward search (SFS) algorithm and Davies-Bouldin criterion are utilized to select the feature subset which can predict the AF termination effectively. Finally, the multilayer perceptron (MLP) neural network is applied to predict the AF termination. An AF database which includes one training set and two testing sets (A and B) of Holter ECG recordings is studied. Experiment results show that 97% of testing set A and 95% of testing set B are correctly classified. It demonstrates that this algorithm has the ability to predict the spontaneous termination of the AF effectively.
A parallel algorithm for the initial screening of space debris collisions prediction using the SGP4/SDP4 models and GPU acceleration

NASA Astrophysics Data System (ADS)

Lin, Mingpei; Xu, Ming; Fu, Xiaoyu

2017-05-01

Currently, a tremendous amount of space debris in Earth's orbit imperils operational spacecraft. It is essential to undertake risk assessments of collisions and predict dangerous encounters in space. However, collision predictions for an enormous amount of space debris give rise to large-scale computations. In this paper, a parallel algorithm is established on the Compute Unified Device Architecture (CUDA) platform of NVIDIA Corporation for collision prediction. According to the parallel structure of NVIDIA graphics processors, a block decomposition strategy is adopted in the algorithm. Space debris is divided into batches, and the computation and data transfer operations of adjacent batches overlap. As a consequence, the latency to access shared memory during the entire computing process is significantly reduced, and a higher computing speed is reached. Theoretically, a simulation of collision prediction for space debris of any amount and for any time span can be executed. To verify this algorithm, a simulation example including 1382 pieces of debris, whose operational time scales vary from 1 min to 3 days, is conducted on Tesla C2075 of NVIDIA. The simulation results demonstrate that with the same computational accuracy as that of a CPU, the computing speed of the parallel algorithm on a GPU is 30 times that on a CPU. Based on this algorithm, collision prediction of over 150 Chinese spacecraft for a time span of 3 days can be completed in less than 3 h on a single computer, which meets the timeliness requirement of the initial screening task. Furthermore, the algorithm can be adapted for multiple tasks, including particle filtration, constellation design, and Monte-Carlo simulation of an orbital computation.
MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes.

PubMed

Zhu, Huaiqiu; Hu, Gang-Qing; Yang, Yi-Fan; Wang, Jin; She, Zhen-Su

2007-03-16

Despite a remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents from further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. This paper describes a new prokaryotic genefinding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to the translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure, before making the prediction of genes. Results of extensive tests show that MED 2.0 achieves a competitive high performance in the gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of the MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match with the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.
MicroRNAfold: pre-microRNA secondary structure prediction based on modified NCM model with thermodynamics-based scoring strategy.

PubMed

Han, Dianwei; Zhang, Jun; Tang, Guiliang

2012-01-01

An accurate prediction of the pre-microRNA secondary structure is important in miRNA informatics. Based on a recently proposed model, nucleotide cyclic motifs (NCM), to predict RNA secondary structure, we propose and implement a Modified NCM (MNCM) model with a physics-based scoring strategy to tackle the problem of pre-microRNA folding. Our microRNAfold is implemented using a global optimal algorithm based on the bottom-up local optimal solutions. Our experimental results show that microRNAfold outperforms the current leading prediction tools in terms of True Negative rate, False Negative rate, Specificity, and Matthews coefficient ratio.
Robust Kalman filter design for predictive wind shear detection

NASA Technical Reports Server (NTRS)

Stratton, Alexander D.; Stengel, Robert F.

1991-01-01

Severe, low-altitude wind shear is a threat to aviation safety. Airborne sensors under development measure the radial component of wind along a line directly in front of an aircraft. In this paper, optimal estimation theory is used to define a detection algorithm to warn of hazardous wind shear from these sensors. To achieve robustness, a wind shear detection algorithm must distinguish threatening wind shear from less hazardous gustiness, despite variations in wind shear structure. This paper presents statistical analysis methods to refine wind shear detection algorithm robustness. Computational methods predict the ability to warn of severe wind shear and avoid false warning. Comparative capability of the detection algorithm as a function of its design parameters is determined, identifying designs that provide robust detection of severe wind shear.
Frnakenstein: multiple target inverse RNA folding.

PubMed

Lyngsø, Rune B; Anderson, James W J; Sizikova, Elena; Badugu, Amarendra; Hyland, Tomas; Hein, Jotun

2012-10-09

RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard. In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development was to address the hitherto mostly ignored extension of solving the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods and several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better at finding sequences which folded in silico into the target structure than all existing methods, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets. Our method illustrates that successful designs for the inverse RNA folding problem does not necessarily have to rely on heavy biases in base pair and unpaired base distributions. The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein.
Frnakenstein: multiple target inverse RNA folding

PubMed Central

2012-01-01

Background RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard. Results In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development was to address the hitherto mostly ignored extension of solving the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods and several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better at finding sequences which folded in silico into the target structure than all existing methods, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets. Conclusions Our method illustrates that successful designs for the inverse RNA folding problem does not necessarily have to rely on heavy biases in base pair and unpaired base distributions. The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein. PMID:23043260
Switching algorithm for maglev train double-modular redundant positioning sensors.

PubMed

He, Ning; Long, Zhiqiang; Xue, Song

2012-01-01

High-resolution positioning for maglev trains is implemented by detecting the tooth-slot structure of the long stator installed along the rail, but there are large joint gaps between long stator sections. When a positioning sensor is below a large joint gap, its positioning signal is invalidated, thus double-modular redundant positioning sensors are introduced into the system. This paper studies switching algorithms for these redundant positioning sensors. At first, adaptive prediction is applied to the sensor signals. The prediction errors are used to trigger sensor switching. In order to enhance the reliability of the switching algorithm, wavelet analysis is introduced to suppress measuring disturbances without weakening the signal characteristics reflecting the stator joint gap based on the correlation between the wavelet coefficients of adjacent scales. The time delay characteristics of the method are analyzed to guide the algorithm simplification. Finally, the effectiveness of the simplified switching algorithm is verified through experiments.
Switching Algorithm for Maglev Train Double-Modular Redundant Positioning Sensors

PubMed Central

He, Ning; Long, Zhiqiang; Xue, Song

2012-01-01

High-resolution positioning for maglev trains is implemented by detecting the tooth-slot structure of the long stator installed along the rail, but there are large joint gaps between long stator sections. When a positioning sensor is below a large joint gap, its positioning signal is invalidated, thus double-modular redundant positioning sensors are introduced into the system. This paper studies switching algorithms for these redundant positioning sensors. At first, adaptive prediction is applied to the sensor signals. The prediction errors are used to trigger sensor switching. In order to enhance the reliability of the switching algorithm, wavelet analysis is introduced to suppress measuring disturbances without weakening the signal characteristics reflecting the stator joint gap based on the correlation between the wavelet coefficients of adjacent scales. The time delay characteristics of the method are analyzed to guide the algorithm simplification. Finally, the effectiveness of the simplified switching algorithm is verified through experiments. PMID:23112657
Personalized Risk Prediction in Clinical Oncology Research: Applications and Practical Issues Using Survival Trees and Random Forests.

PubMed

Hu, Chen; Steingrimsson, Jon Arni

2018-01-01

A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.
XTALOPT: An open-source evolutionary algorithm for crystal structure prediction

NASA Astrophysics Data System (ADS)

Lonie, David C.; Zurek, Eva

2011-02-01

The implementation and testing of XTALOPT, an evolutionary algorithm for crystal structure prediction, is outlined. We present our new periodic displacement (ripple) operator which is ideally suited to extended systems. It is demonstrated that hybrid operators, which combine two pure operators, reduce the number of duplicate structures in the search. This allows for better exploration of the potential energy surface of the system in question, while simultaneously zooming in on the most promising regions. A continuous workflow, which makes better use of computational resources as compared to traditional generation based algorithms, is employed. Various parameters in XTALOPT are optimized using a novel benchmarking scheme. XTALOPT is available under the GNU Public License, has been interfaced with various codes commonly used to study extended systems, and has an easy to use, intuitive graphical interface. Program summaryProgram title:XTALOPT Catalogue identifier: AEGX_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGX_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPL v2.1 or later [1] No. of lines in distributed program, including test data, etc.: 36 849 No. of bytes in distributed program, including test data, etc.: 1 149 399 Distribution format: tar.gz Programming language: C++ Computer: PCs, workstations, or clusters Operating system: Linux Classification: 7.7 External routines: QT [2], OpenBabel [3], AVOGADRO [4], SPGLIB [8] and one of: VASP [5], PWSCF [6], GULP [7]. Nature of problem: Predicting the crystal structure of a system from its stoichiometry alone remains a grand challenge in computational materials science, chemistry, and physics. Solution method: Evolutionary algorithms are stochastic search techniques which use concepts from biological evolution in order to locate the global minimum on their potential energy surface. Our evolutionary algorithm, XTALOPT, is freely available to the scientific community for use and collaboration under the GNU Public License. Running time: User dependent. The program runs until stopped by the user.

Design and experiment of vehicular charger AC/DC system based on predictive control algorithm

NASA Astrophysics Data System (ADS)

He, Guangbi; Quan, Shuhai; Lu, Yuzhang

2018-06-01

For the car charging stage rectifier uncontrollable system, this paper proposes a predictive control algorithm of DC/DC converter based on the prediction model, established by the state space average method and its prediction model, obtained by the optimal mathematical description of mathematical calculation, to analysis prediction algorithm by Simulink simulation. The design of the structure of the car charging, at the request of the rated output power and output voltage adjustable control circuit, the first stage is the three-phase uncontrolled rectifier DC voltage Ud through the filter capacitor, after by using double-phase interleaved buck-boost circuit with wide range output voltage required value, analyzing its working principle and the the parameters for the design and selection of components. The analysis of current ripple shows that the double staggered parallel connection has the advantages of reducing the output current ripple and reducing the loss. The simulation experiment of the whole charging circuit is carried out by software, and the result is in line with the design requirements of the system. Finally combining the soft with hardware circuit to achieve charging of the system according to the requirements, experimental platform proved the feasibility and effectiveness of the proposed predictive control algorithm based on the car charging of the system, which is consistent with the simulation results.
A Network Selection Algorithm Considering Power Consumption in Hybrid Wireless Networks

NASA Astrophysics Data System (ADS)

Joe, Inwhee; Kim, Won-Tae; Hong, Seokjoon

In this paper, we propose a novel network selection algorithm considering power consumption in hybrid wireless networks for vertical handover. CDMA, WiBro, WLAN networks are candidate networks for this selection algorithm. This algorithm is composed of the power consumption prediction algorithm and the final network selection algorithm. The power consumption prediction algorithm estimates the expected lifetime of the mobile station based on the current battery level, traffic class and power consumption for each network interface card of the mobile station. If the expected lifetime of the mobile station in a certain network is not long enough compared the handover delay, this particular network will be removed from the candidate network list, thereby preventing unnecessary handovers in the preprocessing procedure. On the other hand, the final network selection algorithm consists of AHP (Analytic Hierarchical Process) and GRA (Grey Relational Analysis). The global factors of the network selection structure are QoS, cost and lifetime. If user preference is lifetime, our selection algorithm selects the network that offers longest service duration due to low power consumption. Also, we conduct some simulations using the OPNET simulation tool. The simulation results show that the proposed algorithm provides longer lifetime in the hybrid wireless network environment.
fRMSDPred: Predicting Local RMSD Between Structural Fragments Using Sequence Information

DTIC Science & Technology

2007-04-04

machine learning approaches for estimating the RMSD value of a pair of protein fragments. These estimated fragment-level RMSD values can be used to construct the alignment, assess the quality of an alignment, and identify high-quality alignment segments. We present algorithms to solve this fragment-level RMSD prediction problem using a supervised learning framework based on support vector regression and classification that incorporates protein profiles, predicted secondary structure, effective information encoding schemes, and novel second-order pairwise exponential kernel
New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation.

PubMed

Antczak, Maciej; Popenda, Mariusz; Zok, Tomasz; Zurkowski, Michal; Adamiak, Ryszard W; Szachniuk, Marta

2018-04-15

Understanding the formation, architecture and roles of pseudoknots in RNA structures are one of the most difficult challenges in RNA computational biology and structural bioinformatics. Methods predicting pseudoknots typically perform this with poor accuracy, often despite experimental data incorporation. Existing bioinformatic approaches differ in terms of pseudoknots' recognition and revealing their nature. A few ways of pseudoknot classification exist, most common ones refer to a genus or order. Following the latter one, we propose new algorithms that identify pseudoknots in RNA structure provided in BPSEQ format, determine their order and encode in dot-bracket-letter notation. The proposed encoding aims to illustrate the hierarchy of RNA folding. New algorithms are based on dynamic programming and hybrid (combining exhaustive search and random walk) approaches. They evolved from elementary algorithm implemented within the workflow of RNA FRABASE 1.0, our database of RNA structure fragments. They use different scoring functions to rank dissimilar dot-bracket representations of RNA structure. Computational experiments show an advantage of new methods over the others, especially for large RNA structures. Presented algorithms have been implemented as new functionality of RNApdbee webserver and are ready to use at http://rnapdbee.cs.put.poznan.pl. mszachniuk@cs.put.poznan.pl. Supplementary data are available at Bioinformatics online.
Ads' click-through rates predicting based on gated recurrent unit neural networks

NASA Astrophysics Data System (ADS)

Chen, Qiaohong; Guo, Zixuan; Dong, Wen; Jin, Lingzi

2018-05-01

In order to improve the effect of online advertising and to increase the revenue of advertising, the gated recurrent unit neural networks(GRU) model is used as the ads' click through rates(CTR) predicting. Combined with the characteristics of gated unit structure and the unique of time sequence in data, using BPTT algorithm to train the model. Furthermore, by optimizing the step length algorithm of the gated unit recurrent neural networks, making the model reach optimal point better and faster in less iterative rounds. The experiment results show that the model based on the gated recurrent unit neural networks and its optimization of step length algorithm has the better effect on the ads' CTR predicting, which helps advertisers, media and audience achieve a win-win and mutually beneficial situation in Three-Side Game.
Discriminative structural approaches for enzyme active-site prediction.

PubMed

Kato, Tsuyoshi; Nagano, Nozomi

2011-02-15

Predicting enzyme active-sites in proteins is an important issue not only for protein sciences but also for a variety of practical applications such as drug design. Because enzyme reaction mechanisms are based on the local structures of enzyme active-sites, various template-based methods that compare local structures in proteins have been developed to date. In comparing such local sites, a simple measurement, RMSD, has been used so far. This paper introduces new machine learning algorithms that refine the similarity/deviation for comparison of local structures. The similarity/deviation is applied to two types of applications, single template analysis and multiple template analysis. In the single template analysis, a single template is used as a query to search proteins for active sites, whereas a protein structure is examined as a query to discover the possible active-sites using a set of templates in the multiple template analysis. This paper experimentally illustrates that the machine learning algorithms effectively improve the similarity/deviation measurements for both the analyses.
A dynamical system view of cerebellar function

NASA Astrophysics Data System (ADS)

Keeler, James D.

1990-06-01

First some previous theories of cerebellar function are reviewed, and deficiencies in how they map onto the neurophysiological structure are pointed out. I hypothesize that the cerebellar cortex builds an internal model, or prediction, of the dynamics of the animal. A class of algorithms for doing prediction based on local reconstruction of attractors are described, and it is shown how this class maps very well onto the structure of the cerebellar cortex. I hypothesize that the climbing fibers multiplex between different trajectories corresponding to different modes of operation. Then the vestibulo-ocular reflex is examined, and experiments to test the proposed model are suggested. The purpose of the presentation here is twofold: (1) To enlighten physiologists to the mathematics of a class of prediction algorithms that map well onto cerebellar architecture. (2) To enlighten dynamical system theorists to the physiological and anatomical details of the cerebellum.
A high-throughput exploration of magnetic materials by using structure predicting methods

NASA Astrophysics Data System (ADS)

Arapan, S.; Nieves, P.; Cuesta-López, S.

2018-02-01

We study the capability of a structure predicting method based on genetic/evolutionary algorithm for a high-throughput exploration of magnetic materials. We use the USPEX and VASP codes to predict stable and generate low-energy meta-stable structures for a set of representative magnetic structures comprising intermetallic alloys, oxides, interstitial compounds, and systems containing rare-earths elements, and for both types of ferromagnetic and antiferromagnetic ordering. We have modified the interface between USPEX and VASP codes to improve the performance of structural optimization as well as to perform calculations in a high-throughput manner. We show that exploring the structure phase space with a structure predicting technique reveals large sets of low-energy metastable structures, which not only improve currently exiting databases, but also may provide understanding and solutions to stabilize and synthesize magnetic materials suitable for permanent magnet applications.
NEP: web server for epitope prediction based on antibody neutralization of viral strains with diverse sequences.

PubMed

Chuang, Gwo-Yu; Liou, David; Kwong, Peter D; Georgiev, Ivelin S

2014-07-01

Delineation of the antigenic site, or epitope, recognized by an antibody can provide clues about functional vulnerabilities and resistance mechanisms, and can therefore guide antibody optimization and epitope-based vaccine design. Previously, we developed an algorithm for antibody-epitope prediction based on antibody neutralization of viral strains with diverse sequences and validated the algorithm on a set of broadly neutralizing HIV-1 antibodies. Here we describe the implementation of this algorithm, NEP (Neutralization-based Epitope Prediction), as a web-based server. The users must supply as input: (i) an alignment of antigen sequences of diverse viral strains; (ii) neutralization data for the antibody of interest against the same set of antigen sequences; and (iii) (optional) a structure of the unbound antigen, for enhanced prediction accuracy. The prediction results can be downloaded or viewed interactively on the antigen structure (if supplied) from the web browser using a JSmol applet. Since neutralization experiments are typically performed as one of the first steps in the characterization of an antibody to determine its breadth and potency, the NEP server can be used to predict antibody-epitope information at no additional experimental costs. NEP can be accessed on the internet at http://exon.niaid.nih.gov/nep. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
SU-G-JeP3-05: Geometry Based Transperineal Ultrasound Probe Positioning for Image Guided Radiotherapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Camps, S; With, P de; Verhaegen, F

2016-06-15

Purpose: The use of ultrasound (US) imaging in radiotherapy is not widespread, primarily due to the need for skilled operators performing the scans. Automation of probe positioning has the potential to remove this need and minimize operator dependence. We introduce an algorithm for obtaining a US probe position that allows good anatomical structure visualization based on clinical requirements. The first application is on 4D transperineal US images of prostate cancer patients. Methods: The algorithm calculates the probe position and orientation using anatomical information provided by a reference CT scan, always available in radiotherapy workflows. As initial test, we apply themore » algorithm on a CIRS pelvic US phantom to obtain a set of possible probe positions. Subsequently, five of these positions are randomly chosen and used to acquire actual US volumes of the phantom. Visual inspection of these volumes reveal if the whole prostate, and adjacent edges of bladder and rectum are fully visualized, as clinically required. In addition, structure positions on the acquired US volumes are compared to predictions of the algorithm. Results: All acquired volumes fulfill the clinical requirements as specified in the previous section. Preliminary quantitative evaluation was performed on thirty consecutive slices of two volumes, on which the structures are easily recognizable. The mean absolute distances (MAD) between actual anatomical structure positions and positions predicted by the algorithm were calculated. This resulted in MAD of 2.4±0.4 mm for prostate, 3.2±0.9 mm for bladder and 3.3±1.3 mm for rectum. Conclusion: Visual inspection and quantitative evaluation show that the algorithm is able to propose probe positions that fulfill all clinical requirements. The obtained MAD is on average 2.9 mm. However, during evaluation we assumed no errors in structure segmentation and probe positioning. In future steps, accurate estimation of these errors will allow for better evaluation of the achieved accuracy.« less
High performance transcription factor-DNA docking with GPU computing

PubMed Central

2012-01-01

Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computational demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575
ADME evaluation in drug discovery. 1. Applications of genetic algorithms to the prediction of blood-brain partitioning of a large set of drugs.

PubMed

Hou, Tingjun; Xu, Xiaojie

2002-12-01

In this study, the relationships between the brain-blood concentration ratio of 96 structurally diverse compounds with a large number of structurally derived descriptors were investigated. The linear models were based on molecular descriptors that can be calculated for any compound simply from a knowledge of its molecular structure. The linear correlation coefficients of the models were optimized by genetic algorithms (GAs), and the descriptors used in the linear models were automatically selected from 27 structurally derived descriptors. The GA optimizations resulted in a group of linear models with three or four molecular descriptors with good statistical significance. The change of descriptor use as the evolution proceeds demonstrates that the octane/water partition coefficient and the partial negative solvent-accessible surface area multiplied by the negative charge are crucial to brain-blood barrier permeability. Moreover, we found that the predictions using multiple QSPR models from GA optimization gave quite good results in spite of the diversity of structures, which was better than the predictions using the best single model. The predictions for the two external sets with 37 diverse compounds using multiple QSPR models indicate that the best linear models with four descriptors are sufficiently effective for predictive use. Considering the ease of computation of the descriptors, the linear models may be used as general utilities to screen the blood-brain barrier partitioning of drugs in a high-throughput fashion.
Protein structure prediction with local adjust tabu search algorithm

PubMed Central

2014-01-01

Background Protein folding structure prediction is one of the most challenging problems in the bioinformatics domain. Because of the complexity of the realistic protein structure, the simplified structure model and the computational method should be adopted in the research. The AB off-lattice model is one of the simplification models, which only considers two classes of amino acids, hydrophobic (A) residues and hydrophilic (B) residues. Results The main work of this paper is to discuss how to optimize the lowest energy configurations in 2D off-lattice model and 3D off-lattice model by using Fibonacci sequences and real protein sequences. In order to avoid falling into local minimum and faster convergence to the global minimum, we introduce a novel method (SATS) to the protein structure problem, which combines simulated annealing algorithm and tabu search algorithm. Various strategies, such as the new encoding strategy, the adaptive neighborhood generation strategy and the local adjustment strategy, are adopted successfully for high-speed searching the optimal conformation corresponds to the lowest energy of the protein sequences. Experimental results show that some of the results obtained by the improved SATS are better than those reported in previous literatures, and we can sure that the lowest energy folding state for short Fibonacci sequences have been found. Conclusions Although the off-lattice models is not very realistic, they can reflect some important characteristics of the realistic protein. It can be found that 3D off-lattice model is more like native folding structure of the realistic protein than 2D off-lattice model. In addition, compared with some previous researches, the proposed hybrid algorithm can more effectively and more quickly search the spatial folding structure of a protein chain. PMID:25474708
3dRPC: a web server for 3D RNA-protein structure prediction.

PubMed

Huang, Yangyu; Li, Haotian; Xiao, Yi

2018-04-01

RNA-protein interactions occur in many biological processes. To understand the mechanism of these interactions one needs to know three-dimensional (3D) structures of RNA-protein complexes. 3dRPC is an algorithm for prediction of 3D RNA-protein complex structures and consists of a docking algorithm RPDOCK and a scoring function 3dRPC-Score. RPDOCK is used to sample possible complex conformations of an RNA and a protein by calculating the geometric and electrostatic complementarities and stacking interactions at the RNA-protein interface according to the features of atom packing of the interface. 3dRPC-Score is a knowledge-based potential that uses the conformations of nucleotide-amino-acid pairs as statistical variables and that is used to choose the near-native complex-conformations obtained from the docking method above. Recently, we built a web server for 3dRPC. The users can easily use 3dRPC without installing it locally. RNA and protein structures in PDB (Protein Data Bank) format are the only needed input files. It can also incorporate the information of interface residues or residue-pairs obtained from experiments or theoretical predictions to improve the prediction. The address of 3dRPC web server is http://biophy.hust.edu.cn/3dRPC. yxiao@hust.edu.cn.
Lossless medical image compression using geometry-adaptive partitioning and least square-based prediction.

PubMed

Song, Xiaoying; Huang, Qijun; Chang, Sheng; He, Jin; Wang, Hao

2018-06-01

To improve the compression rates for lossless compression of medical images, an efficient algorithm, based on irregular segmentation and region-based prediction, is proposed in this paper. Considering that the first step of a region-based compression algorithm is segmentation, this paper proposes a hybrid method by combining geometry-adaptive partitioning and quadtree partitioning to achieve adaptive irregular segmentation for medical images. Then, least square (LS)-based predictors are adaptively designed for each region (regular subblock or irregular subregion). The proposed adaptive algorithm not only exploits spatial correlation between pixels but it utilizes local structure similarity, resulting in efficient compression performance. Experimental results show that the average compression performance of the proposed algorithm is 10.48, 4.86, 3.58, and 0.10% better than that of JPEG 2000, CALIC, EDP, and JPEG-LS, respectively. Graphical abstract ᅟ.
Reliability-based design optimization of reinforced concrete structures including soil-structure interaction using a discrete gravitational search algorithm and a proposed metamodel

NASA Astrophysics Data System (ADS)

Khatibinia, M.; Salajegheh, E.; Salajegheh, J.; Fadaee, M. J.

2013-10-01

A new discrete gravitational search algorithm (DGSA) and a metamodelling framework are introduced for reliability-based design optimization (RBDO) of reinforced concrete structures. The RBDO of structures with soil-structure interaction (SSI) effects is investigated in accordance with performance-based design. The proposed DGSA is based on the standard gravitational search algorithm (GSA) to optimize the structural cost under deterministic and probabilistic constraints. The Monte-Carlo simulation (MCS) method is considered as the most reliable method for estimating the probabilities of reliability. In order to reduce the computational time of MCS, the proposed metamodelling framework is employed to predict the responses of the SSI system in the RBDO procedure. The metamodel consists of a weighted least squares support vector machine (WLS-SVM) and a wavelet kernel function, which is called WWLS-SVM. Numerical results demonstrate the efficiency and computational advantages of DGSA and the proposed metamodel for RBDO of reinforced concrete structures.
Understanding the undelaying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein.

PubMed

Ebrahimi, Mansour; Aghagolzadeh, Parisa; Shamabadi, Narges; Tahmasebi, Ahmad; Alsharifi, Mohammed; Adelson, David L; Hemmatzadeh, Farhid; Ebrahimie, Esmaeil

2014-01-01

The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physic-chemical attributes which govern HA subtyping, we performed a large scale functional analysis of over 7000 sequences of 16 different HA subtypes. Large number (896) of physic-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine leaning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on calculated protein characteristics dataset as well as 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. Random Forest tree induction algorithm and RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption. Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represent a new avenue for understanding and predicting possible future structure of influenza pandemics.
Understanding the Underlying Mechanism of HA-Subtyping in the Level of Physic-Chemical Characteristics of Protein

PubMed Central

Ebrahimi, Mansour; Aghagolzadeh, Parisa; Shamabadi, Narges; Tahmasebi, Ahmad; Alsharifi, Mohammed; Adelson, David L.

2014-01-01

The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physic-chemical attributes which govern HA subtyping, we performed a large scale functional analysis of over 7000 sequences of 16 different HA subtypes. Large number (896) of physic-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine leaning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on calculated protein characteristics dataset as well as 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. Random Forest tree induction algorithm and RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption. Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represent a new avenue for understanding and predicting possible future structure of influenza pandemics. PMID:24809455
Validation of Coevolving Residue Algorithms via Pipeline Sensitivity Analysis: ELSC and OMES and ZNMI, Oh My!

PubMed Central

Brown, Christopher A.; Brown, Kevin S.

2010-01-01

Correlated amino acid substitution algorithms attempt to discover groups of residues that co-fluctuate due to either structural or functional constraints. Although these algorithms could inform both ab initio protein folding calculations and evolutionary studies, their utility for these purposes has been hindered by a lack of confidence in their predictions due to hard to control sources of error. To complicate matters further, naive users are confronted with a multitude of methods to choose from, in addition to the mechanics of assembling and pruning a dataset. We first introduce a new pair scoring method, called ZNMI (Z-scored-product Normalized Mutual Information), which drastically improves the performance of mutual information for co-fluctuating residue prediction. Second and more important, we recast the process of finding coevolving residues in proteins as a data-processing pipeline inspired by the medical imaging literature. We construct an ensemble of alignment partitions that can be used in a cross-validation scheme to assess the effects of choices made during the procedure on the resulting predictions. This pipeline sensitivity study gives a measure of reproducibility (how similar are the predictions given perturbations to the pipeline?) and accuracy (are residue pairs with large couplings on average close in tertiary structure?). We choose a handful of published methods, along with ZNMI, and compare their reproducibility and accuracy on three diverse protein families. We find that (i) of the algorithms tested, while none appear to be both highly reproducible and accurate, ZNMI is one of the most accurate by far and (ii) while users should be wary of predictions drawn from a single alignment, considering an ensemble of sub-alignments can help to determine both highly accurate and reproducible couplings. Our cross-validation approach should be of interest both to developers and end users of algorithms that try to detect correlated amino acid substitutions. PMID:20531955
Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns.

PubMed

Ortuño, Francisco M; Valenzuela, Olga; Rojas, Fernando; Pomares, Hector; Florido, Javier P; Urquiza, Jose M; Rojas, Ignacio

2013-09-01

Multiple sequence alignments (MSAs) are widely used approaches in bioinformatics to carry out other tasks such as structure predictions, biological function analyses or phylogenetic modeling. However, current tools usually provide partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce different alignments, above all when sequences are less similar. Consequently, researchers and biologists do not agree about which is the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores including further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because structures are more conserved in proteins than sequences, scores with structural information are better suited to evaluate more distant relationships between sequences. The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was significantly assessed on the BAliBASE benchmark according to the Kruskal-Wallis test (P < 0.01). This algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P < 0.05), whereas it shows results not significantly different to 3D-COFFEE (P > 0.05) with the advantage of being able to use less structures. Structural information is included within the objective function to evaluate more accurately the obtained alignments. The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.

Method of identifying hairpin DNA probes by partial fold analysis

DOEpatents

Miller, Benjamin L [Penfield, NY; Strohsahl, Christopher M [Saugerties, NY

2009-10-06

Method of identifying molecular beacons in which a secondary structure prediction algorithm is employed to identify oligonucleotide sequences within a target gene having the requisite hairpin structure. Isolated oligonucleotides, molecular beacons prepared from those oligonucleotides, and their use are also disclosed.
Method of identifying hairpin DNA probes by partial fold analysis

DOEpatents

Miller, Benjamin L.; Strohsahl, Christopher M.

2008-10-28

Methods of identifying molecular beacons in which a secondary structure prediction algorithm is employed to identify oligonucleotide sequences within a target gene having the requisite hairpin structure. Isolated oligonucleotides, molecular beacons prepared from those oligonucleotides, and their use are also disclosed.
An Algorithm for Protein Helix Assignment Using Helix Geometry

PubMed Central

Cao, Chen; Xu, Shutan; Wang, Lincong

2015-01-01

Helices are one of the most common and were among the earliest recognized secondary structure elements in proteins. The assignment of helices in a protein underlies the analysis of its structure and function. Though the mathematical expression for a helical curve is simple, no previous assignment programs have used a genuine helical curve as a model for helix assignment. In this paper we present a two-step assignment algorithm. The first step searches for a series of bona fide helical curves each one best fits the coordinates of four successive backbone Cα atoms. The second step uses the best fit helical curves as input to make helix assignment. The application to the protein structures in the PDB (protein data bank) proves that the algorithm is able to assign accurately not only regular α-helix but also 310 and π helices as well as their left-handed versions. One salient feature of the algorithm is that the assigned helices are structurally more uniform than those by the previous programs. The structural uniformity should be useful for protein structure classification and prediction while the accurate assignment of a helix to a particular type underlies structure-function relationship in proteins. PMID:26132394
Corrosion Prediction with Parallel Finite Element Modeling for Coupled Hygro-Chemo Transport into Concrete under Chloride-Rich Environment

PubMed Central

Na, Okpin; Cai, Xiao-Chuan; Xi, Yunping

2017-01-01

The prediction of the chloride-induced corrosion is very important because of the durable life of concrete structure. To simulate more realistic durability performance of concrete structures, complex scientific methods and more accurate material models are needed. In order to predict the robust results of corrosion initiation time and to describe the thin layer from concrete surface to reinforcement, a large number of fine meshes are also used. The purpose of this study is to suggest more realistic physical model regarding coupled hygro-chemo transport and to implement the model with parallel finite element algorithm. Furthermore, microclimate model with environmental humidity and seasonal temperature is adopted. As a result, the prediction model of chloride diffusion under unsaturated condition was developed with parallel algorithms and was applied to the existing bridge to validate the model with multi-boundary condition. As the number of processors increased, the computational time decreased until the number of processors became optimized. Then, the computational time increased because the communication time between the processors increased. The framework of present model can be extended to simulate the multi-species de-icing salts ingress into non-saturated concrete structures in future work. PMID:28772714
A protein-dependent side-chain rotamer library.

PubMed

Bhuyan, Md Shariful Islam; Gao, Xin

2011-12-14

Protein side-chain packing problem has remained one of the key open problems in bioinformatics. The three main components of protein side-chain prediction methods are a rotamer library, an energy function and a search algorithm. Rotamer libraries summarize the existing knowledge of the experimentally determined structures quantitatively. Depending on how much contextual information is encoded, there are backbone-independent rotamer libraries and backbone-dependent rotamer libraries. Backbone-independent libraries only encode sequential information, whereas backbone-dependent libraries encode both sequential and locally structural information. However, side-chain conformations are determined by spatially local information, rather than sequentially local information. Since in the side-chain prediction problem, the backbone structure is given, spatially local information should ideally be encoded into the rotamer libraries. In this paper, we propose a new type of backbone-dependent rotamer library, which encodes structural information of all the spatially neighboring residues. We call it protein-dependent rotamer libraries. Given any rotamer library and a protein backbone structure, we first model the protein structure as a Markov random field. Then the marginal distributions are estimated by the inference algorithms, without doing global optimization or search. The rotamers from the given library are then re-ranked and associated with the updated probabilities. Experimental results demonstrate that the proposed protein-dependent libraries significantly outperform the widely used backbone-dependent libraries in terms of the side-chain prediction accuracy and the rotamer ranking ability. Furthermore, without global optimization/search, the side-chain prediction power of the protein-dependent library is still comparable to the global-search-based side-chain prediction methods.
Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS

PubMed Central

Li, Bi-Qing; Feng, Kai-Yan; Chen, Lei; Huang, Tao; Cai, Yu-Dong

2012-01-01

Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction. PMID:22937126
Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction

PubMed Central

Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian

2017-01-01

Abstract Motivation: Loops are often vital for protein function, however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. Results: We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Availability and Implementation: Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. Contact: deane@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28453681
Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction.

PubMed

Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian; Deane, Charlotte M

2017-05-01

Loops are often vital for protein function, however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. deane@stats.ox.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
A two-step nearest neighbors algorithm using satellite imagery for predicting forest structure within species composition classes

Treesearch

Ronald E. McRoberts

2009-01-01

Nearest neighbors techniques have been shown to be useful for predicting multiple forest attributes from forest inventory and Landsat satellite image data. However, in regions lacking good digital land cover information, nearest neighbors selected to predict continuous variables such as tree volume must be selected without regard to relevant categorical variables such...
Prediction protein structural classes with pseudo-amino acid composition: approximate entropy and hydrophobicity pattern.

PubMed

Zhang, Tong-Liang; Ding, Yong-Sheng; Chou, Kuo-Chen

2008-01-07

Compared with the conventional amino acid (AA) composition, the pseudo-amino acid (PseAA) composition as originally introduced for protein subcellular location prediction can incorporate much more information of a protein sequence, so as to remarkably enhance the power of using a discrete model to predict various attributes of a protein. In this study, based on the concept of PseAA composition, the approximate entropy and hydrophobicity pattern of a protein sequence are used to characterize the PseAA components. Also, the immune genetic algorithm (IGA) is applied to search the optimal weight factors in generating the PseAA composition. Thus, for a given protein sequence sample, a 27-D (dimensional) PseAA composition is generated as its descriptor. The fuzzy K nearest neighbors (FKNN) classifier is adopted as the prediction engine. The results thus obtained in predicting protein structural classification are quite encouraging, indicating that the current approach may also be used to improve the prediction quality of other protein attributes, or at least can play a complimentary role to the existing methods in the relevant areas. Our algorithm is written in Matlab that is available by contacting the corresponding author.
Soil-pipe interaction modeling for pipe behavior prediction with super learning based methods

NASA Astrophysics Data System (ADS)

Shi, Fang; Peng, Xiang; Liu, Huan; Hu, Yafei; Liu, Zheng; Li, Eric

2018-03-01

Underground pipelines are subject to severe distress from the surrounding expansive soil. To investigate the structural response of water mains to varying soil movements, field data, including pipe wall strains in situ soil water content, soil pressure and temperature, was collected. The research on monitoring data analysis has been reported, but the relationship between soil properties and pipe deformation has not been well-interpreted. To characterize the relationship between soil property and pipe deformation, this paper presents a super learning based approach combining feature selection algorithms to predict the water mains structural behavior in different soil environments. Furthermore, automatic variable selection method, e.i. recursive feature elimination algorithm, were used to identify the critical predictors contributing to the pipe deformations. To investigate the adaptability of super learning to different predictive models, this research employed super learning based methods to three different datasets. The predictive performance was evaluated by R-squared, root-mean-square error and mean absolute error. Based on the prediction performance evaluation, the superiority of super learning was validated and demonstrated by predicting three types of pipe deformations accurately. In addition, a comprehensive understand of the water mains working environments becomes possible.
Blind prediction of noncanonical RNA structure at atomic accuracy.

PubMed

Watkins, Andrew M; Geniesse, Caleb; Kladwang, Wipapat; Zakrevsky, Paul; Jaeger, Luc; Das, Rhiju

2018-05-01

Prediction of RNA structure from nucleotide sequence remains an unsolved grand challenge of biochemistry and requires distinct concepts from protein structure prediction. Despite extensive algorithmic development in recent years, modeling of noncanonical base pairs of new RNA structural motifs has not been achieved in blind challenges. We report a stepwise Monte Carlo (SWM) method with a unique add-and-delete move set that enables predictions of noncanonical base pairs of complex RNA structures. A benchmark of 82 diverse motifs establishes the method's general ability to recover noncanonical pairs ab initio, including multistrand motifs that have been refractory to prior approaches. In a blind challenge, SWM models predicted nucleotide-resolution chemical mapping and compensatory mutagenesis experiments for three in vitro selected tetraloop/receptors with previously unsolved structures (C7.2, C7.10, and R1). As a final test, SWM blindly and correctly predicted all noncanonical pairs of a Zika virus double pseudoknot during a recent community-wide RNA-Puzzle. Stepwise structure formation, as encoded in the SWM method, enables modeling of noncanonical RNA structure in a variety of previously intractable problems.
Recent developments of the NESSUS probabilistic structural analysis computer program

NASA Technical Reports Server (NTRS)

Millwater, H.; Wu, Y.-T.; Torng, T.; Thacker, B.; Riha, D.; Leung, C. P.

1992-01-01

The NESSUS probabilistic structural analysis computer program combines state-of-the-art probabilistic algorithms with general purpose structural analysis methods to compute the probabilistic response and the reliability of engineering structures. Uncertainty in loading, material properties, geometry, boundary conditions and initial conditions can be simulated. The structural analysis methods include nonlinear finite element and boundary element methods. Several probabilistic algorithms are available such as the advanced mean value method and the adaptive importance sampling method. The scope of the code has recently been expanded to include probabilistic life and fatigue prediction of structures in terms of component and system reliability and risk analysis of structures considering cost of failure. The code is currently being extended to structural reliability considering progressive crack propagation. Several examples are presented to demonstrate the new capabilities.
PREDICTION OF CHEMICAL REACTIVITY PARAMETERS AND PHYSICAL PROPERTIES OF ORGANIC COMPOUNDS FROM MOLECULAR STRUCTURE USING SPARC

EPA Science Inventory

The computer program SPARC (SPARC Performs Automated Reasoning in Chemistry) has been under development for several years to estimate physical properties and chemical reactivity parameters of organic compounds strictly from molecular structure. SPARC uses computational algorithms...
Mandarin Chinese Tone Identification in Cochlear Implants: Predictions from Acoustic Models

PubMed Central

Morton, Kenneth D.; Torrione, Peter A.; Throckmorton, Chandra S.; Collins, Leslie M.

2015-01-01

It has been established that current cochlear implants do not supply adequate spectral information for perception of tonal languages. Comprehension of a tonal language, such as Mandarin Chinese, requires recognition of lexical tones. New strategies of cochlear stimulation such as variable stimulation rate and current steering may provide the means of delivering more spectral information and thus may provide the auditory fine structure required for tone recognition. Several cochlear implant signal processing strategies are examined in this study, the continuous interleaved sampling (CIS) algorithm, the frequency amplitude modulation encoding (FAME) algorithm, and the multiple carrier frequency algorithm (MCFA). These strategies provide different types and amounts of spectral information. Pattern recognition techniques can be applied to data from Mandarin Chinese tone recognition tasks using acoustic models as a means of testing the abilities of these algorithms to transmit the changes in fundamental frequency indicative of the four lexical tones. The ability of processed Mandarin Chinese tones to be correctly classified may predict trends in the effectiveness of different signal processing algorithms in cochlear implants. The proposed techniques can predict trends in performance of the signal processing techniques in quiet conditions but fail to do so in noise. PMID:18706497
Predicting Positive and Negative Relationships in Large Social Networks.

PubMed

Wang, Guan-Nan; Gao, Hui; Chen, Lian; Mensah, Dennis N A; Fu, Yan

2015-01-01

In a social network, users hold and express positive and negative attitudes (e.g. support/opposition) towards other users. Those attitudes exhibit some kind of binary relationships among the users, which play an important role in social network analysis. However, some of those binary relationships are likely to be latent as the scale of social network increases. The essence of predicting latent binary relationships have recently began to draw researchers' attention. In this paper, we propose a machine learning algorithm for predicting positive and negative relationships in social networks inspired by structural balance theory and social status theory. More specifically, we show that when two users in the network have fewer common neighbors, the prediction accuracy of the relationship between them deteriorates. Accordingly, in the training phase, we propose a segment-based training framework to divide the training data into two subsets according to the number of common neighbors between users, and build a prediction model for each subset based on support vector machine (SVM). Moreover, to deal with large-scale social network data, we employ a sampling strategy that selects small amount of training data while maintaining high accuracy of prediction. We compare our algorithm with traditional algorithms and adaptive boosting of them. Experimental results of typical data sets show that our algorithm can deal with large social networks and consistently outperforms other methods.
R-chie: a web server and R package for visualizing RNA secondary structures

PubMed Central

Lai, Daniel; Proctor, Jeff R.; Zhu, Jing Yun A.; Meyer, Irmtraud M.

2012-01-01

Visually examining RNA structures can greatly aid in understanding their potential functional roles and in evaluating the performance of structure prediction algorithms. As many functional roles of RNA structures can already be studied given the secondary structure of the RNA, various methods have been devised for visualizing RNA secondary structures. Most of these methods depict a given RNA secondary structure as a planar graph consisting of base-paired stems interconnected by roundish loops. In this article, we present an alternative method of depicting RNA secondary structure as arc diagrams. This is well suited for structures that are difficult or impossible to represent as planar stem-loop diagrams. Arc diagrams can intuitively display pseudo-knotted structures, as well as transient and alternative structural features. In addition, they facilitate the comparison of known and predicted RNA secondary structures. An added benefit is that structure information can be displayed in conjunction with a corresponding multiple sequence alignments, thereby highlighting structure and primary sequence conservation and variation. We have implemented the visualization algorithm as a web server R-chie as well as a corresponding R package called R4RNA, which allows users to run the software locally and across a range of common operating systems. PMID:22434875
Correlation of RNA secondary structure statistics with thermodynamic stability and applications to folding.

PubMed

Wu, Johnny C; Gardner, David P; Ozer, Stuart; Gutell, Robin R; Ren, Pengyu

2009-08-28

The accurate prediction of the secondary and tertiary structure of an RNA with different folding algorithms is dependent on several factors, including the energy functions. However, an RNA higher-order structure cannot be predicted accurately from its sequence based on a limited set of energy parameters. The inter- and intramolecular forces between this RNA and other small molecules and macromolecules, in addition to other factors in the cell such as pH, ionic strength, and temperature, influence the complex dynamics associated with transition of a single stranded RNA to its secondary and tertiary structure. Since all of the factors that affect the formation of an RNAs 3D structure cannot be determined experimentally, statistically derived potential energy has been used in the prediction of protein structure. In the current work, we evaluate the statistical free energy of various secondary structure motifs, including base-pair stacks, hairpin loops, and internal loops, using their statistical frequency obtained from the comparative analysis of more than 50,000 RNA sequences stored in the RNA Comparative Analysis Database (rCAD) at the Comparative RNA Web (CRW) Site. Statistical energy was computed from the structural statistics for several datasets. While the statistical energy for a base-pair stack correlates with experimentally derived free energy values, suggesting a Boltzmann-like distribution, variation is observed between different molecules and their location on the phylogenetic tree of life. Our statistical energy values calculated for several structural elements were utilized in the Mfold RNA-folding algorithm. The combined statistical energy values for base-pair stacks, hairpins and internal loop flanks result in a significant improvement in the accuracy of secondary structure prediction; the hairpin flanks contribute the most.
Robust distributed model predictive control of linear systems with structured time-varying uncertainties

NASA Astrophysics Data System (ADS)

Zhang, Langwen; Xie, Wei; Wang, Jingcheng

2017-11-01

In this work, synthesis of robust distributed model predictive control (MPC) is presented for a class of linear systems subject to structured time-varying uncertainties. By decomposing a global system into smaller dimensional subsystems, a set of distributed MPC controllers, instead of a centralised controller, are designed. To ensure the robust stability of the closed-loop system with respect to model uncertainties, distributed state feedback laws are obtained by solving a min-max optimisation problem. The design of robust distributed MPC is then transformed into solving a minimisation optimisation problem with linear matrix inequality constraints. An iterative online algorithm with adjustable maximum iteration is proposed to coordinate the distributed controllers to achieve a global performance. The simulation results show the effectiveness of the proposed robust distributed MPC algorithm.
TMFoldWeb: a web server for predicting transmembrane protein fold class.

PubMed

Kozma, Dániel; Tusnády, Gábor E

2015-09-17

Here we present TMFoldWeb, the web server implementation of TMFoldRec, a transmembrane protein fold recognition algorithm. TMFoldRec uses statistical potentials and utilizes topology filtering and a gapless threading algorithm. It ranks template structures and selects the most likely candidates and estimates the reliability of the obtained lowest energy model. The statistical potential was developed in a maximum likelihood framework on a representative set of the PDBTM database. According to the benchmark test the performance of TMFoldRec is about 77 % in correctly predicting fold class for a given transmembrane protein sequence. An intuitive web interface has been developed for the recently published TMFoldRec algorithm. The query sequence goes through a pipeline of topology prediction and a systematic sequence to structure alignment (threading). Resulting templates are ordered by energy and reliability values and are colored according to their significance level. Besides the graphical interface, a programmatic access is available as well, via a direct interface for developers or for submitting genome-wide data sets. The TMFoldWeb web server is unique and currently the only web server that is able to predict the fold class of transmembrane proteins while assigning reliability scores for the prediction. This method is prepared for genome-wide analysis with its easy-to-use interface, informative result page and programmatic access. Considering the info-communication evolution in the last few years, the developed web server, as well as the molecule viewer, is responsive and fully compatible with the prevalent tablets and mobile devices.

Genetic algorithms for protein threading.

PubMed

Yadgari, J; Amir, A; Unger, R

1998-01-01

Despite many years of efforts, a direct prediction of protein structure from sequence is still not possible. As a result, in the last few years researchers have started to address the "inverse folding problem": Identifying and aligning a sequence to the fold with which it is most compatible, a process known as "threading". In two meetings in which protein folding predictions were objectively evaluated, it became clear that threading as a concept promises a real breakthrough, but that much improvement is still needed in the technique itself. Threading is a NP-hard problem, and thus no general polynomial solution can be expected. Still a practical approach with demonstrated ability to find optimal solutions in many cases, and acceptable solutions in other cases, is needed. We applied the technique of Genetic Algorithms in order to significantly improve the ability of threading algorithms to find the optimal alignment of a sequence to a structure, i.e. the alignment with the minimum free energy. A major progress reported here is the design of a representation of the threading alignment as a string of fixed length. With this representation validation of alignments and genetic operators are effectively implemented. Appropriate data structure and parameters have been selected. It is shown that Genetic Algorithm threading is effective and is able to find the optimal alignment in a few test cases. Furthermore, the described algorithm is shown to perform well even without pre-definition of core elements. Existing threading methods are dependent on such constraints to make their calculations feasible. But the concept of core elements is inherently arbitrary and should be avoided if possible. While a rigorous proof is hard to submit yet an, we present indications that indeed Genetic Algorithm threading is capable of finding consistently good solutions of full alignments in search spaces of size up to 10(70).
Predicting drug-target interactions by dual-network integrated logistic matrix factorization

NASA Astrophysics Data System (ADS)

Hao, Ming; Bryant, Stephen H.; Wang, Yanli

2017-01-01

In this work, we propose a dual-network integrated logistic matrix factorization (DNILMF) algorithm to predict potential drug-target interactions (DTI). The prediction procedure consists of four steps: (1) inferring new drug/target profiles and constructing profile kernel matrix; (2) diffusing drug profile kernel matrix with drug structure kernel matrix; (3) diffusing target profile kernel matrix with target sequence kernel matrix; and (4) building DNILMF model and smoothing new drug/target predictions based on their neighbors. We compare our algorithm with the state-of-the-art method based on the benchmark dataset. Results indicate that the DNILMF algorithm outperforms the previously reported approaches in terms of AUPR (area under precision-recall curve) and AUC (area under curve of receiver operating characteristic) based on the 5 trials of 10-fold cross-validation. We conclude that the performance improvement depends on not only the proposed objective function, but also the used nonlinear diffusion technique which is important but under studied in the DTI prediction field. In addition, we also compile a new DTI dataset for increasing the diversity of currently available benchmark datasets. The top prediction results for the new dataset are confirmed by experimental studies or supported by other computational research.
Identification of Conserved Water Sites in Protein Structures for Drug Design.

PubMed

Jukič, Marko; Konc, Janez; Gobec, Stanislav; Janežič, Dušanka

2017-12-26

Identification of conserved waters in protein structures is a challenging task with applications in molecular docking and protein stability prediction. As an alternative to computationally demanding simulations of proteins in water, experimental cocrystallized waters in the Protein Data Bank (PDB) in combination with a local structure alignment algorithm can be used for reliable prediction of conserved water sites. We developed the ProBiS H2O approach based on the previously developed ProBiS algorithm, which enables identification of conserved water sites in proteins using experimental protein structures from the PDB or a set of custom protein structures available to the user. With a protein structure, a binding site, or an individual water molecule as a query, ProBiS H2O collects similar proteins from the PDB and performs local or binding site-specific superimpositions of the query structure with similar proteins using the ProBiS algorithm. It collects the experimental water molecules from the similar proteins and transposes them to the query protein. Transposed waters are clustered by their mutual proximity, which enables identification of discrete sites in the query protein with high water conservation. ProBiS H2O is a robust and fast new approach that uses existing experimental structural data to identify conserved water sites on the interfaces of protein complexes, for example protein-small molecule interfaces, and elsewhere on the protein structures. It has been successfully validated in several reported proteins in which conserved water molecules were found to play an important role in ligand binding with applications in drug design.
Modeling Delamination in Postbuckled Composite Structures Under Static and Fatigue Loads

NASA Technical Reports Server (NTRS)

Bisagni, Chiara; Brambilla, Pietro; Bavila, Carlos G.

2013-01-01

The ability of the Abaqus progressive Virtual Crack Closure Technique (VCCT) to model delamination in composite structures was investigated for static, postbuckling, and fatigue loads. Preliminary evaluations were performed using simple Double Cantilever Beam (DCB) and Mixed-Mode Bending (MMB) specimens. The nodal release sequences that describe the propagation of the delamination front were investigated. The effect of using a sudden or a gradual nodal release was evaluated by considering meshes aligned with the crack front as well as misaligned meshes. Fatigue simulations were then performed using the Direct Cyclic Fatigue (DCF) algorithm. It was found that in specimens such as the DCB, which are characterized by a nearly linear response and a pure fracture mode, the algorithm correctly predicts the Paris Law rate of propagation. However, the Abaqus DCF algorithm does not consider different fatigue propagation laws in different fracture modes. Finally, skin/stiffener debonding was studied in an aircraft fuselage subcomponent in which debonding occurs deep into post-buckling deformation. VCCT was shown to be a robust tool for estimating the onset propagation. However, difficulties were found with the ability of the current implementation of the Abaqus progressive VCCT to predict delamination propagation within structures subjected to postbuckling deformations or fatigue loads.
Fine-grained parallel RNAalifold algorithm for RNA secondary structure prediction on FPGA

PubMed Central

Xia, Fei; Dou, Yong; Zhou, Xingming; Yang, Xuejun; Xu, Jiaqing; Zhang, Yang

2009-01-01

Background In the field of RNA secondary structure prediction, the RNAalifold algorithm is one of the most popular methods using free energy minimization. However, general-purpose computers including parallel computers or multi-core computers exhibit parallel efficiency of no more than 50%. Field Programmable Gate-Array (FPGA) chips provide a new approach to accelerate RNAalifold by exploiting fine-grained custom design. Results RNAalifold shows complicated data dependences, in which the dependence distance is variable, and the dependence direction is also across two dimensions. We propose a systolic array structure including one master Processing Element (PE) and multiple slave PEs for fine grain hardware implementation on FPGA. We exploit data reuse schemes to reduce the need to load energy matrices from external memory. We also propose several methods to reduce energy table parameter size by 80%. Conclusion To our knowledge, our implementation with 16 PEs is the only FPGA accelerator implementing the complete RNAalifold algorithm. The experimental results show a factor of 12.2 speedup over the RNAalifold (ViennaPackage – 1.6.5) software for a group of aligned RNA sequences with 2981-residue running on a Personal Computer (PC) platform with Pentium 4 2.6 GHz CPU. PMID:19208138
Incorporating Psychological Predictors of Treatment Response into Health Economic Simulation Models: A Case Study in Type 1 Diabetes.

PubMed

Kruger, Jen; Pollard, Daniel; Basarir, Hasan; Thokala, Praveen; Cooke, Debbie; Clark, Marie; Bond, Rod; Heller, Simon; Brennan, Alan

2015-10-01

. Health economic modeling has paid limited attention to the effects that patients' psychological characteristics have on the effectiveness of treatments. This case study tests 1) the feasibility of incorporating psychological prediction models of treatment response within an economic model of type 1 diabetes, 2) the potential value of providing treatment to a subgroup of patients, and 3) the cost-effectiveness of providing treatment to a subgroup of responders defined using 5 different algorithms. . Multiple linear regressions were used to investigate relationships between patients' psychological characteristics and treatment effectiveness. Two psychological prediction models were integrated with a patient-level simulation model of type 1 diabetes. Expected value of individualized care analysis was undertaken. Five different algorithms were used to provide treatment to a subgroup of predicted responders. A cost-effectiveness analysis compared using the algorithms to providing treatment to all patients. . The psychological prediction models had low predictive power for treatment effectiveness. Expected value of individualized care results suggested that targeting education at responders could be of value. The cost-effectiveness analysis suggested, for all 5 algorithms, that providing structured education to a subgroup of predicted responders would not be cost-effective. . The psychological prediction models tested did not have sufficient predictive power to make targeting treatment cost-effective. The psychological prediction models are simple linear models of psychological behavior. Collection of data on additional covariates could potentially increase statistical power. . By collecting data on psychological variables before an intervention, we can construct predictive models of treatment response to interventions. These predictive models can be incorporated into health economic models to investigate more complex service delivery and reimbursement strategies. © The Author(s) 2015.
TargetSpy: a supervised machine learning approach for microRNA target prediction.

PubMed

Sturm, Martin; Hackenberg, Michael; Langenberger, David; Frishman, Dmitrij

2010-05-28

Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences.In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs our method shows superior performance in all classes. In Drosophila melanogaster not only our class II and III predictions are on par with other algorithms, but notably the class I (no-seed) predictions are just marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. Only a few algorithms can predict target sites without demanding a seed match and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. TargetSpy was trained on mouse and performs well in human and drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool http://www.targetspy.org.
TargetSpy: a supervised machine learning approach for microRNA target prediction

PubMed Central

2010-01-01

Background Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. Results We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences. In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs our method shows superior performance in all classes. In Drosophila melanogaster not only our class II and III predictions are on par with other algorithms, but notably the class I (no-seed) predictions are just marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. Conclusion Only a few algorithms can predict target sites without demanding a seed match and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. TargetSpy was trained on mouse and performs well in human and drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool http://www.targetspy.org. PMID:20509939
Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR

PubMed Central

MotieGhader, Habib; Gharaghani, Sajjad; Masoudi-Sobhanzadeh, Yosef; Masoudi-Nejad, Ali

2017-01-01

Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using some meta-heuristic algorithms such as GA, PSO, ACO and so on. In this work two novel hybrid meta-heuristic algorithms i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on Genetic algorithm and learning automata for QSAR feature selection are proposed. SGALA algorithm uses advantages of Genetic algorithm and Learning Automata sequentially and the MGALA algorithm uses advantages of Genetic Algorithm and Learning Automata simultaneously. We applied our proposed algorithms to select the minimum possible number of features from three different datasets and also we observed that the MGALA and SGALA algorithms had the best outcome independently and in average compared to other feature selection algorithms. Through comparison of our proposed algorithms, we deduced that the rate of convergence to optimal result in MGALA and SGALA algorithms were better than the rate of GA, ACO, PSO and LA algorithms. In the end, the results of GA, ACO, PSO, LA, SGALA, and MGALA algorithms were applied as the input of LS-SVR model and the results from LS-SVR models showed that the LS-SVR model had more predictive ability with the input from SGALA and MGALA algorithms than the input from all other mentioned algorithms. Therefore, the results have corroborated that not only is the predictive efficiency of proposed algorithms better, but their rate of convergence is also superior to the all other mentioned algorithms. PMID:28979308
Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR.

PubMed

MotieGhader, Habib; Gharaghani, Sajjad; Masoudi-Sobhanzadeh, Yosef; Masoudi-Nejad, Ali

2017-01-01

Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using some meta-heuristic algorithms such as GA, PSO, ACO and so on. In this work two novel hybrid meta-heuristic algorithms i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on Genetic algorithm and learning automata for QSAR feature selection are proposed. SGALA algorithm uses advantages of Genetic algorithm and Learning Automata sequentially and the MGALA algorithm uses advantages of Genetic Algorithm and Learning Automata simultaneously. We applied our proposed algorithms to select the minimum possible number of features from three different datasets and also we observed that the MGALA and SGALA algorithms had the best outcome independently and in average compared to other feature selection algorithms. Through comparison of our proposed algorithms, we deduced that the rate of convergence to optimal result in MGALA and SGALA algorithms were better than the rate of GA, ACO, PSO and LA algorithms. In the end, the results of GA, ACO, PSO, LA, SGALA, and MGALA algorithms were applied as the input of LS-SVR model and the results from LS-SVR models showed that the LS-SVR model had more predictive ability with the input from SGALA and MGALA algorithms than the input from all other mentioned algorithms. Therefore, the results have corroborated that not only is the predictive efficiency of proposed algorithms better, but their rate of convergence is also superior to the all other mentioned algorithms.
Application of dynamic recurrent neural networks in nonlinear system identification

NASA Astrophysics Data System (ADS)

Du, Yun; Wu, Xueli; Sun, Huiqin; Zhang, Suying; Tian, Qiang

2006-11-01

An adaptive identification method of simple dynamic recurrent neural network (SRNN) for nonlinear dynamic systems is presented in this paper. This method based on the theory that by using the inner-states feed-back of dynamic network to describe the nonlinear kinetic characteristics of system can reflect the dynamic characteristics more directly, deduces the recursive prediction error (RPE) learning algorithm of SRNN, and improves the algorithm by studying topological structure on recursion layer without the weight values. The simulation results indicate that this kind of neural network can be used in real-time control, due to its less weight values, simpler learning algorithm, higher identification speed, and higher precision of model. It solves the problems of intricate in training algorithm and slow rate in convergence caused by the complicate topological structure in usual dynamic recurrent neural network.
Three-dimensional spatial analysis of missense variants in RTEL1 identifies pathogenic variants in patients with Familial Interstitial Pneumonia.

PubMed

Sivley, R Michael; Sheehan, Jonathan H; Kropski, Jonathan A; Cogan, Joy; Blackwell, Timothy S; Phillips, John A; Bush, William S; Meiler, Jens; Capra, John A

2018-01-23

Next-generation sequencing of individuals with genetic diseases often detects candidate rare variants in numerous genes, but determining which are causal remains challenging. We hypothesized that the spatial distribution of missense variants in protein structures contains information about function and pathogenicity that can help prioritize variants of unknown significance (VUS) and elucidate the structural mechanisms leading to disease. To illustrate this approach in a clinical application, we analyzed 13 candidate missense variants in regulator of telomere elongation helicase 1 (RTEL1) identified in patients with Familial Interstitial Pneumonia (FIP). We curated pathogenic and neutral RTEL1 variants from the literature and public databases. We then used homology modeling to construct a 3D structural model of RTEL1 and mapped known variants into this structure. We next developed a pathogenicity prediction algorithm based on proximity to known disease causing and neutral variants and evaluated its performance with leave-one-out cross-validation. We further validated our predictions with segregation analyses, telomere lengths, and mutagenesis data from the homologous XPD protein. Our algorithm for classifying RTEL1 VUS based on spatial proximity to pathogenic and neutral variation accurately distinguished 7 known pathogenic from 29 neutral variants (ROC AUC = 0.85) in the N-terminal domains of RTEL1. Pathogenic proximity scores were also significantly correlated with effects on ATPase activity (Pearson r = -0.65, p = 0.0004) in XPD, a related helicase. Applying the algorithm to 13 VUS identified from sequencing of RTEL1 from patients predicted five out of six disease-segregating VUS to be pathogenic. We provide structural hypotheses regarding how these mutations may disrupt RTEL1 ATPase and helicase function. Spatial analysis of missense variation accurately classified candidate VUS in RTEL1 and suggests how such variants cause disease. Incorporating spatial proximity analyses into other pathogenicity prediction tools may improve accuracy for other genes and genetic diseases.
A population-based evolutionary search approach to the multiple minima problem in de novo protein structure prediction

PubMed Central

2013-01-01

Background Elucidating the native structure of a protein molecule from its sequence of amino acids, a problem known as de novo structure prediction, is a long standing challenge in computational structural biology. Difficulties in silico arise due to the high dimensionality of the protein conformational space and the ruggedness of the associated energy surface. The issue of multiple minima is a particularly troublesome hallmark of energy surfaces probed with current energy functions. In contrast to the true energy surface, these surfaces are weakly-funneled and rich in comparably deep minima populated by non-native structures. For this reason, many algorithms seek to be inclusive and obtain a broad view of the low-energy regions through an ensemble of low-energy (decoy) conformations. Conformational diversity in this ensemble is key to increasing the likelihood that the native structure has been captured. Methods We propose an evolutionary search approach to address the multiple-minima problem in decoy sampling for de novo structure prediction. Two population-based evolutionary search algorithms are presented that follow the basic approach of treating conformations as individuals in an evolving population. Coarse graining and molecular fragment replacement are used to efficiently obtain protein-like child conformations from parents. Potential energy is used both to bias parent selection and determine which subset of parents and children will be retained in the evolving population. The effect on the decoy ensemble of sampling minima directly is measured by additionally mapping a conformation to its nearest local minimum before considering it for retainment. The resulting memetic algorithm thus evolves not just a population of conformations but a population of local minima. Results and conclusions Results show that both algorithms are effective in terms of sampling conformations in proximity of the known native structure. The additional minimization is shown to be key to enhancing sampling capability and obtaining a diverse ensemble of decoy conformations, circumventing premature convergence to sub-optimal regions in the conformational space, and approaching the native structure with proximity that is comparable to state-of-the-art decoy sampling methods. The results are shown to be robust and valid when using two representative state-of-the-art coarse-grained energy functions. PMID:24565020
Efficient and accurate Greedy Search Methods for mining functional modules in protein interaction networks.

PubMed

He, Jieyue; Li, Chaojun; Ye, Baoliu; Zhong, Wei

2012-06-25

Most computational algorithms mainly focus on detecting highly connected subgraphs in PPI networks as protein complexes but ignore their inherent organization. Furthermore, many of these algorithms are computationally expensive. However, recent analysis indicates that experimentally detected protein complexes generally contain Core/attachment structures. In this paper, a Greedy Search Method based on Core-Attachment structure (GSM-CA) is proposed. The GSM-CA method detects densely connected regions in large protein-protein interaction networks based on the edge weight and two criteria for determining core nodes and attachment nodes. The GSM-CA method improves the prediction accuracy compared to other similar module detection approaches, however it is computationally expensive. Many module detection approaches are based on the traditional hierarchical methods, which is also computationally inefficient because the hierarchical tree structure produced by these approaches cannot provide adequate information to identify whether a network belongs to a module structure or not. In order to speed up the computational process, the Greedy Search Method based on Fast Clustering (GSM-FC) is proposed in this work. The edge weight based GSM-FC method uses a greedy procedure to traverse all edges just once to separate the network into the suitable set of modules. The proposed methods are applied to the protein interaction network of S. cerevisiae. Experimental results indicate that many significant functional modules are detected, most of which match the known complexes. Results also demonstrate that the GSM-FC algorithm is faster and more accurate as compared to other competing algorithms. Based on the new edge weight definition, the proposed algorithm takes advantages of the greedy search procedure to separate the network into the suitable set of modules. Experimental analysis shows that the identified modules are statistically significant. The algorithm can reduce the computational time significantly while keeping high prediction accuracy.
A semi-supervised learning approach for RNA secondary structure prediction.

PubMed

Yonemoto, Haruka; Asai, Kiyoshi; Hamada, Michiaki

2015-08-01

RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited. Copyright © 2015 Elsevier Ltd. All rights reserved.
Prediction of TF target sites based on atomistic models of protein-DNA complexes

PubMed Central

Angarica, Vladimir Espinosa; Pérez, Abel González; Vasconcelos, Ana T; Collado-Vides, Julio; Contreras-Moreira, Bruno

2008-01-01

Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition. PMID:18922190
Detecting Lung and Colorectal Cancer Recurrence Using Structured Clinical/Administrative Data to Enable Outcomes Research and Population Health Management.

PubMed

Hassett, Michael J; Uno, Hajime; Cronin, Angel M; Carroll, Nikki M; Hornbrook, Mark C; Ritzwoller, Debra

2017-12-01

Recurrent cancer is common, costly, and lethal, yet we know little about it in community-based populations. Electronic health records and tumor registries contain vast amounts of data regarding community-based patients, but usually lack recurrence status. Existing algorithms that use structured data to detect recurrence have limitations. We developed algorithms to detect the presence and timing of recurrence after definitive therapy for stages I-III lung and colorectal cancer using 2 data sources that contain a widely available type of structured data (claims or electronic health record encounters) linked to gold-standard recurrence status: Medicare claims linked to the Cancer Care Outcomes Research and Surveillance study, and the Cancer Research Network Virtual Data Warehouse linked to registry data. Twelve potential indicators of recurrence were used to develop separate models for each cancer in each data source. Detection models maximized area under the ROC curve (AUC); timing models minimized average absolute error. Algorithms were compared by cancer type/data source, and contrasted with an existing binary detection rule. Detection model AUCs (>0.92) exceeded existing prediction rules. Timing models yielded absolute prediction errors that were small relative to follow-up time (<15%). Similar covariates were included in all detection and timing algorithms, though differences by cancer type and dataset challenged efforts to create 1 common algorithm for all scenarios. Valid and reliable detection of recurrence using big data is feasible. These tools will enable extensive, novel research on quality, effectiveness, and outcomes for lung and colorectal cancer patients and those who develop recurrence.
Developing a New Computer-Aided Clinical Decision Support System for Prediction of Successful Postcardioversion Patients with Persistent Atrial Fibrillation.

PubMed

Sterling, Mark; Huang, David T; Ghoraani, Behnaz

2015-01-01

We propose a new algorithm to predict the outcome of direct-current electric (DCE) cardioversion for atrial fibrillation (AF) patients. AF is the most common cardiac arrhythmia and DCE cardioversion is a noninvasive treatment to end AF and return the patient to sinus rhythm (SR). Unfortunately, there is a high risk of AF recurrence in persistent AF patients; hence clinically it is important to predict the DCE outcome in order to avoid the procedure's side effects. This study develops a feature extraction and classification framework to predict AF recurrence patients from the underlying structure of atrial activity (AA). A multiresolution signal decomposition technique, based on matching pursuit (MP), was used to project the AA over a dictionary of wavelets. Seven novel features were derived from the decompositions and were employed in a quadratic discrimination analysis classification to predict the success of post-DCE cardioversion in 40 patients with persistent AF. The proposed algorithm achieved 100% sensitivity and 95% specificity, indicating that the proposed computational approach captures detailed structural information about the underlying AA and could provide reliable information for effective management of AF.
Data-Driven Property Estimation for Protective Clothing

DTIC Science & Technology

2014-09-01

reliable predictions falls under the rubric “machine learning”. Inspired by the applications of machine learning in pharmaceutical drug design and...using genetic algorithms, for instance— descriptor selection can be automated as well. A well-known structured learning technique—Artificial Neural...descriptors automatically, by iteration, e.g., using a genetic algorithm [49]. 4.2.4 Avoiding Overfitting A peril of all regression—least squares as
Developing a novel hierarchical approach for multiscale structural reliability predictions for ultra-high consequence applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Emery, John M.; Coffin, Peter; Robbins, Brian A.

Microstructural variabilities are among the predominant sources of uncertainty in structural performance and reliability. We seek to develop efficient algorithms for multiscale calcu- lations for polycrystalline alloys such as aluminum alloy 6061-T6 in environments where ductile fracture is the dominant failure mode. Our approach employs concurrent multiscale methods, but does not focus on their development. They are a necessary but not sufficient ingredient to multiscale reliability predictions. We have focused on how to efficiently use concurrent models for forward propagation because practical applications cannot include fine-scale details throughout the problem domain due to exorbitant computational demand. Our approach begins withmore » a low-fidelity prediction at the engineering scale that is sub- sequently refined with multiscale simulation. The results presented in this report focus on plasticity and damage at the meso-scale, efforts to expedite Monte Carlo simulation with mi- crostructural considerations, modeling aspects regarding geometric representation of grains and second-phase particles, and contrasting algorithms for scale coupling.« less

Free energy minimization to predict RNA secondary structures and computational RNA design.

PubMed

Churkin, Alexander; Weinbrand, Lina; Barash, Danny

2015-01-01

Determining the RNA secondary structure from sequence data by computational predictions is a long-standing problem. Its solution has been approached in two distinctive ways. If a multiple sequence alignment of a collection of homologous sequences is available, the comparative method uses phylogeny to determine conserved base pairs that are more likely to form as a result of billions of years of evolution than by chance. In the case of single sequences, recursive algorithms that compute free energy structures by using empirically derived energy parameters have been developed. This latter approach of RNA folding prediction by energy minimization is widely used to predict RNA secondary structure from sequence. For a significant number of RNA molecules, the secondary structure of the RNA molecule is indicative of its function and its computational prediction by minimizing its free energy is important for its functional analysis. A general method for free energy minimization to predict RNA secondary structures is dynamic programming, although other optimization methods have been developed as well along with empirically derived energy parameters. In this chapter, we introduce and illustrate by examples the approach of free energy minimization to predict RNA secondary structures.
New correction procedures for the fast field program which extend its range

NASA Technical Reports Server (NTRS)

West, M.; Sack, R. A.

1990-01-01

A fast field program (FFP) algorithm was developed based on the method of Lee et al., for the prediction of sound pressure level from low frequency, high intensity sources. In order to permit accurate predictions at distances greater than 2 km, new correction procedures have had to be included in the algorithm. Certain functions, whose Hankel transforms can be determined analytically, are subtracted from the depth dependent Green's function. The distance response is then obtained as the sum of these transforms and the Fast Fourier Transformation (FFT) of the residual k dependent function. One procedure, which permits the elimination of most complex exponentials, has allowed significant changes in the structure of the FFP algorithm, which has resulted in a substantial reduction in computation time.
How evolutionary crystal structure prediction works--and why.

PubMed

Oganov, Artem R; Lyakhov, Andriy O; Valle, Mario

2011-03-15

Once the crystal structure of a chemical substance is known, many properties can be predicted reliably and routinely. Therefore if researchers could predict the crystal structure of a material before it is synthesized, they could significantly accelerate the discovery of new materials. In addition, the ability to predict crystal structures at arbitrary conditions of pressure and temperature is invaluable for the study of matter at extreme conditions, where experiments are difficult. Crystal structure prediction (CSP), the problem of finding the most stable arrangement of atoms given only the chemical composition, has long remained a major unsolved scientific problem. Two problems are entangled here: search, the efficient exploration of the multidimensional energy landscape, and ranking, the correct calculation of relative energies. For organic crystals, which contain a few molecules in the unit cell, search can be quite simple as long as a researcher does not need to include many possible isomers or conformations of the molecules; therefore ranking becomes the main challenge. For inorganic crystals, quantum mechanical methods often provide correct relative energies, making search the most critical problem. Recent developments provide useful practical methods for solving the search problem to a considerable extent. One can use simulated annealing, metadynamics, random sampling, basin hopping, minima hopping, and data mining. Genetic algorithms have been applied to crystals since 1995, but with limited success, which necessitated the development of a very different evolutionary algorithm. This Account reviews CSP using one of the major techniques, the hybrid evolutionary algorithm USPEX (Universal Structure Predictor: Evolutionary Xtallography). Using recent developments in the theory of energy landscapes, we unravel the reasons evolutionary techniques work for CSP and point out their limitations. We demonstrate that the energy landscapes of chemical systems have an overall shape and explore their intrinsic dimensionalities. Because of the inverse relationships between order and energy and between the dimensionality and diversity of an ensemble of crystal structures, the chances that a random search will find the ground state decrease exponentially with increasing system size. A well-designed evolutionary algorithm allows for much greater computational efficiency. We illustrate the power of evolutionary CSP through applications that examine matter at high pressure, where new, unexpected phenomena take place. Evolutionary CSP has allowed researchers to make unexpected discoveries such as a transparent phase of sodium, a partially ionic form of boron, complex superconducting forms of calcium, a novel superhard allotrope of carbon, polymeric modifications of nitrogen, and a new class of compounds, perhydrides. These methods have also led to the discovery of novel hydride superconductors including the "impossible" LiH(n) (n=2, 6, 8) compounds, and CaLi(2). We discuss extensions of the method to molecular crystals, systems of variable composition, and the targeted optimization of specific physical properties. © 2011 American Chemical Society
Structural assembly in space

NASA Technical Reports Server (NTRS)

Stokes, J. W.; Pruett, E. C.

1980-01-01

A cost algorithm for predicting assembly costs for large space structures is given. Assembly scenarios are summarized which describe the erection, deployment, and fabrication tasks for five large space structures. The major activities that impact total costs for structure assembly from launch through deployment and assembly to scientific instrument installation and checkout are described. Individual cost elements such as assembly fixtures, handrails, or remote minipulators are also presented.
The Third Ambient Aspirin Polymorph

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shtukenberg, Alexander G.; Hu, Chunhua T.; Zhu, Qiang

Polymorphism in aspirin (acetylsalicylic acid), one of the most widely consumed medications, was equivocal until the structure of a second polymorph II, similar in structure to the original form I, was reported in 2005. Here, the third ambient polymorph of aspirin is described. Lastly, it was crystallized from the melt and its structure was determined using a combination of X-ray powder diffraction analysis and crystal structure prediction algorithms.
The Third Ambient Aspirin Polymorph

DOE PAGES

Shtukenberg, Alexander G.; Hu, Chunhua T.; Zhu, Qiang; ...

2017-05-17

Polymorphism in aspirin (acetylsalicylic acid), one of the most widely consumed medications, was equivocal until the structure of a second polymorph II, similar in structure to the original form I, was reported in 2005. Here, the third ambient polymorph of aspirin is described. Lastly, it was crystallized from the melt and its structure was determined using a combination of X-ray powder diffraction analysis and crystal structure prediction algorithms.
RandSpg: An open-source program for generating atomistic crystal structures with specific spacegroups

NASA Astrophysics Data System (ADS)

Avery, Patrick; Zurek, Eva

2017-04-01

A new algorithm, RANDSPG, that can be used to generate trial crystal structures with specific space groups and compositions is described. The program has been designed for systems where the atoms are independent of one another, and it is therefore primarily suited towards inorganic systems. The structures that are generated adhere to user-defined constraints such as: the lattice shape and size, stoichiometry, set of space groups to be generated, and factors that influence the minimum interatomic separations. In addition, the user can optionally specify if the most general Wyckoff position is to be occupied or constrain select atoms to specific Wyckoff positions. Extensive testing indicates that the algorithm is efficient and reliable. The library is lightweight, portable, dependency-free and is published under a license recognized by the Open Source Initiative. A web interface for the algorithm is publicly accessible at http://xtalopt.openmolecules.net/randSpg/randSpg.html. RANDSPG has also been interfaced with the XTALOPT evolutionary algorithm for crystal structure prediction, and it is illustrated that the use of symmetric lattices in the first generation of randomly created individuals decreases the number of structures that need to be optimized to find the global energy minimum.
Using support vector machine to predict beta- and gamma-turns in proteins.

PubMed

Hu, Xiuzhen; Li, Qianzhong

2008-09-01

By using the composite vector with increment of diversity, position conservation scoring function, and predictive secondary structures to express the information of sequence, a support vector machine (SVM) algorithm for predicting beta- and gamma-turns in the proteins is proposed. The 426 and 320 nonhomologous protein chains described by Guruprasad and Rajkumar (Guruprasad and Rajkumar J. Biosci 2000, 25,143) are used for training and testing the predictive model of the beta- and gamma-turns, respectively. The overall prediction accuracy and the Matthews correlation coefficient in 7-fold cross-validation are 79.8% and 0.47, respectively, for the beta-turns. The overall prediction accuracy in 5-fold cross-validation is 61.0% for the gamma-turns. These results are significantly higher than the other algorithms in the prediction of beta- and gamma-turns using the same datasets. In addition, the 547 and 823 nonhomologous protein chains described by Fuchs and Alix (Fuchs and Alix Proteins: Struct Funct Bioinform 2005, 59, 828) are used for training and testing the predictive model of the beta- and gamma-turns, and better results are obtained. This algorithm may be helpful to improve the performance of protein turns' prediction. To ensure the ability of the SVM method to correctly classify beta-turn and non-beta-turn (gamma-turn and non-gamma-turn), the receiver operating characteristic threshold independent measure curves are provided. (c) 2008 Wiley Periodicals, Inc.
Application of artificial neural network to predict clay sensitivity in a high landslide prone area using CPTu data- A case study in Southwest of Sweden

NASA Astrophysics Data System (ADS)

Shahri, Abbas; Mousavinaseri, Mahsasadat; Naderi, Shima; Espersson, Maria

2015-04-01

Application of Artificial Neural Networks (ANNs) in many areas of engineering, in particular to geotechnical engineering problems such as site characterization has demonstrated some degree of success. The present paper aims to evaluate the feasibility of several various types of ANN models to predict the clay sensitivity of soft clays form piezocone penetration test data (CPTu). To get the aim, a research database of CPTu data of 70 test points around the Göta River near the Lilli Edet in the southwest of Sweden which is a high prone land slide area were collected and considered as input for ANNs. For training algorithms the quick propagation, conjugate gradient descent, quasi-Newton, limited memory quasi-Newton and Levenberg-Marquardt were developed tested and trained using the CPTu data to provide a comparison between the results of field investigation and ANN models to estimate the clay sensitivity. The reason of using the clay sensitivity parameter in this study is due to its relation to landslides in Sweden.A special high sensitive clay namely quick clay is considered as the main responsible for experienced landslides in Sweden which has high sensitivity and prone to slide. The training and testing program was started with 3-2-1 ANN architecture structure. By testing and trying several various architecture structures and changing the hidden layer in order to have a higher output resolution the 3-4-4-3-1 architecture structure for ANN in this study was confirmed. The tested algorithm showed that increasing the hidden layers up to 4 layers in ANN can improve the results and the 3-4-4-3-1 architecture structure ANNs for prediction of clay sensitivity represent reliable and reasonable response. The obtained results showed that the conjugate gradient descent algorithm with R2=0.897 has the best performance among the tested algorithms. Keywords: clay sensitivity, landslide, Artificial Neural Network
Collaborative development of predictive toxicology applications

PubMed Central

2010-01-01

OpenTox provides an interoperable, standards-based Framework for the support of predictive toxicology data management, algorithms, modelling, validation and reporting. It is relevant to satisfying the chemical safety assessment requirements of the REACH legislation as it supports access to experimental data, (Quantitative) Structure-Activity Relationship models, and toxicological information through an integrating platform that adheres to regulatory requirements and OECD validation principles. Initial research defined the essential components of the Framework including the approach to data access, schema and management, use of controlled vocabularies and ontologies, architecture, web service and communications protocols, and selection and integration of algorithms for predictive modelling. OpenTox provides end-user oriented tools to non-computational specialists, risk assessors, and toxicological experts in addition to Application Programming Interfaces (APIs) for developers of new applications. OpenTox actively supports public standards for data representation, interfaces, vocabularies and ontologies, Open Source approaches to core platform components, and community-based collaboration approaches, so as to progress system interoperability goals. The OpenTox Framework includes APIs and services for compounds, datasets, features, algorithms, models, ontologies, tasks, validation, and reporting which may be combined into multiple applications satisfying a variety of different user needs. OpenTox applications are based on a set of distributed, interoperable OpenTox API-compliant REST web services. The OpenTox approach to ontology allows for efficient mapping of complementary data coming from different datasets into a unifying structure having a shared terminology and representation. Two initial OpenTox applications are presented as an illustration of the potential impact of OpenTox for high-quality and consistent structure-activity relationship modelling of REACH-relevant endpoints: ToxPredict which predicts and reports on toxicities for endpoints for an input chemical structure, and ToxCreate which builds and validates a predictive toxicity model based on an input toxicology dataset. Because of the extensible nature of the standardised Framework design, barriers of interoperability between applications and content are removed, as the user may combine data, models and validation from multiple sources in a dependable and time-effective way. PMID:20807436
Collaborative development of predictive toxicology applications.

PubMed

Hardy, Barry; Douglas, Nicki; Helma, Christoph; Rautenberg, Micha; Jeliazkova, Nina; Jeliazkov, Vedrin; Nikolova, Ivelina; Benigni, Romualdo; Tcheremenskaia, Olga; Kramer, Stefan; Girschick, Tobias; Buchwald, Fabian; Wicker, Joerg; Karwath, Andreas; Gütlein, Martin; Maunz, Andreas; Sarimveis, Haralambos; Melagraki, Georgia; Afantitis, Antreas; Sopasakis, Pantelis; Gallagher, David; Poroikov, Vladimir; Filimonov, Dmitry; Zakharov, Alexey; Lagunin, Alexey; Gloriozova, Tatyana; Novikov, Sergey; Skvortsova, Natalia; Druzhilovsky, Dmitry; Chawla, Sunil; Ghosh, Indira; Ray, Surajit; Patel, Hitesh; Escher, Sylvia

2010-08-31

OpenTox provides an interoperable, standards-based Framework for the support of predictive toxicology data management, algorithms, modelling, validation and reporting. It is relevant to satisfying the chemical safety assessment requirements of the REACH legislation as it supports access to experimental data, (Quantitative) Structure-Activity Relationship models, and toxicological information through an integrating platform that adheres to regulatory requirements and OECD validation principles. Initial research defined the essential components of the Framework including the approach to data access, schema and management, use of controlled vocabularies and ontologies, architecture, web service and communications protocols, and selection and integration of algorithms for predictive modelling. OpenTox provides end-user oriented tools to non-computational specialists, risk assessors, and toxicological experts in addition to Application Programming Interfaces (APIs) for developers of new applications. OpenTox actively supports public standards for data representation, interfaces, vocabularies and ontologies, Open Source approaches to core platform components, and community-based collaboration approaches, so as to progress system interoperability goals.The OpenTox Framework includes APIs and services for compounds, datasets, features, algorithms, models, ontologies, tasks, validation, and reporting which may be combined into multiple applications satisfying a variety of different user needs. OpenTox applications are based on a set of distributed, interoperable OpenTox API-compliant REST web services. The OpenTox approach to ontology allows for efficient mapping of complementary data coming from different datasets into a unifying structure having a shared terminology and representation.Two initial OpenTox applications are presented as an illustration of the potential impact of OpenTox for high-quality and consistent structure-activity relationship modelling of REACH-relevant endpoints: ToxPredict which predicts and reports on toxicities for endpoints for an input chemical structure, and ToxCreate which builds and validates a predictive toxicity model based on an input toxicology dataset. Because of the extensible nature of the standardised Framework design, barriers of interoperability between applications and content are removed, as the user may combine data, models and validation from multiple sources in a dependable and time-effective way.
A serendipitous survey of prediction algorithms for amyloidogenicity

PubMed Central

Roland, Bartholomew P.; Kodali, Ravindra; Mishra, Rakesh; Wetzel, Ronald

2014-01-01

SUMMARY The 17- amino acid N-terminal segment of the Huntingtin protein, httNT, grows into stable α-helix rich oligomeric aggregates when incubated under physiological conditions. We examined 15 scrambled sequence versions of an httNT peptide for their stabilities against aggregation in aqueous solution at low micromolar concentration and physiological conditions. Surprisingly, given their derivation from a sequence that readily assembles into highly stable α-helical aggregates that fail to convert into β-structure, we found that three of these scrambled peptides rapidly grow into amyloid-like fibrils, while two others also develop amyloid somewhat more slowly. The other 10 scrambled peptides do not detectibly form any aggregates after 100 hrs incubation under these conditions. We then analyzed these sequences using four previously described algorithms for predicting the tendencies of peptides to grow into amyloid or other β-aggregates. We found that these algorithms – Zyggregator, Tango, Waltz and Zipper – varied greatly in the number of sequences predicted to be amyloidogenic and in their abilities to correctly identify the amyloid forming members of scrambled peptide collection. The results are discussed in the context of a review of the sequence and structural factors currently thought to be important in determining amyloid formation kinetics and thermodynamics. PMID:23893755
Application of firefly algorithm to the dynamic model updating problem

NASA Astrophysics Data System (ADS)

Shabbir, Faisal; Omenzetter, Piotr

2015-04-01

Model updating can be considered as a branch of optimization problems in which calibration of the finite element (FE) model is undertaken by comparing the modal properties of the actual structure with these of the FE predictions. The attainment of a global solution in a multi dimensional search space is a challenging problem. The nature-inspired algorithms have gained increasing attention in the previous decade for solving such complex optimization problems. This study applies the novel Firefly Algorithm (FA), a global optimization search technique, to a dynamic model updating problem. This is to the authors' best knowledge the first time FA is applied to model updating. The working of FA is inspired by the flashing characteristics of fireflies. Each firefly represents a randomly generated solution which is assigned brightness according to the value of the objective function. The physical structure under consideration is a full scale cable stayed pedestrian bridge with composite bridge deck. Data from dynamic testing of the bridge was used to correlate and update the initial model by using FA. The algorithm aimed at minimizing the difference between the natural frequencies and mode shapes of the structure. The performance of the algorithm is analyzed in finding the optimal solution in a multi dimensional search space. The paper concludes with an investigation of the efficacy of the algorithm in obtaining a reference finite element model which correctly represents the as-built original structure.
Mixed time integration methods for transient thermal analysis of structures, appendix 5

NASA Technical Reports Server (NTRS)

Liu, W. K.

1982-01-01

Mixed time integration methods for transient thermal analysis of structures are studied. An efficient solution procedure for predicting the thermal behavior of aerospace vehicle structures was developed. A 2D finite element computer program incorporating these methodologies is being implemented. The performance of these mixed time finite element algorithms can then be evaluated employing the proposed example problem.
Protein Loop Structure Prediction Using Conformational Space Annealing.

PubMed

Heo, Seungryong; Lee, Juyong; Joo, Keehyoung; Shin, Hang-Cheol; Lee, Jooyoung

2017-05-22

We have developed a protein loop structure prediction method by combining a new energy function, which we call E PLM (energy for protein loop modeling), with the conformational space annealing (CSA) global optimization algorithm. The energy function includes stereochemistry, dynamic fragment assembly, distance-scaled finite ideal gas reference (DFIRE), and generalized orientation- and distance-dependent terms. For the conformational search of loop structures, we used the CSA algorithm, which has been quite successful in dealing with various hard global optimization problems. We assessed the performance of E PLM with two widely used loop-decoy sets, Jacobson and RAPPER, and compared the results against the DFIRE potential. The accuracy of model selection from a pool of loop decoys as well as de novo loop modeling starting from randomly generated structures was examined separately. For the selection of a nativelike structure from a decoy set, E PLM was more accurate than DFIRE in the case of the Jacobson set and had similar accuracy in the case of the RAPPER set. In terms of sampling more nativelike loop structures, E PLM outperformed E DFIRE for both decoy sets. This new approach equipped with E PLM and CSA can serve as the state-of-the-art de novo loop modeling method.
Ternary alloy material prediction using genetic algorithm and cluster expansion

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Chong

2015-12-01

This thesis summarizes our study on the crystal structures prediction of Fe-V-Si system using genetic algorithm and cluster expansion. Our goal is to explore and look for new stable compounds. We started from the current ten known experimental phases, and calculated formation energies of those compounds using density functional theory (DFT) package, namely, VASP. The convex hull was generated based on the DFT calculations of the experimental known phases. Then we did random search on some metal rich (Fe and V) compositions and found that the lowest energy structures were body centered cube (bcc) underlying lattice, under which we didmore » our computational systematic searches using genetic algorithm and cluster expansion. Among hundreds of the searched compositions, thirteen were selected and DFT formation energies were obtained by VASP. The stability checking of those thirteen compounds was done in reference to the experimental convex hull. We found that the composition, 24-8-16, i.e., Fe 3VSi 2 is a new stable phase and it can be very inspiring to the future experiments.« less
CPHmodels-3.0--remote homology modeling using structure-guided sequence profiles.

PubMed

Nielsen, Morten; Lundegaard, Claus; Lund, Ole; Petersen, Thomas Nordahl

2010-07-01

CPHmodels-3.0 is a web server predicting protein 3D structure by use of single template homology modeling. The server employs a hybrid of the scoring functions of CPHmodels-2.0 and a novel remote homology-modeling algorithm. A query sequence is first attempted modeled using the fast CPHmodels-2.0 profile-profile scoring function suitable for close homology modeling. The new computational costly remote homology-modeling algorithm is only engaged provided that no suitable PDB template is identified in the initial search. CPHmodels-3.0 was benchmarked in the CASP8 competition and produced models for 94% of the targets (117 out of 128), 74% were predicted as high reliability models (87 out of 117). These achieved an average RMSD of 4.6 A when superimposed to the 3D structure. The remaining 26% low reliably models (30 out of 117) could superimpose to the true 3D structure with an average RMSD of 9.3 A. These performance values place the CPHmodels-3.0 method in the group of high performing 3D prediction tools. Beside its accuracy, one of the important features of the method is its speed. For most queries, the response time of the server is <20 min. The web server is available at http://www.cbs.dtu.dk/services/CPHmodels/.
RNA-Puzzles Round III: 3D RNA structure prediction of five riboswitches and one ribozyme

PubMed Central

Biesiada, Marcin; Boniecki, Michał J.; Chou, Fang-Chieh; Ferré-D'Amaré, Adrian R.; Das, Rhiju; Dunin-Horkawicz, Stanisław; Geniesse, Caleb; Kappel, Kalli; Kladwang, Wipapat; Krokhotin, Andrey; Łach, Grzegorz E.; Major, François; Mann, Thomas H.; Pachulska-Wieczorek, Katarzyna; Patel, Dinshaw J.; Piccirilli, Joseph A.; Popenda, Mariusz; Purzycka, Katarzyna J.; Ren, Aiming; Rice, Greggory M.; Santalucia, John; Tandon, Arpit; Trausch, Jeremiah J.; Wang, Jian; Weeks, Kevin M.; Williams, Benfeard; Xiao, Yi; Zhang, Dong; Zok, Tomasz

2017-01-01

RNA-Puzzles is a collective experiment in blind 3D RNA structure prediction. We report here a third round of RNA-Puzzles. Five puzzles, 4, 8, 12, 13, 14, all structures of riboswitch aptamers and puzzle 7, a ribozyme structure, are included in this round of the experiment. The riboswitch structures include biological binding sites for small molecules (S-adenosyl methionine, cyclic diadenosine monophosphate, 5-amino 4-imidazole carboxamide riboside 5′-triphosphate, glutamine) and proteins (YbxF), and one set describes large conformational changes between ligand-free and ligand-bound states. The Varkud satellite ribozyme is the most recently solved structure of a known large ribozyme. All puzzles have established biological functions and require structural understanding to appreciate their molecular mechanisms. Through the use of fast-track experimental data, including multidimensional chemical mapping, and accurate prediction of RNA secondary structure, a large portion of the contacts in 3D have been predicted correctly leading to similar topologies for the top ranking predictions. Template-based and homology-derived predictions could predict structures to particularly high accuracies. However, achieving biological insights from de novo prediction of RNA 3D structures still depends on the size and complexity of the RNA. Blind computational predictions of RNA structures already appear to provide useful structural information in many cases. Similar to the previous RNA-Puzzles Round II experiment, the prediction of non-Watson–Crick interactions and the observed high atomic clash scores reveal a notable need for an algorithm of improvement. All prediction models and assessment results are available at http://ahsoka.u-strasbg.fr/rnapuzzles/. PMID:28138060
A search for H/ACA snoRNAs in yeast using MFE secondary structure prediction.

PubMed

Edvardsson, Sverker; Gardner, Paul P; Poole, Anthony M; Hendy, Michael D; Penny, David; Moulton, Vincent

2003-05-01

Noncoding RNA genes produce functional RNA molecules rather than coding for proteins. One such family is the H/ACA snoRNAs. Unlike the related C/D snoRNAs these have resisted automated detection to date. We develop an algorithm to screen the yeast genome for novel H/ACA snoRNAs. To achieve this, we introduce some new methods for facilitating the search for noncoding RNAs in genomic sequences which are based on properties of predicted minimum free-energy (MFE) secondary structures. The algorithm has been implemented and can be generalized to enable screening of other eukaryote genomes. We find that use of primary sequence alone is insufficient for identifying novel H/ACA snoRNAs. Only the use of secondary structure filters reduces the number of candidates to a manageable size. From genomic context, we identify three strong H/ACA snoRNA candidates. These together with a further 47 candidates obtained by our analysis are being experimentally screened.
A New Secondary Structure Assignment Algorithm Using Cα Backbone Fragments

PubMed Central

Cao, Chen; Wang, Guishen; Liu, An; Xu, Shutan; Wang, Lincong; Zou, Shuxue

2016-01-01

The assignment of secondary structure elements in proteins is a key step in the analysis of their structures and functions. We have developed an algorithm, SACF (secondary structure assignment based on Cα fragments), for secondary structure element (SSE) assignment based on the alignment of Cα backbone fragments with central poses derived by clustering known SSE fragments. The assignment algorithm consists of three steps: First, the outlier fragments on known SSEs are detected. Next, the remaining fragments are clustered to obtain the central fragments for each cluster. Finally, the central fragments are used as a template to make assignments. Following a large-scale comparison of 11 secondary structure assignment methods, SACF, KAKSI and PROSS are found to have similar agreement with DSSP, while PCASSO agrees with DSSP best. SACF and PCASSO show preference to reducing residues in N and C cap regions, whereas KAKSI, P-SEA and SEGNO tend to add residues to the terminals when DSSP assignment is taken as standard. Moreover, our algorithm is able to assign subtle helices (310-helix, π-helix and left-handed helix) and make uniform assignments, as well as to detect rare SSEs in β-sheets or long helices as outlier fragments from other programs. The structural uniformity should be useful for protein structure classification and prediction, while outlier fragments underlie the structure–function relationship. PMID:26978354

Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

PubMed

Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

2017-10-25

Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.
Prediction of Metastasis Using Second Harmonic Generation

DTIC Science & Technology

2016-07-01

extracellular matrix through which metastasizing cells must travel. We and others have demonstrated that tumor collagen structure, as measured with the...algorithm using separate training and validation sets, etc. Keywords: metastasis, overtreatment, extracellular matrix , collagen , second harmonic...optical process called second harmonic generation (SHG), influences tumor metastasis. This suggests that collagen structure may provide prognostic
Large-scale structure prediction by improved contact predictions and model quality assessment.

PubMed

Michel, Mirco; Menéndez Hurtado, David; Uziela, Karolis; Elofsson, Arne

2017-07-15

Accurate contact predictions can be used for predicting the structure of proteins. Until recently these methods were limited to very big protein families, decreasing their utility. However, recent progress by combining direct coupling analysis with machine learning methods has made it possible to predict accurate contact maps for smaller families. To what extent these predictions can be used to produce accurate models of the families is not known. We present the PconsFold2 pipeline that uses contact predictions from PconsC3, the CONFOLD folding algorithm and model quality estimations to predict the structure of a protein. We show that the model quality estimation significantly increases the number of models that reliably can be identified. Finally, we apply PconsFold2 to 6379 Pfam families of unknown structure and find that PconsFold2 can, with an estimated 90% specificity, predict the structure of up to 558 Pfam families of unknown structure. Out of these, 415 have not been reported before. Datasets as well as models of all the 558 Pfam families are available at http://c3.pcons.net/ . All programs used here are freely available. arne@bioinfo.se. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Using the Fast Fourier Transform to Accelerate the Computational Search for RNA Conformational Switches

PubMed Central

Senter, Evan; Sheikh, Saad; Dotu, Ivan; Ponty, Yann; Clote, Peter

2012-01-01

Using complex roots of unity and the Fast Fourier Transform, we design a new thermodynamics-based algorithm, FFTbor, that computes the Boltzmann probability that secondary structures differ by base pairs from an arbitrary initial structure of a given RNA sequence. The algorithm, which runs in quartic time and quadratic space , is used to determine the correlation between kinetic folding speed and the ruggedness of the energy landscape, and to predict the location of riboswitch expression platform candidates. A web server is available at http://bioinformatics.bc.edu/clotelab/FFTbor/. PMID:23284639
Computational design of chimeric protein libraries for directed evolution.

PubMed

Silberg, Jonathan J; Nguyen, Peter Q; Stevenson, Taylor

2010-01-01

The best approach for creating libraries of functional proteins with large numbers of nondisruptive amino acid substitutions is protein recombination, in which structurally related polypeptides are swapped among homologous proteins. Unfortunately, as more distantly related proteins are recombined, the fraction of variants having a disrupted structure increases. One way to enrich the fraction of folded and potentially interesting chimeras in these libraries is to use computational algorithms to anticipate which structural elements can be swapped without disturbing the integrity of a protein's structure. Herein, we describe how the algorithm Schema uses the sequences and structures of the parent proteins recombined to predict the structural disruption of chimeras, and we outline how dynamic programming can be used to find libraries with a range of amino acid substitution levels that are enriched in variants with low Schema disruption.
Evicase: an evidence-based case structuring approach for personalized healthcare.

PubMed

Carmeli, Boaz; Casali, Paolo; Goldbraich, Anna; Goldsteen, Abigail; Kent, Carmel; Licitra, Lisa; Locatelli, Paolo; Restifo, Nicola; Rinott, Ruty; Sini, Elena; Torresani, Michele; Waks, Zeev

2012-01-01

The personalized medicine era stresses a growing need to combine evidence-based medicine with case based reasoning in order to improve the care process. To address this need we suggest a framework to generate multi-tiered statistical structures we call Evicases. Evicase integrates established medical evidence together with patient cases from the bedside. It then uses machine learning algorithms to produce statistical results and aggregators, weighted predictions, and appropriate recommendations. Designed as a stand-alone structure, Evicase can be used for a range of decision support applications including guideline adherence monitoring and personalized prognostic predictions.
Prediction of Unsteady Aerodynamic Coefficients at High Angles of Attack

NASA Technical Reports Server (NTRS)

Pamadi, Bandu N.; Murphy, Patrick C.; Klein, Vladislav; Brandon, Jay M.

2001-01-01

The nonlinear indicial response method is used to model the unsteady aerodynamic coefficients in the low speed longitudinal oscillatory wind tunnel test data of the 0.1 scale model of the F-16XL aircraft. Exponential functions are used to approximate the deficiency function in the indicial response. Using one set of oscillatory wind tunnel data and parameter identification method, the unknown parameters in the exponential functions are estimated. The genetic algorithm is used as a least square minimizing algorithm. The assumed model structures and parameter estimates are validated by comparing the predictions with other sets of available oscillatory wind tunnel test data.
ProBiS tools (algorithm, database, and web servers) for predicting and modeling of biologically interesting proteins.

PubMed

Konc, Janez; Janežič, Dušanka

2017-09-01

ProBiS (Protein Binding Sites) Tools consist of algorithm, database, and web servers for prediction of binding sites and protein ligands based on the detection of structurally similar binding sites in the Protein Data Bank. In this article, we review the operations that ProBiS Tools perform, provide comments on the evolution of the tools, and give some implementation details. We review some of its applications to biologically interesting proteins. ProBiS Tools are freely available at http://probis.cmm.ki.si and http://probis.nih.gov. Copyright © 2017 Elsevier Ltd. All rights reserved.
Extending RosettaDock with water, sugar, and pH for prediction of complex structures and affinities for CAPRI rounds 20-27.

PubMed

Kilambi, Krishna Praneeth; Pacella, Michael S; Xu, Jianqing; Labonte, Jason W; Porter, Justin R; Muthu, Pravin; Drew, Kevin; Kuroda, Daisuke; Schueler-Furman, Ora; Bonneau, Richard; Gray, Jeffrey J

2013-12-01

Rounds 20-27 of the Critical Assessment of PRotein Interactions (CAPRI) provided a testing platform for computational methods designed to address a wide range of challenges. The diverse targets drove the creation of and new combinations of computational tools. In this study, RosettaDock and other novel Rosetta protocols were used to successfully predict four of the 10 blind targets. For example, for DNase domain of Colicin E2-Im2 immunity protein, RosettaDock and RosettaLigand were used to predict the positions of water molecules at the interface, recovering 46% of the native water-mediated contacts. For α-repeat Rep4-Rep2 and g-type lysozyme-PliG inhibitor complexes, homology models were built and standard and pH-sensitive docking algorithms were used to generate structures with interface RMSD values of 3.3 Å and 2.0 Å, respectively. A novel flexible sugar-protein docking protocol was also developed and used for structure prediction of the BT4661-heparin-like saccharide complex, recovering 71% of the native contacts. Challenges remain in the generation of accurate homology models for protein mutants and sampling during global docking. On proteins designed to bind influenza hemagglutinin, only about half of the mutations were identified that affect binding (T55: 54%; T56: 48%). The prediction of the structure of the xylanase complex involving homology modeling and multidomain docking pushed the limits of global conformational sampling and did not result in any successful prediction. The diversity of problems at hand requires computational algorithms to be versatile; the recent additions to the Rosetta suite expand the capabilities to encompass more biologically realistic docking problems. Copyright © 2013 Wiley Periodicals, Inc.
RepeatsDB-lite: a web server for unit annotation of tandem repeat proteins.

PubMed

Hirsh, Layla; Paladin, Lisanna; Piovesan, Damiano; Tosatto, Silvio C E

2018-05-09

RepeatsDB-lite (http://protein.bio.unipd.it/repeatsdb-lite) is a web server for the prediction of repetitive structural elements and units in tandem repeat (TR) proteins. TRs are a widespread but poorly annotated class of non-globular proteins carrying heterogeneous functions. RepeatsDB-lite extends the prediction to all TR types and strongly improves the performance both in terms of computational time and accuracy over previous methods, with precision above 95% for solenoid structures. The algorithm exploits an improved TR unit library derived from the RepeatsDB database to perform an iterative structural search and assignment. The web interface provides tools for analyzing the evolutionary relationships between units and manually refine the prediction by changing unit positions and protein classification. An all-against-all structure-based sequence similarity matrix is calculated and visualized in real-time for every user edit. Reviewed predictions can be submitted to RepeatsDB for review and inclusion.
Experimental identification of specificity determinants in the domain linker of a LacI/GalR protein: bioinformatics-based predictions generate true positives and false negatives.

PubMed

Meinhardt, Sarah; Swint-Kruse, Liskin

2008-12-01

In protein families, conserved residues often contribute to a common general function, such as DNA-binding. However, unique attributes for each homolog (e.g. recognition of alternative DNA sequences) must arise from variation in other functionally-important positions. The locations of these "specificity determinant" positions are obscured amongst the background of varied residues that do not make significant contributions to either structure or function. To isolate specificity determinants, a number of bioinformatics algorithms have been developed. When applied to the LacI/GalR family of transcription regulators, several specificity determinants are predicted in the 18 amino acids that link the DNA-binding and regulatory domains. However, results from alternative algorithms are only in partial agreement with each other. Here, we experimentally evaluate these predictions using an engineered repressor comprising the LacI DNA-binding domain, the LacI linker, and the GalR regulatory domain (LLhG). "Wild-type" LLhG has altered DNA specificity and weaker lacO(1) repression compared to LacI or a similar LacI:PurR chimera. Next, predictions of linker specificity determinants were tested, using amino acid substitution and in vivo repression assays to assess functional change. In LLhG, all predicted sites are specificity determinants, as well as three sites not predicted by any algorithm. Strategies are suggested for diminishing the number of false negative predictions. Finally, individual substitutions at LLhG specificity determinants exhibited a broad range of functional changes that are not predicted by bioinformatics algorithms. Results suggest that some variants have altered affinity for DNA, some have altered allosteric response, and some appear to have changed specificity for alternative DNA ligands.
Multi-label literature classification based on the Gene Ontology graph.

PubMed

Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua

2008-12-08

The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.
Implementation of a combined algorithm designed to increase the reliability of information systems: simulation modeling

NASA Astrophysics Data System (ADS)

Popov, A.; Zolotarev, V.; Bychkov, S.

2016-11-01

This paper examines the results of experimental studies of a previously submitted combined algorithm designed to increase the reliability of information systems. The data that illustrates the organization and conduct of the studies is provided. Within the framework of a comparison of As a part of the study conducted, the comparison of the experimental data of simulation modeling and the data of the functioning of the real information system was made. The hypothesis of the homogeneity of the logical structure of the information systems was formulated, thus enabling to reconfigure the algorithm presented, - more specifically, to transform it into the model for the analysis and prediction of arbitrary information systems. The results presented can be used for further research in this direction. The data of the opportunity to predict the functioning of the information systems can be used for strategic and economic planning. The algorithm can be used as a means for providing information security.
Tunable thermal conductivity via domain structure engineering in ferroelectric thin films: A phase-field simulation

DOE PAGES

Wang, Jian -Jun; Wang, Yi; Ihlefeld, Jon F.; ...

2016-04-06

Effective thermal conductivity as a function of domain structure is studied by solving the heat conduction equation using a spectral iterative perturbation algorithm in materials with inhomogeneous thermal conductivity distribution. Using this proposed algorithm, the experimentally measured effective thermal conductivities of domain-engineered {001} p-BiFeO 3 thin films are quantitatively reproduced. In conjunction with two other testing examples, this proposed algorithm is proven to be an efficient tool for interpreting the relationship between the effective thermal conductivity and micro-/domain-structures. By combining this algorithm with the phase-field model of ferroelectric thin films, the effective thermal conductivity for PbZr 1-xTi xO 3 filmsmore » under different composition, thickness, strain, and working conditions is predicted. It is shown that the chemical composition, misfit strain, film thickness, film orientation, and a Piezoresponse Force Microscopy tip can be used to engineer the domain structures and tune the effective thermal conductivity. Furthermore, we expect our findings will stimulate future theoretical, experimental and engineering efforts on developing devices based on the tunable effective thermal conductivity in ferroelectric nanostructures.« less
Tunable thermal conductivity via domain structure engineering in ferroelectric thin films: A phase-field simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Jian -Jun; Wang, Yi; Ihlefeld, Jon F.

Effective thermal conductivity as a function of domain structure is studied by solving the heat conduction equation using a spectral iterative perturbation algorithm in materials with inhomogeneous thermal conductivity distribution. Using this proposed algorithm, the experimentally measured effective thermal conductivities of domain-engineered {001} p-BiFeO 3 thin films are quantitatively reproduced. In conjunction with two other testing examples, this proposed algorithm is proven to be an efficient tool for interpreting the relationship between the effective thermal conductivity and micro-/domain-structures. By combining this algorithm with the phase-field model of ferroelectric thin films, the effective thermal conductivity for PbZr 1-xTi xO 3 filmsmore » under different composition, thickness, strain, and working conditions is predicted. It is shown that the chemical composition, misfit strain, film thickness, film orientation, and a Piezoresponse Force Microscopy tip can be used to engineer the domain structures and tune the effective thermal conductivity. Furthermore, we expect our findings will stimulate future theoretical, experimental and engineering efforts on developing devices based on the tunable effective thermal conductivity in ferroelectric nanostructures.« less
Which method of posttraumatic stress disorder classification best predicts psychosocial function in children with traumatic brain injury?

PubMed

Iselin, Greg; Le Brocque, Robyne; Kenardy, Justin; Anderson, Vicki; McKinlay, Lynne

2010-10-01

Controversy surrounds the classification of posttraumatic stress disorder (PTSD), particularly in children and adolescents with traumatic brain injury (TBI). In these populations, it is difficult to differentiate TBI-related organic memory loss from dissociative amnesia. Several alternative PTSD classification algorithms have been proposed for use with children. This paper investigates DSM-IV-TR and alternative PTSD classification algorithms, including and excluding the dissociative amnesia item, in terms of their ability to predict psychosocial function following pediatric TBI. A sample of 184 children aged 6-14 years were recruited following emergency department presentation and/or hospital admission for TBI. PTSD was assessed via semi-structured clinical interview (CAPS-CA) with the child at 3 months post-injury. Psychosocial function was assessed using the parent report CHQ-PF50. Two alternative classification algorithms, the PTSD-AA and 2 of 3 algorithms, reached statistical significance. While the inclusion of the dissociative amnesia item increased prevalence rates across algorithms, it generally resulted in weaker associations with psychosocial function. The PTSD-AA algorithm appears to have the strongest association with psychosocial function following TBI in children and adolescents. Removing the dissociative amnesia item from the diagnostic algorithm generally results in improved validity. Copyright 2010 Elsevier Ltd. All rights reserved.
regSNPs-splicing: a tool for prioritizing synonymous single-nucleotide substitution.

PubMed

Zhang, Xinjun; Li, Meng; Lin, Hai; Rao, Xi; Feng, Weixing; Yang, Yuedong; Mort, Matthew; Cooper, David N; Wang, Yue; Wang, Yadong; Wells, Clark; Zhou, Yaoqi; Liu, Yunlong

2017-09-01

While synonymous single-nucleotide variants (sSNVs) have largely been unstudied, since they do not alter protein sequence, mounting evidence suggests that they may affect RNA conformation, splicing, and the stability of nascent-mRNAs to promote various diseases. Accurately prioritizing deleterious sSNVs from a pool of neutral ones can significantly improve our ability of selecting functional genetic variants identified from various genome-sequencing projects, and, therefore, advance our understanding of disease etiology. In this study, we develop a computational algorithm to prioritize sSNVs based on their impact on mRNA splicing and protein function. In addition to genomic features that potentially affect splicing regulation, our proposed algorithm also includes dozens structural features that characterize the functions of alternatively spliced exons on protein function. Our systematical evaluation on thousands of sSNVs suggests that several structural features, including intrinsic disorder protein scores, solvent accessible surface areas, protein secondary structures, and known and predicted protein family domains, show significant differences between disease-causing and neutral sSNVs. Our result suggests that the protein structure features offer an added dimension of information while distinguishing disease-causing and neutral synonymous variants. The inclusion of structural features increases the predictive accuracy for functional sSNV prioritization.
Ab Initio structure prediction for Escherichia coli: towards genome-wide protein structure modeling and fold assignment

PubMed Central

Xu, Dong; Zhang, Yang

2013-01-01

Genome-wide protein structure prediction and structure-based function annotation have been a long-term goal in molecular biology but not yet become possible due to difficulties in modeling distant-homology targets. We developed a hybrid pipeline combining ab initio folding and template-based modeling for genome-wide structure prediction applied to the Escherichia coli genome. The pipeline was tested on 43 known sequences, where QUARK-based ab initio folding simulation generated models with TM-score 17% higher than that by traditional comparative modeling methods. For 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 have a substantial portion of structure correctly modeled (TM-score > 0.35). 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in PDB. The presented results, as a case study of E. coli, represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms. PMID:23719418
Hearing through the noise: Biologically inspired noise reduction

NASA Astrophysics Data System (ADS)

Lee, Tyler Paul

Vocal communication in the natural world demands that a listener perform a remarkably complicated task in real-time. Vocalizations mix with all other sounds in the environment as they travel to the listener, arriving as a jumbled low-dimensional signal. A listener must then use this signal to extract the structure corresponding to individual sound sources. How this computation is implemented in the brain remains poorly understood, yet an accurate description of such mechanisms would impact a variety of medical and technological applications of sound processing. In this thesis, I describe initial work on how neurons in the secondary auditory cortex of the Zebra Finch extract song from naturalistic background noise. I then build on our understanding of the function of these neurons by creating an algorithm that extracts speech from natural background noise using spectrotemporal modulations. The algorithm, implemented as an artificial neural network, can be flexibly applied to any class of signal or noise and performs better than an optimal frequency-based noise reduction algorithm for a variety of background noises and signal-to-noise ratios. One potential drawback to using spectrotemporal modulations for noise reduction, though, is that analyzing the modulations present in an ongoing sound requires a latency set by the slowest temporal modulation computed. The algorithm avoids this problem by reducing noise predictively, taking advantage of the large amount of temporal structure present in natural sounds. This predictive denoising has ties to recent work suggesting that the auditory system uses attention to focus on predicted regions of spectrotemporal space when performing auditory scene analysis.
Predicting the transmembrane secondary structure of ligand-gated ion channels.

PubMed

Bertaccini, E; Trudell, J R

2002-06-01

Recent mutational analyses of ligand-gated ion channels (LGICs) have demonstrated a plausible site of anesthetic action within their transmembrane domains. Although there is a consensus that the transmembrane domain is formed from four membrane-spanning segments, the secondary structure of these segments is not known. We utilized 10 state-of-the-art bioinformatics techniques to predict the transmembrane topology of the tetrameric regions within six members of the LGIC family that are relevant to anesthetic action. They are the human forms of the GABA alpha 1 receptor, the glycine alpha 1 receptor, the 5HT3 serotonin receptor, the nicotinic AChR alpha 4 and alpha 7 receptors and the Torpedo nAChR alpha 1 receptor. The algorithms utilized were HMMTOP, TMHMM, TMPred, PHDhtm, DAS, TMFinder, SOSUI, TMAP, MEMSAT and TOPPred2. The resulting predictions were superimposed on to a multiple sequence alignment of the six amino acid sequences created using the CLUSTAL W algorithm. There was a clear statistical consensus for the presence of four alpha helices in those regions experimentally thought to span the membrane. The consensus of 10 topology prediction techniques supports the hypothesis that the transmembrane subunits of the LGICs are tetrameric bundles of alpha helices.

Subsurface structural interpretation by applying trishear algorithm: An example from the Lenghu5 fold-and-thrust belt, Qaidam Basin, Northern Tibetan Plateau

NASA Astrophysics Data System (ADS)

Pei, Yangwen; Paton, Douglas A.; Wu, Kongyou; Xie, Liujuan

2017-08-01

The application of trishear algorithm, in which deformation occurs in a triangle zone in front of a propagating fault tip, is often used to understand fault related folding. In comparison to kink-band methods, a key characteristic of trishear algorithm is that non-uniform deformation within the triangle zone allows the layer thickness and horizon length to change during deformation, which is commonly observed in natural structures. An example from the Lenghu5 fold-and-thrust belt (Qaidam Basin, Northern Tibetan Plateau) is interpreted to help understand how to employ trishear forward modelling to improve the accuracy of seismic interpretation. High resolution fieldwork data, including high-angle dips, 'dragging structures', thinning hanging-wall and thickening footwall, are used to determined best-fit trishear model to explain the deformation happened to the Lenghu5 fold-and-thrust belt. We also consider the factors that increase the complexity of trishear models, including: (a) fault-dip changes and (b) pre-existing faults. We integrate fault dip change and pre-existing faults to predict subsurface structures that are apparently under seismic resolution. The analogue analysis by trishear models indicates that the Lenghu5 fold-and-thrust belt is controlled by an upward-steepening reverse fault above a pre-existing opposite-thrusting fault in deeper subsurface. The validity of the trishear model is confirmed by the high accordance between the model and the high-resolution fieldwork. The validated trishear forward model provides geometric constraints to the faults and horizons in the seismic section, e.g., fault cutoffs and fault tip position, faults' intersecting relationship and horizon/fault cross-cutting relationship. The subsurface prediction using trishear algorithm can significantly increase the accuracy of seismic interpretation, particularly in seismic sections with low signal/noise ratio.
Molecular descriptor subset selection in theoretical peptide quantitative structure-retention relationship model development using nature-inspired optimization algorithms.

PubMed

Žuvela, Petar; Liu, J Jay; Macur, Katarzyna; Bączek, Tomasz

2015-10-06

In this work, performance of five nature-inspired optimization algorithms, genetic algorithm (GA), particle swarm optimization (PSO), artificial bee colony (ABC), firefly algorithm (FA), and flower pollination algorithm (FPA), was compared in molecular descriptor selection for development of quantitative structure-retention relationship (QSRR) models for 83 peptides that originate from eight model proteins. The matrix with 423 descriptors was used as input, and QSRR models based on selected descriptors were built using partial least squares (PLS), whereas root mean square error of prediction (RMSEP) was used as a fitness function for their selection. Three performance criteria, prediction accuracy, computational cost, and the number of selected descriptors, were used to evaluate the developed QSRR models. The results show that all five variable selection methods outperform interval PLS (iPLS), sparse PLS (sPLS), and the full PLS model, whereas GA is superior because of its lowest computational cost and higher accuracy (RMSEP of 5.534%) with a smaller number of variables (nine descriptors). The GA-QSRR model was validated initially through Y-randomization. In addition, it was successfully validated with an external testing set out of 102 peptides originating from Bacillus subtilis proteomes (RMSEP of 22.030%). Its applicability domain was defined, from which it was evident that the developed GA-QSRR exhibited strong robustness. All the sources of the model's error were identified, thus allowing for further application of the developed methodology in proteomics.
Prediction of protein long-range contacts using an ensemble of genetic algorithm classifiers with sequence profile centers.

PubMed

Chen, Peng; Li, Jinyan

2010-05-17

Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It is helpful for determining protein structures, understanding protein foldings, and therefore advancing the annotation of protein functions. In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on the key idea called sequence profile centers (SPCs). Each SPC is the average sequence profiles of residue pairs belonging to the same contact class or non-contact class. GaCs train on multiple but different pairs of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, having roughly the same sizes as the positive ones, are constructed by random sampling over the original imbalanced negative data. As a result, about 21.5% long-range contacts are correctly predicted. We also found that the ensemble of GaCs indeed makes an accuracy improvement by around 5.6% over the single GaC. Classifiers with the use of sequence profile centers may advance the long-range contact prediction. In line with this approach, key structural features in proteins would be determined with high efficiency and accuracy.
Uncertainty aggregation and reduction in structure-material performance prediction

NASA Astrophysics Data System (ADS)

Hu, Zhen; Mahadevan, Sankaran; Ao, Dan

2018-02-01

An uncertainty aggregation and reduction framework is presented for structure-material performance prediction. Different types of uncertainty sources, structural analysis model, and material performance prediction model are connected through a Bayesian network for systematic uncertainty aggregation analysis. To reduce the uncertainty in the computational structure-material performance prediction model, Bayesian updating using experimental observation data is investigated based on the Bayesian network. It is observed that the Bayesian updating results will have large error if the model cannot accurately represent the actual physics, and that this error will be propagated to the predicted performance distribution. To address this issue, this paper proposes a novel uncertainty reduction method by integrating Bayesian calibration with model validation adaptively. The observation domain of the quantity of interest is first discretized into multiple segments. An adaptive algorithm is then developed to perform model validation and Bayesian updating over these observation segments sequentially. Only information from observation segments where the model prediction is highly reliable is used for Bayesian updating; this is found to increase the effectiveness and efficiency of uncertainty reduction. A composite rotorcraft hub component fatigue life prediction model, which combines a finite element structural analysis model and a material damage model, is used to demonstrate the proposed method.
Computerized analysis of the 12-lead electrocardiogram to identify epicardial ventricular tachycardia exit sites.

PubMed

Yokokawa, Miki; Jung, Dae Yon; Joseph, Kim K; Hero, Alfred O; Morady, Fred; Bogun, Frank

2014-11-01

Twelve-lead electrocardiogram (ECG) criteria for epicardial ventricular tachycardia (VT) origins have been described. In patients with structural heart disease, the ability to predict an epicardial origin based on QRS morphology is limited and has been investigated only for limited regions in the heart. The purpose of this study was to determine whether a computerized algorithm is able to accurately differentiate epicardial vs endocardial origins of ventricular arrhythmias. Endocardial and epicardial pace-mapping were performed in 43 patients at 3277 sites. The 12-lead ECGs were digitized and analyzed using a mixture of gaussian model (MoG) to assess whether the algorithm was able to identify an epicardial vs endocardial origin of the paced rhythm. The MoG computerized algorithm was compared to algorithms published in prior reports. The computerized algorithm correctly differentiated epicardial vs endocardial pacing sites for 80% of the sites compared to an accuracy of 42% to 66% of other described criteria. The accuracy was higher in patients without structural heart disease than in those with structural heart disease (94% vs 80%, P = .0004) and for right bundle branch block (82%) compared to left bundle branch block morphologies (79%, P = .001). Validation studies showed the accuracy for VT exit sites to be 84%. A computerized algorithm was able to accurately differentiate the majority of epicardial vs endocardial pace-mapping sites. The algorithm is not region specific and performed best in patients without structural heart disease and with VTs having a right bundle branch block morphology. Copyright © 2014 Heart Rhythm Society. Published by Elsevier Inc. All rights reserved.
Multiple-Instance Regression with Structured Data

NASA Technical Reports Server (NTRS)

Wagstaff, Kiri L.; Lane, Terran; Roper, Alex

2008-01-01

We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.
Impact of mutations on the allosteric conformational equilibrium

PubMed Central

Weinkam, Patrick; Chen, Yao Chi; Pons, Jaume; Sali, Andrej

2012-01-01

Allostery in a protein involves effector binding at an allosteric site that changes the structure and/or dynamics at a distant, functional site. In addition to the chemical equilibrium of ligand binding, allostery involves a conformational equilibrium between one protein substate that binds the effector and a second substate that less strongly binds the effector. We run molecular dynamics simulations using simple, smooth energy landscapes to sample specific ligand-induced conformational transitions, as defined by the effector-bound and unbound protein structures. These simulations can be performed using our web server: http://salilab.org/allosmod/. We then develop a set of features to analyze the simulations and capture the relevant thermodynamic properties of the allosteric conformational equilibrium. These features are based on molecular mechanics energy functions, stereochemical effects, and structural/dynamic coupling between sites. Using a machine-learning algorithm on a dataset of 10 proteins and 179 mutations, we predict both the magnitude and sign of the allosteric conformational equilibrium shift by the mutation; the impact of a large identifiable fraction of the mutations can be predicted with an average unsigned error of 1 kBT. With similar accuracy, we predict the mutation effects for an 11th protein that was omitted from the initial training and testing of the machine-learning algorithm. We also assess which calculated thermodynamic properties contribute most to the accuracy of the prediction. PMID:23228330
Algorithm-Dependent Generalization Bounds for Multi-Task Learning.

PubMed

Liu, Tongliang; Tao, Dacheng; Song, Mingli; Maybank, Stephen J

2017-02-01

Often, tasks are collected for multi-task learning (MTL) because they share similar feature structures. Based on this observation, in this paper, we present novel algorithm-dependent generalization bounds for MTL by exploiting the notion of algorithmic stability. We focus on the performance of one particular task and the average performance over multiple tasks by analyzing the generalization ability of a common parameter that is shared in MTL. When focusing on one particular task, with the help of a mild assumption on the feature structures, we interpret the function of the other tasks as a regularizer that produces a specific inductive bias. The algorithm for learning the common parameter, as well as the predictor, is thereby uniformly stable with respect to the domain of the particular task and has a generalization bound with a fast convergence rate of order O(1/n), where n is the sample size of the particular task. When focusing on the average performance over multiple tasks, we prove that a similar inductive bias exists under certain conditions on the feature structures. Thus, the corresponding algorithm for learning the common parameter is also uniformly stable with respect to the domains of the multiple tasks, and its generalization bound is of the order O(1/T), where T is the number of tasks. These theoretical analyses naturally show that the similarity of feature structures in MTL will lead to specific regularizations for predicting, which enables the learning algorithms to generalize fast and correctly from a few examples.
Fast structure similarity searches among protein models: efficient clustering of protein fragments

PubMed Central

2012-01-01

Background For many predictive applications a large number of models is generated and later clustered in subsets based on structure similarity. In most clustering algorithms an all-vs-all root mean square deviation (RMSD) comparison is performed. Most of the time is typically spent on comparison of non-similar structures. For sets with more than, say, 10,000 models this procedure is very time-consuming and alternative faster algorithms, restricting comparisons only to most similar structures would be useful. Results We exploit the inverse triangle inequality on the RMSD between two structures given the RMSDs with a third structure. The lower bound on RMSD may be used, when restricting the search of similarity to a reasonably low RMSD threshold value, to speed up similarity searches significantly. Tests are performed on large sets of decoys which are widely used as test cases for predictive methods, with a speed-up of up to 100 times with respect to all-vs-all comparison depending on the set and parameters used. Sample applications are shown. Conclusions The algorithm presented here allows fast comparison of large data sets of structures with limited memory requirements. As an example of application we present clustering of more than 100000 fragments of length 5 from the top500H dataset into few hundred representative fragments. A more realistic scenario is provided by the search of similarity within the very large decoy sets used for the tests. Other applications regard filtering nearly-indentical conformation in selected CASP9 datasets and clustering molecular dynamics snapshots. Availability A linux executable and a Perl script with examples are given in the supplementary material (Additional file 1). The source code is available upon request from the authors. PMID:22642815
RNA-TVcurve: a Web server for RNA secondary structure comparison based on a multi-scale similarity of its triple vector curve representation.

PubMed

Li, Ying; Shi, Xiaohu; Liang, Yanchun; Xie, Juan; Zhang, Yu; Ma, Qin

2017-01-21

RNAs have been found to carry diverse functionalities in nature. Inferring the similarity between two given RNAs is a fundamental step to understand and interpret their functional relationship. The majority of functional RNAs show conserved secondary structures, rather than sequence conservation. Those algorithms relying on sequence-based features usually have limitations in their prediction performance. Hence, integrating RNA structure features is very critical for RNA analysis. Existing algorithms mainly fall into two categories: alignment-based and alignment-free. The alignment-free algorithms of RNA comparison usually have lower time complexity than alignment-based algorithms. An alignment-free RNA comparison algorithm was proposed, in which novel numerical representations RNA-TVcurve (triple vector curve representation) of RNA sequence and corresponding secondary structure features are provided. Then a multi-scale similarity score of two given RNAs was designed based on wavelet decomposition of their numerical representation. In support of RNA mutation and phylogenetic analysis, a web server (RNA-TVcurve) was designed based on this alignment-free RNA comparison algorithm. It provides three functional modules: 1) visualization of numerical representation of RNA secondary structure; 2) detection of single-point mutation based on secondary structure; and 3) comparison of pairwise and multiple RNA secondary structures. The inputs of the web server require RNA primary sequences, while corresponding secondary structures are optional. For the primary sequences alone, the web server can compute the secondary structures using free energy minimization algorithm in terms of RNAfold tool from Vienna RNA package. RNA-TVcurve is the first integrated web server, based on an alignment-free method, to deliver a suite of RNA analysis functions, including visualization, mutation analysis and multiple RNAs structure comparison. The comparison results with two popular RNA comparison tools, RNApdist and RNAdistance, showcased that RNA-TVcurve can efficiently capture subtle relationships among RNAs for mutation detection and non-coding RNA classification. All the relevant results were shown in an intuitive graphical manner, and can be freely downloaded from this server. RNA-TVcurve, along with test examples and detailed documents, are available at: http://ml.jlu.edu.cn/tvcurve/ .
A variable structure fuzzy neural network model of squamous dysplasia and esophageal squamous cell carcinoma based on a global chaotic optimization algorithm.

PubMed

Moghtadaei, Motahareh; Hashemi Golpayegani, Mohammad Reza; Malekzadeh, Reza

2013-02-07

Identification of squamous dysplasia and esophageal squamous cell carcinoma (ESCC) is of great importance in prevention of cancer incidence. Computer aided algorithms can be very useful for identification of people with higher risks of squamous dysplasia, and ESCC. Such method can limit the clinical screenings to people with higher risks. Different regression methods have been used to predict ESCC and dysplasia. In this paper, a Fuzzy Neural Network (FNN) model is selected for ESCC and dysplasia prediction. The inputs to the classifier are the risk factors. Since the relation between risk factors in the tumor system has a complex nonlinear behavior, in comparison to most of ordinary data, the cost function of its model can have more local optimums. Thus the need for global optimization methods is more highlighted. The proposed method in this paper is a Chaotic Optimization Algorithm (COA) proceeding by the common Error Back Propagation (EBP) local method. Since the model has many parameters, we use a strategy to reduce the dependency among parameters caused by the chaotic series generator. This dependency was not considered in the previous COA methods. The algorithm is compared with logistic regression model as the latest successful methods of ESCC and dysplasia prediction. The results represent a more precise prediction with less mean and variance of error. Copyright © 2012 Elsevier Ltd. All rights reserved.
Automated diagnoses of attention deficit hyperactive disorder using magnetic resonance imaging.

PubMed

Eloyan, Ani; Muschelli, John; Nebel, Mary Beth; Liu, Han; Han, Fang; Zhao, Tuo; Barber, Anita D; Joel, Suresh; Pekar, James J; Mostofsky, Stewart H; Caffo, Brian

2012-01-01

Successful automated diagnoses of attention deficit hyperactive disorder (ADHD) using imaging and functional biomarkers would have fundamental consequences on the public health impact of the disease. In this work, we show results on the predictability of ADHD using imaging biomarkers and discuss the scientific and diagnostic impacts of the research. We created a prediction model using the landmark ADHD 200 data set focusing on resting state functional connectivity (rs-fc) and structural brain imaging. We predicted ADHD status and subtype, obtained by behavioral examination, using imaging data, intelligence quotients and other covariates. The novel contributions of this manuscript include a thorough exploration of prediction and image feature extraction methodology on this form of data, including the use of singular value decompositions (SVDs), CUR decompositions, random forest, gradient boosting, bagging, voxel-based morphometry, and support vector machines as well as important insights into the value, and potentially lack thereof, of imaging biomarkers of disease. The key results include the CUR-based decomposition of the rs-fc-fMRI along with gradient boosting and the prediction algorithm based on a motor network parcellation and random forest algorithm. We conjecture that the CUR decomposition is largely diagnosing common population directions of head motion. Of note, a byproduct of this research is a potential automated method for detecting subtle in-scanner motion. The final prediction algorithm, a weighted combination of several algorithms, had an external test set specificity of 94% with sensitivity of 21%. The most promising imaging biomarker was a correlation graph from a motor network parcellation. In summary, we have undertaken a large-scale statistical exploratory prediction exercise on the unique ADHD 200 data set. The exercise produced several potential leads for future scientific exploration of the neurological basis of ADHD.
Automated diagnoses of attention deficit hyperactive disorder using magnetic resonance imaging

PubMed Central

Eloyan, Ani; Muschelli, John; Nebel, Mary Beth; Liu, Han; Han, Fang; Zhao, Tuo; Barber, Anita D.; Joel, Suresh; Pekar, James J.; Mostofsky, Stewart H.; Caffo, Brian

2012-01-01

Successful automated diagnoses of attention deficit hyperactive disorder (ADHD) using imaging and functional biomarkers would have fundamental consequences on the public health impact of the disease. In this work, we show results on the predictability of ADHD using imaging biomarkers and discuss the scientific and diagnostic impacts of the research. We created a prediction model using the landmark ADHD 200 data set focusing on resting state functional connectivity (rs-fc) and structural brain imaging. We predicted ADHD status and subtype, obtained by behavioral examination, using imaging data, intelligence quotients and other covariates. The novel contributions of this manuscript include a thorough exploration of prediction and image feature extraction methodology on this form of data, including the use of singular value decompositions (SVDs), CUR decompositions, random forest, gradient boosting, bagging, voxel-based morphometry, and support vector machines as well as important insights into the value, and potentially lack thereof, of imaging biomarkers of disease. The key results include the CUR-based decomposition of the rs-fc-fMRI along with gradient boosting and the prediction algorithm based on a motor network parcellation and random forest algorithm. We conjecture that the CUR decomposition is largely diagnosing common population directions of head motion. Of note, a byproduct of this research is a potential automated method for detecting subtle in-scanner motion. The final prediction algorithm, a weighted combination of several algorithms, had an external test set specificity of 94% with sensitivity of 21%. The most promising imaging biomarker was a correlation graph from a motor network parcellation. In summary, we have undertaken a large-scale statistical exploratory prediction exercise on the unique ADHD 200 data set. The exercise produced several potential leads for future scientific exploration of the neurological basis of ADHD. PMID:22969709
Efficient and optimized identification of generalized Maxwell viscoelastic relaxation spectra

PubMed Central

Babaei, Behzad; Davarian, Ali; Pryse, Kenneth M.; Elson, Elliot L.; Genin, Guy M.

2017-01-01

Viscoelastic relaxation spectra are essential for predicting and interpreting the mechanical responses of materials and structures. For biological tissues, these spectra must usually be estimated from viscoelastic relaxation tests. Interpreting viscoelastic relaxation tests is challenging because the inverse problem is expensive computationally. We present here an efficient algorithm that enables rapid identification of viscoelastic relaxation spectra. The algorithm was tested against trial data to characterize its robustness and identify its limitations and strengths. The algorithm was then applied to identify the viscoelastic response of reconstituted collagen, revealing an extensive distribution of viscoelastic time constants. PMID:26523785
Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences.

PubMed

Chen, Peng; Li, Jinyan; Wong, Limsoon; Kuwahara, Hiroyuki; Huang, Jianhua Z; Gao, Xin

2013-08-01

Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K-nearest neighbor algorithm. The combinations of the individual classifiers were explored and the classifiers that appeared frequently in the top performing combinations were selected. The hot spot predictor was built based on an ensemble of these classifiers and to work in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. Copyright © 2013 Wiley Periodicals, Inc.
A high-throughput approach to profile RNA structure.

PubMed

Delli Ponti, Riccardo; Marti, Stefanie; Armaos, Alexandros; Tartaglia, Gian Gaetano

2017-03-17

Here we introduce the Computational Recognition of Secondary Structure (CROSS) method to calculate the structural profile of an RNA sequence (single- or double-stranded state) at single-nucleotide resolution and without sequence length restrictions. We trained CROSS using data from high-throughput experiments such as Selective 2΄-Hydroxyl Acylation analyzed by Primer Extension (SHAPE; Mouse and HIV transcriptomes) and Parallel Analysis of RNA Structure (PARS; Human and Yeast transcriptomes) as well as high-quality NMR/X-ray structures (PDB database). The algorithm uses primary structure information alone to predict experimental structural profiles with >80% accuracy, showing high performances on large RNAs such as Xist (17 900 nucleotides; Area Under the ROC Curve AUC of 0.75 on dimethyl sulfate (DMS) experiments). We integrated CROSS in thermodynamics-based methods to predict secondary structure and observed an increase in their predictive power by up to 30%. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Discovering new materials and new phenomena with evolutionary algorithms

NASA Astrophysics Data System (ADS)

Oganov, Artem

Thanks to powerful evolutionary algorithms, in particular the USPEX method, it is now possible to predict both the stable compounds and their crystal structures at arbitrary conditions, given just the set of chemical elements. Recent developments include major increases of efficiency and extensions to low-dimensional systems and molecular crystals (which allowed large structures to be handled easily, e.g. Mg(BH4)2 and H2O-H2) and new techniques called evolutionary metadynamics and Mendelevian search. Some of the results that I will discuss include: 1. Theoretical and experimental evidence for a new partially ionic phase of boron, γ-B and an insulating and optically transparent form of sodium. 2. Predicted stability of ``impossible'' chemical compounds that become stable under pressure - e.g. Na3Cl, Na2Cl, Na3Cl2, NaCl3, NaCl7, Mg3O2 and MgO2. 3. Novel surface phases (e.g. boron surface reconstructions). 4. Novel dielectric polymers, and novel permanent magnets confirmed by experiment and ready for applications. 5. Prediction of new ultrahard materials and computational proof that diamond is the hardest possible material.
RRCRank: a fusion method using rank strategy for residue-residue contact prediction.

PubMed

Jing, Xiaoyang; Dong, Qiwen; Lu, Ruqian

2017-09-02

In structural biology area, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that the predicted residue-residue contacts could effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, related researchers have developed various methods to predict residue-residue contacts, especially, significant performance has been achieved by using fusion methods in recent years. In this work, a novel fusion method based on rank strategy has been proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated mutations methods and ensemble machine-learning classifiers, and then the proposed method uses the learning-to-rank algorithm to predict contact probability of each residue pair. First, we perform two benchmark tests for the proposed fusion method (RRCRank) on CASP11 dataset and CASP12 dataset respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium and short range contacts. Second, in order to verify the superiority of ranking strategy, we predict contacts by using the traditional regression and classification strategies based on the same features as ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for three contact types, in particular for long range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that the RRCRank could achieve comparable prediction precisions and is better than three methods in most assessment metrics. The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment.
Loads and aeroelasticity division research and technology accomplishments for FY 1983 and plans for FY 1984

NASA Technical Reports Server (NTRS)

Gardner, J. E.; Dixon, S. C.

1984-01-01

Research was done in the following areas: development and validation of solution algorithms, modeling techniques, integrated finite elements for flow-thermal-structural analysis and design, optimization of aircraft and spacecraft for the best performance, reduction of loads and increase in the dynamic structural stability of flexible airframes by the use of active control, methods for predicting steady and unsteady aerodynamic loads and aeroelastic characteristics of flight vehicles with emphasis on the transonic range, and methods for predicting and reducing helicoper vibrations.
Self-Adaptive Prediction of Cloud Resource Demands Using Ensemble Model and Subtractive-Fuzzy Clustering Based Fuzzy Neural Network

PubMed Central

Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong

2015-01-01

In IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate resource demands predicting is essential. For this purpose, this paper proposes a self-adaptive prediction method using ensemble model and subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characters of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of fuzzy neural network is researched. To obtain the number of fuzzy rules and the initial value of the premise and consequent parameters, this paper proposes the fuzzy c-means combined with subtractive clustering algorithm, that is, the subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experiment results show that the method is accurate and effective in predicting the resource demands. PMID:25691896

Minimalist ensemble algorithms for genome-wide protein localization prediction.

PubMed

Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun

2012-07-03

Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
Minimalist ensemble algorithms for genome-wide protein localization prediction

PubMed Central

2012-01-01

Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. Conclusions We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391
Reducing process delays for real-time earthquake parameter estimation - An application of KD tree to large databases for Earthquake Early Warning

NASA Astrophysics Data System (ADS)

Yin, Lucy; Andrews, Jennifer; Heaton, Thomas

2018-05-01

Earthquake parameter estimations using nearest neighbor searching among a large database of observations can lead to reliable prediction results. However, in the real-time application of Earthquake Early Warning (EEW) systems, the accurate prediction using a large database is penalized by a significant delay in the processing time. We propose to use a multidimensional binary search tree (KD tree) data structure to organize large seismic databases to reduce the processing time in nearest neighbor search for predictions. We evaluated the performance of KD tree on the Gutenberg Algorithm, a database-searching algorithm for EEW. We constructed an offline test to predict peak ground motions using a database with feature sets of waveform filter-bank characteristics, and compare the results with the observed seismic parameters. We concluded that large database provides more accurate predictions of the ground motion information, such as peak ground acceleration, velocity, and displacement (PGA, PGV, PGD), than source parameters, such as hypocenter distance. Application of the KD tree search to organize the database reduced the average searching process by 85% time cost of the exhaustive method, allowing the method to be feasible for real-time implementation. The algorithm is straightforward and the results will reduce the overall time of warning delivery for EEW.
Hidden Markov model analysis of force/torque information in telemanipulation

NASA Technical Reports Server (NTRS)

Hannaford, Blake; Lee, Paul

1991-01-01

A model for the prediction and analysis of sensor information recorded during robotic performance of telemanipulation tasks is presented. The model uses the hidden Markov model to describe the task structure, the operator's or intelligent controller's goal structure, and the sensor signals. A methodology for constructing the model parameters based on engineering knowledge of the task is described. It is concluded that the model and its optimal state estimation algorithm, the Viterbi algorithm, are very succesful at the task of segmenting the data record into phases corresponding to subgoals of the task. The model provides a rich modeling structure within a statistical framework, which enables it to represent complex systems and be robust to real-world sensory signals.
Guaranteeing robustness of structural condition monitoring to environmental variability

NASA Astrophysics Data System (ADS)

Van Buren, Kendra; Reilly, Jack; Neal, Kyle; Edwards, Harry; Hemez, François

2017-01-01

Advances in sensor deployment and computational modeling have allowed significant strides to be recently made in the field of Structural Health Monitoring (SHM). One widely used SHM strategy is to perform a vibration analysis where a model of the structure's pristine (undamaged) condition is compared with vibration response data collected from the physical structure. Discrepancies between model predictions and monitoring data can be interpreted as structural damage. Unfortunately, multiple sources of uncertainty must also be considered in the analysis, including environmental variability, unknown model functional forms, and unknown values of model parameters. Not accounting for these sources of uncertainty can lead to false-positives or false-negatives in the structural condition assessment. To manage the uncertainty, we propose a robust SHM methodology that combines three technologies. A time series algorithm is trained using "baseline" data to predict the vibration response, compare predictions to actual measurements collected on a potentially damaged structure, and calculate a user-defined damage indicator. The second technology handles the uncertainty present in the problem. An analysis of robustness is performed to propagate this uncertainty through the time series algorithm and obtain the corresponding bounds of variation of the damage indicator. The uncertainty description and robustness analysis are both inspired by the theory of info-gap decision-making. Lastly, an appropriate "size" of the uncertainty space is determined through physical experiments performed in laboratory conditions. Our hypothesis is that examining how the uncertainty space changes throughout time might lead to superior diagnostics of structural damage as compared to only monitoring the damage indicator. This methodology is applied to a portal frame structure to assess if the strategy holds promise for robust SHM. (Publication approved for unlimited, public release on October-28-2015, LA-UR-15-28442, unclassified.)
A model predictive speed tracking control approach for autonomous ground vehicles

NASA Astrophysics Data System (ADS)

Zhu, Min; Chen, Huiyan; Xiong, Guangming

2017-03-01

This paper presents a novel speed tracking control approach based on a model predictive control (MPC) framework for autonomous ground vehicles. A switching algorithm without calibration is proposed to determine the drive or brake control. Combined with a simple inverse longitudinal vehicle model and adaptive regulation of MPC, this algorithm can make use of the engine brake torque for various driving conditions and avoid high frequency oscillations automatically. A simplified quadratic program (QP) solving algorithm is used to reduce the computational time, and the approach has been applied in a 16-bit microcontroller. The performance of the proposed approach is evaluated via simulations and vehicle tests, which were carried out in a range of speed-profile tracking tasks. With a well-designed system structure, high-precision speed control is achieved. The system can robustly model uncertainty and external disturbances, and yields a faster response with less overshoot than a PI controller.
Discovery of Novel HIV-1 Integrase Inhibitors Using QSAR-Based Virtual Screening of the NCI Open Database.

PubMed

Ko, Gene M; Garg, Rajni; Bailey, Barbara A; Kumar, Sunil

2016-01-01

Quantitative structure-activity relationship (QSAR) models can be used as a predictive tool for virtual screening of chemical libraries to identify novel drug candidates. The aims of this paper were to report the results of a study performed for descriptor selection, QSAR model development, and virtual screening for identifying novel HIV-1 integrase inhibitor drug candidates. First, three evolutionary algorithms were compared for descriptor selection: differential evolution-binary particle swarm optimization (DE-BPSO), binary particle swarm optimization, and genetic algorithms. Next, three QSAR models were developed from an ensemble of multiple linear regression, partial least squares, and extremely randomized trees models. A comparison of the performances of three evolutionary algorithms showed that DE-BPSO has a significant improvement over the other two algorithms. QSAR models developed in this study were used in consensus as a predictive tool for virtual screening of the NCI Open Database containing 265,242 compounds to identify potential novel HIV-1 integrase inhibitors. Six compounds were predicted to be highly active (plC50 > 6) by each of the three models. The use of a hybrid evolutionary algorithm (DE-BPSO) for descriptor selection and QSAR model development in drug design is a novel approach. Consensus modeling may provide better predictivity by taking into account a broader range of chemical properties within the data set conducive for inhibition that may be missed by an individual model. The six compounds identified provide novel drug candidate leads in the design of next generation HIV- 1 integrase inhibitors targeting drug resistant mutant viruses.
Navier-Stokes Dynamics by a Discrete Boltzmann Model

NASA Technical Reports Server (NTRS)

Rubinstein, Robet

2010-01-01

This work investigates the possibility of particle-based algorithms for the Navier-Stokes equations and higher order continuum approximations of the Boltzmann equation; such algorithms would generalize the well-known Pullin scheme for the Euler equations. One such method is proposed in the context of a discrete velocity model of the Boltzmann equation. Preliminary results on shock structure are consistent with the expectation that the shock should be much broader than the near discontinuity predicted by the Pullin scheme, yet narrower than the prediction of the Boltzmann equation. We discuss the extension of this essentially deterministic method to a stochastic particle method that, like DSMC, samples the distribution function rather than resolving it completely.
An inference method from multi-layered structure of biomedical data.

PubMed

Kim, Myungjun; Nam, Yonghyun; Shin, Hyunjung

2017-05-18

Biological system is a multi-layered structure of omics with genome, epigenome, transcriptome, metabolome, proteome, etc., and can be further stretched to clinical/medical layers such as diseasome, drugs, and symptoms. One advantage of omics is that we can figure out an unknown component or its trait by inferring from known omics components. The component can be inferred by the ones in the same level of omics or the ones in different levels. To implement the inference process, an algorithm that can be applied to the multi-layered complex system is required. In this study, we develop a semi-supervised learning algorithm that can be applied to the multi-layered complex system. In order to verify the validity of the inference, it was applied to the prediction problem of disease co-occurrence with a two-layered network composed of symptom-layer and disease-layer. The symptom-disease layered network obtained a fairly high value of AUC, 0.74, which is regarded as noticeable improvement when comparing 0.59 AUC of single-layered disease network. If further stretched to whole layered structure of omics, the proposed method is expected to produce more promising results. This research has novelty in that it is a new integrative algorithm that incorporates the vertical structure of omics data, on contrary to other existing methods that integrate the data in parallel fashion. The results can provide enhanced guideline for disease co-occurrence prediction, thereby serve as a valuable tool for inference process of multi-layered biological system.
Guiding exploration in conformational feature space with Lipschitz underestimation for ab-initio protein structure prediction.

PubMed

Hao, Xiaohu; Zhang, Guijun; Zhou, Xiaogen

2018-04-01

Computing conformations which are essential to associate structural and functional information with gene sequences, is challenging due to the high dimensionality and rugged energy surface of the protein conformational space. Consequently, the dimension of the protein conformational space should be reduced to a proper level, and an effective exploring algorithm should be proposed. In this paper, a plug-in method for guiding exploration in conformational feature space with Lipschitz underestimation (LUE) for ab-initio protein structure prediction is proposed. The conformational space is converted into ultrafast shape recognition (USR) feature space firstly. Based on the USR feature space, the conformational space can be further converted into Underestimation space according to Lipschitz estimation theory for guiding exploration. As a consequence of the use of underestimation model, the tight lower bound estimate information can be used for exploration guidance, the invalid sampling areas can be eliminated in advance, and the number of energy function evaluations can be reduced. The proposed method provides a novel technique to solve the exploring problem of protein conformational space. LUE is applied to differential evolution (DE) algorithm, and metropolis Monte Carlo(MMC) algorithm which is available in the Rosetta; When LUE is applied to DE and MMC, it will be screened by the underestimation method prior to energy calculation and selection. Further, LUE is compared with DE and MMC by testing on 15 small-to-medium structurally diverse proteins. Test results show that near-native protein structures with higher accuracy can be obtained more rapidly and efficiently with the use of LUE. Copyright © 2018 Elsevier Ltd. All rights reserved.
Adaptive firefly algorithm: parameter analysis and its application.

PubMed

Cheung, Ngaam J; Ding, Xue-Ming; Shen, Hong-Bin

2014-01-01

As a nature-inspired search algorithm, firefly algorithm (FA) has several control parameters, which may have great effects on its performance. In this study, we investigate the parameter selection and adaptation strategies in a modified firefly algorithm - adaptive firefly algorithm (AdaFa). There are three strategies in AdaFa including (1) a distance-based light absorption coefficient; (2) a gray coefficient enhancing fireflies to share difference information from attractive ones efficiently; and (3) five different dynamic strategies for the randomization parameter. Promising selections of parameters in the strategies are analyzed to guarantee the efficient performance of AdaFa. AdaFa is validated over widely used benchmark functions, and the numerical experiments and statistical tests yield useful conclusions on the strategies and the parameter selections affecting the performance of AdaFa. When applied to the real-world problem - protein tertiary structure prediction, the results demonstrated improved variants can rebuild the tertiary structure with the average root mean square deviation less than 0.4Å and 1.5Å from the native constrains with noise free and 10% Gaussian white noise.
Adaptive Firefly Algorithm: Parameter Analysis and its Application

PubMed Central

Shen, Hong-Bin

2014-01-01

As a nature-inspired search algorithm, firefly algorithm (FA) has several control parameters, which may have great effects on its performance. In this study, we investigate the parameter selection and adaptation strategies in a modified firefly algorithm — adaptive firefly algorithm (AdaFa). There are three strategies in AdaFa including (1) a distance-based light absorption coefficient; (2) a gray coefficient enhancing fireflies to share difference information from attractive ones efficiently; and (3) five different dynamic strategies for the randomization parameter. Promising selections of parameters in the strategies are analyzed to guarantee the efficient performance of AdaFa. AdaFa is validated over widely used benchmark functions, and the numerical experiments and statistical tests yield useful conclusions on the strategies and the parameter selections affecting the performance of AdaFa. When applied to the real-world problem — protein tertiary structure prediction, the results demonstrated improved variants can rebuild the tertiary structure with the average root mean square deviation less than 0.4Å and 1.5Å from the native constrains with noise free and 10% Gaussian white noise. PMID:25397812
HSYMDOCK: a docking web server for predicting the structure of protein homo-oligomers with Cn or Dn symmetry.

PubMed

Yan, Yumeng; Tao, Huanyu; Huang, Sheng-You

2018-05-26

A major subclass of protein-protein interactions is formed by homo-oligomers with certain symmetry. Therefore, computational modeling of the symmetric protein complexes is important for understanding the molecular mechanism of related biological processes. Although several symmetric docking algorithms have been developed for Cn symmetry, few docking servers have been proposed for Dn symmetry. Here, we present HSYMDOCK, a web server of our hierarchical symmetric docking algorithm that supports both Cn and Dn symmetry. The HSYMDOCK server was extensively evaluated on three benchmarks of symmetric protein complexes, including the 20 CASP11-CAPRI30 homo-oligomer targets, the symmetric docking benchmark of 213 Cn targets and 35 Dn targets, and a nonredundant test set of 55 transmembrane proteins. It was shown that HSYMDOCK obtained a significantly better performance than other similar docking algorithms. The server supports both sequence and structure inputs for the monomer/subunit. Users have an option to provide the symmetry type of the complex, or the server can predict the symmetry type automatically. The docking process is fast and on average consumes 10∼20 min for a docking job. The HSYMDOCK web server is available at http://huanglab.phys.hust.edu.cn/hsymdock/.
Predicting the survival of diabetes using neural network

NASA Astrophysics Data System (ADS)

Mamuda, Mamman; Sathasivam, Saratha

2017-08-01

Data mining techniques at the present time are used in predicting diseases of health care industries. Neural Network is one among the prevailing method in data mining techniques of an intelligent field for predicting diseases in health care industries. This paper presents a study on the prediction of the survival of diabetes diseases using different learning algorithms from the supervised learning algorithms of neural network. Three learning algorithms are considered in this study: (i) The levenberg-marquardt learning algorithm (ii) The Bayesian regulation learning algorithm and (iii) The scaled conjugate gradient learning algorithm. The network is trained using the Pima Indian Diabetes Dataset with the help of MATLAB R2014(a) software. The performance of each algorithm is further discussed through regression analysis. The prediction accuracy of the best algorithm is further computed to validate the accurate prediction
Insight into the Structure of Amyloid Fibrils from the Analysis of Globular Proteins

PubMed Central

Trovato, Antonio; Chiti, Fabrizio; Maritan, Amos; Seno, Flavio

2006-01-01

The conversion from soluble states into cross-β fibrillar aggregates is a property shared by many different proteins and peptides and was hence conjectured to be a generic feature of polypeptide chains. Increasing evidence is now accumulating that such fibrillar assemblies are generally characterized by a parallel in-register alignment of β-strands contributed by distinct protein molecules. Here we assume a universal mechanism is responsible for β-structure formation and deduce sequence-specific interaction energies between pairs of protein fragments from a statistical analysis of the native folds of globular proteins. The derived fragment–fragment interaction was implemented within a novel algorithm, prediction of amyloid structure aggregation (PASTA), to investigate the role of sequence heterogeneity in driving specific aggregation into ordered self-propagating cross-β structures. The algorithm predicts that the parallel in-register arrangement of sequence portions that participate in the fibril cross-β core is favoured in most cases. However, the antiparallel arrangement is correctly discriminated when present in fibrils formed by short peptides. The predictions of the most aggregation-prone portions of initially unfolded polypeptide chains are also in excellent agreement with available experimental observations. These results corroborate the recent hypothesis that the amyloid structure is stabilised by the same physicochemical determinants as those operating in folded proteins. They also suggest that side chain–side chain interaction across neighbouring β-strands is a key determinant of amyloid fibril formation and of their self-propagating ability. PMID:17173479
Why Is There a Glass Ceiling for Threading Based Protein Structure Prediction Methods?

PubMed

Skolnick, Jeffrey; Zhou, Hongyi

2017-04-20

Despite their different implementations, comparison of the best threading approaches to the prediction of evolutionary distant protein structures reveals that they tend to succeed or fail on the same protein targets. This is true despite the fact that the structural template library has good templates for all cases. Thus, a key question is why are certain protein structures threadable while others are not. Comparison with threading results on a set of artificial sequences selected for stability further argues that the failure of threading is due to the nature of the protein structures themselves. Using a new contact map based alignment algorithm, we demonstrate that certain folds are highly degenerate in that they can have very similar coarse grained fractions of native contacts aligned and yet differ significantly from the native structure. For threadable proteins, this is not the case. Thus, contemporary threading approaches appear to have reached a plateau, and new approaches to structure prediction are required.
An effective medium inversion algorithm for gas hydrate quantification and its application to laboratory and borehole measurements of gas hydrate-bearing sediments

NASA Astrophysics Data System (ADS)

Chand, Shyam; Minshull, Tim A.; Priest, Jeff A.; Best, Angus I.; Clayton, Christopher R. I.; Waite, William F.

2006-08-01

The presence of gas hydrate in marine sediments alters their physical properties. In some circumstances, gas hydrate may cement sediment grains together and dramatically increase the seismic P- and S-wave velocities of the composite medium. Hydrate may also form a load-bearing structure within the sediment microstructure, but with different seismic wave attenuation characteristics, changing the attenuation behaviour of the composite. Here we introduce an inversion algorithm based on effective medium modelling to infer hydrate saturations from velocity and attenuation measurements on hydrate-bearing sediments. The velocity increase is modelled as extra binding developed by gas hydrate that strengthens the sediment microstructure. The attenuation increase is modelled through a difference in fluid flow properties caused by different permeabilities in the sediment and hydrate microstructures. We relate velocity and attenuation increases in hydrate-bearing sediments to their hydrate content, using an effective medium inversion algorithm based on the self-consistent approximation (SCA), differential effective medium (DEM) theory, and Biot and squirt flow mechanisms of fluid flow. The inversion algorithm is able to convert observations in compressional and shear wave velocities and attenuations to hydrate saturation in the sediment pore space. We applied our algorithm to a data set from the Mallik 2L-38 well, Mackenzie delta, Canada, and to data from laboratory measurements on gas-rich and water-saturated sand samples. Predictions using our algorithm match the borehole data and water-saturated laboratory data if the proportion of hydrate contributing to the load-bearing structure increases with hydrate saturation. The predictions match the gas-rich laboratory data if that proportion decreases with hydrate saturation. We attribute this difference to differences in hydrate formation mechanisms between the two environments.
An effective medium inversion algorithm for gas hydrate quantification and its application to laboratory and borehole measurements of gas hydrate-bearing sediments

USGS Publications Warehouse

Chand, S.; Minshull, T.A.; Priest, J.A.; Best, A.I.; Clayton, C.R.I.; Waite, W.F.

2006-01-01

The presence of gas hydrate in marine sediments alters their physical properties. In some circumstances, gas hydrate may cement sediment grains together and dramatically increase the seismic P- and S-wave velocities of the composite medium. Hydrate may also form a load-bearing structure within the sediment microstructure, but with different seismic wave attenuation characteristics, changing the attenuation behaviour of the composite. Here we introduce an inversion algorithm based on effective medium modelling to infer hydrate saturations from velocity and attenuation measurements on hydrate-bearing sediments. The velocity increase is modelled as extra binding developed by gas hydrate that strengthens the sediment microstructure. The attenuation increase is modelled through a difference in fluid flow properties caused by different permeabilities in the sediment and hydrate microstructures. We relate velocity and attenuation increases in hydrate-bearing sediments to their hydrate content, using an effective medium inversion algorithm based on the self-consistent approximation (SCA), differential effective medium (DEM) theory, and Biot and squirt flow mechanisms of fluid flow. The inversion algorithm is able to convert observations in compressional and shear wave velocities and attenuations to hydrate saturation in the sediment pore space. We applied our algorithm to a data set from the Mallik 2L–38 well, Mackenzie delta, Canada, and to data from laboratory measurements on gas-rich and water-saturated sand samples. Predictions using our algorithm match the borehole data and water-saturated laboratory data if the proportion of hydrate contributing to the load-bearing structure increases with hydrate saturation. The predictions match the gas-rich laboratory data if that proportion decreases with hydrate saturation. We attribute this difference to differences in hydrate formation mechanisms between the two environments.
A Convex Formulation for Learning a Shared Predictive Structure from Multiple Tasks

PubMed Central

Chen, Jianhui; Tang, Lei; Liu, Jun; Ye, Jieping

2013-01-01

In this paper, we consider the problem of learning from multiple related tasks for improved generalization performance by extracting their shared structures. The alternating structure optimization (ASO) algorithm, which couples all tasks using a shared feature representation, has been successfully applied in various multitask learning problems. However, ASO is nonconvex and the alternating algorithm only finds a local solution. We first present an improved ASO formulation (iASO) for multitask learning based on a new regularizer. We then convert iASO, a nonconvex formulation, into a relaxed convex one (rASO). Interestingly, our theoretical analysis reveals that rASO finds a globally optimal solution to its nonconvex counterpart iASO under certain conditions. rASO can be equivalently reformulated as a semidefinite program (SDP), which is, however, not scalable to large datasets. We propose to employ the block coordinate descent (BCD) method and the accelerated projected gradient (APG) algorithm separately to find the globally optimal solution to rASO; we also develop efficient algorithms for solving the key subproblems involved in BCD and APG. The experiments on the Yahoo webpages datasets and the Drosophila gene expression pattern images datasets demonstrate the effectiveness and efficiency of the proposed algorithms and confirm our theoretical analysis. PMID:23520249
Structural dynamics payload loads estimates

NASA Technical Reports Server (NTRS)

Engels, R. C.

1982-01-01

Methods for the prediction of loads on large space structures are discussed. Existing approaches to the problem of loads calculation are surveyed. A full scale version of an alternate numerical integration technique to solve the response part of a load cycle is presented, and a set of short cut versions of the algorithm developed. The implementation of these techniques using the software package developed is discussed.

Improving threading algorithms for remote homology modeling by combining fragment and template comparisons

PubMed Central

Zhou, Hongyi; Skolnick, Jeffrey

2010-01-01

In this work, we develop a method called FTCOM for assessing the global quality of protein structural models for targets of medium and hard difficulty (remote homology) produced by structure prediction approaches such as threading or ab initio structure prediction. FTCOM requires the Cα coordinates of full length models and assesses model quality based on fragment comparison and a score derived from comparison of the model to top threading templates. On a set of 361 medium/hard targets, FTCOM was applied to and assessed for its ability to improve upon the results from the SP3, SPARKS, PROSPECTOR_3, and PRO-SP3-TASSER threading algorithms. The average TM-score improves by 5%–10% for the first selected model by the new method over models obtained by the original selection procedure in the respective threading methods. Moreover the number of foldable targets (TM-score ≥0.4) increases from least 7.6% for SP3 to 54% for SPARKS. Thus, FTCOM is a promising approach to template selection. PMID:20455261
Computational intelligence techniques for biological data mining: An overview

NASA Astrophysics Data System (ADS)

Faye, Ibrahima; Iqbal, Muhammad Javed; Said, Abas Md; Samir, Brahim Belhaouari

2014-10-01

Computational techniques have been successfully utilized for a highly accurate analysis and modeling of multifaceted and raw biological data gathered from various genome sequencing projects. These techniques are proving much more effective to overcome the limitations of the traditional in-vitro experiments on the constantly increasing sequence data. However, most critical problems that caught the attention of the researchers may include, but not limited to these: accurate structure and function prediction of unknown proteins, protein subcellular localization prediction, finding protein-protein interactions, protein fold recognition, analysis of microarray gene expression data, etc. To solve these problems, various classification and clustering techniques using machine learning have been extensively used in the published literature. These techniques include neural network algorithms, genetic algorithms, fuzzy ARTMAP, K-Means, K-NN, SVM, Rough set classifiers, decision tree and HMM based algorithms. Major difficulties in applying the above algorithms include the limitations found in the previous feature encoding and selection methods while extracting the best features, increasing classification accuracy and decreasing the running time overheads of the learning algorithms. The application of this research would be potentially useful in the drug design and in the diagnosis of some diseases. This paper presents a concise overview of the well-known protein classification techniques.
Prediction of beta-turns and beta-turn types by a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN).

PubMed

Kirschner, Andreas; Frishman, Dmitrij

2008-10-01

Prediction of beta-turns from amino acid sequences has long been recognized as an important problem in structural bioinformatics due to their frequent occurrence as well as their structural and functional significance. Because various structural features of proteins are intercorrelated, secondary structure information has been often employed as an additional input for machine learning algorithms while predicting beta-turns. Here we present a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN) capable of predicting multiple mutually dependent structural motifs and demonstrate its efficiency in recognizing three aspects of protein structure: beta-turns, beta-turn types, and secondary structure. The advantage of our method compared to other predictors is that it does not require any external input except for sequence profiles because interdependencies between different structural features are taken into account implicitly during the learning process. In a sevenfold cross-validation experiment on a standard test dataset our method exhibits the total prediction accuracy of 77.9% and the Mathew's Correlation Coefficient of 0.45, the highest performance reported so far. It also outperforms other known methods in delineating individual turn types. We demonstrate how simultaneous prediction of multiple targets influences prediction performance on single targets. The MOLEBRNN presented here is a generic method applicable in a variety of research fields where multiple mutually depending target classes need to be predicted. http://webclu.bio.wzw.tum.de/predator-web/.
A novel structure-aware sparse learning algorithm for brain imaging genetics.

PubMed

Du, Lei; Jingwen, Yan; Kim, Sungeun; Risacher, Shannon L; Huang, Heng; Inlow, Mark; Moore, Jason H; Saykin, Andrew J; Shen, Li

2014-01-01

Brain imaging genetics is an emergent research field where the association between genetic variations such as single nucleotide polymorphisms (SNPs) and neuroimaging quantitative traits (QTs) is evaluated. Sparse canonical correlation analysis (SCCA) is a bi-multivariate analysis method that has the potential to reveal complex multi-SNP-multi-QT associations. Most existing SCCA algorithms are designed using the soft threshold strategy, which assumes that the features in the data are independent from each other. This independence assumption usually does not hold in imaging genetic data, and thus inevitably limits the capability of yielding optimal solutions. We propose a novel structure-aware SCCA (denoted as S2CCA) algorithm to not only eliminate the independence assumption for the input data, but also incorporate group-like structure in the model. Empirical comparison with a widely used SCCA implementation, on both simulated and real imaging genetic data, demonstrated that S2CCA could yield improved prediction performance and biologically meaningful findings.
Identification of residue pairing in interacting β-strands from a predicted residue contact map.

PubMed

Mao, Wenzhi; Wang, Tong; Zhang, Wenxuan; Gong, Haipeng

2018-04-19

Despite the rapid progress of protein residue contact prediction, predicted residue contact maps frequently contain many errors. However, information of residue pairing in β strands could be extracted from a noisy contact map, due to the presence of characteristic contact patterns in β-β interactions. This information may benefit the tertiary structure prediction of mainly β proteins. In this work, we propose a novel ridge-detection-based β-β contact predictor to identify residue pairing in β strands from any predicted residue contact map. Our algorithm RDb 2 C adopts ridge detection, a well-developed technique in computer image processing, to capture consecutive residue contacts, and then utilizes a novel multi-stage random forest framework to integrate the ridge information and additional features for prediction. Starting from the predicted contact map of CCMpred, RDb 2 C remarkably outperforms all state-of-the-art methods on two conventional test sets of β proteins (BetaSheet916 and BetaSheet1452), and achieves F1-scores of ~ 62% and ~ 76% at the residue level and strand level, respectively. Taking the prediction of the more advanced RaptorX-Contact as input, RDb 2 C achieves impressively higher performance, with F1-scores reaching ~ 76% and ~ 86% at the residue level and strand level, respectively. In a test of structural modeling using the top 1 L predicted contacts as constraints, for 61 mainly β proteins, the average TM-score achieves 0.442 when using the raw RaptorX-Contact prediction, but increases to 0.506 when using the improved prediction by RDb 2 C. Our method can significantly improve the prediction of β-β contacts from any predicted residue contact maps. Prediction results of our algorithm could be directly applied to effectively facilitate the practical structure prediction of mainly β proteins. All source data and codes are available at http://166.111.152.91/Downloads.html or the GitHub address of https://github.com/wzmao/RDb2C .
A novel computer algorithm improves antibody epitope prediction using affinity-selected mimotopes: a case study using monoclonal antibodies against the West Nile virus E protein.

PubMed

Denisova, Galina F; Denisov, Dimitri A; Yeung, Jeffrey; Loeb, Mark B; Diamond, Michael S; Bramson, Jonathan L

2008-11-01

Understanding antibody function is often enhanced by knowledge of the specific binding epitope. Here, we describe a computer algorithm that permits epitope prediction based on a collection of random peptide epitopes (mimotopes) isolated by antibody affinity purification. We applied this methodology to the prediction of epitopes for five monoclonal antibodies against the West Nile virus (WNV) E protein, two of which exhibit therapeutic activity in vivo. This strategy was validated by comparison of our results with existing F(ab)-E protein crystal structures and mutational analysis by yeast surface display. We demonstrate that by combining the results of the mimotope method with our data from mutational analysis, epitopes could be predicted with greater certainty. The two methods displayed great complementarity as the mutational analysis facilitated epitope prediction when the results with the mimotope method were equivocal and the mimotope method revealed a broader number of residues within the epitope than the mutational analysis. Our results demonstrate that the combination of these two prediction strategies provides a robust platform for epitope characterization.
Parameterization of typhoon-induced ocean cooling using temperature equation and machine learning algorithms: an example of typhoon Soulik (2013)

NASA Astrophysics Data System (ADS)

Wei, Jun; Jiang, Guo-Qing; Liu, Xin

2017-09-01

This study proposed three algorithms that can potentially be used to provide sea surface temperature (SST) conditions for typhoon prediction models. Different from traditional data assimilation approaches, which provide prescribed initial/boundary conditions, our proposed algorithms aim to resolve a flow-dependent SST feedback between growing typhoons and oceans in the future time. Two of these algorithms are based on linear temperature equations (TE-based), and the other is based on an innovative technique involving machine learning (ML-based). The algorithms are then implemented into a Weather Research and Forecasting model for the simulation of typhoon to assess their effectiveness, and the results show significant improvement in simulated storm intensities by including ocean cooling feedback. The TE-based algorithm I considers wind-induced ocean vertical mixing and upwelling processes only, and thus obtained a synoptic and relatively smooth sea surface temperature cooling. The TE-based algorithm II incorporates not only typhoon winds but also ocean information, and thus resolves more cooling features. The ML-based algorithm is based on a neural network, consisting of multiple layers of input variables and neurons, and produces the best estimate of the cooling structure, in terms of its amplitude and position. Sensitivity analysis indicated that the typhoon-induced ocean cooling is a nonlinear process involving interactions of multiple atmospheric and oceanic variables. Therefore, with an appropriate selection of input variables and neuron sizes, the ML-based algorithm appears to be more efficient in prognosing the typhoon-induced ocean cooling and in predicting typhoon intensity than those algorithms based on linear regression methods.
Testing an earthquake prediction algorithm

USGS Publications Warehouse

Kossobokov, V.G.; Healy, J.H.; Dewey, J.W.

1997-01-01

A test to evaluate earthquake prediction algorithms is being applied to a Russian algorithm known as M8. The M8 algorithm makes intermediate term predictions for earthquakes to occur in a large circle, based on integral counts of transient seismicity in the circle. In a retroactive prediction for the period January 1, 1985 to July 1, 1991 the algorithm as configured for the forward test would have predicted eight of ten strong earthquakes in the test area. A null hypothesis, based on random assignment of predictions, predicts eight earthquakes in 2.87% of the trials. The forward test began July 1, 1991 and will run through December 31, 1997. As of July 1, 1995, the algorithm had forward predicted five out of nine earthquakes in the test area, which success ratio would have been achieved in 53% of random trials with the null hypothesis.
A continuously growing web-based interface structure databank

NASA Astrophysics Data System (ADS)

Erwin, N. A.; Wang, E. I.; Osysko, A.; Warner, D. H.

2012-07-01

The macroscopic properties of materials can be significantly influenced by the presence of microscopic interfaces. The complexity of these interfaces coupled with the vast configurational space in which they reside has been a long-standing obstacle to the advancement of true bottom-up material behavior predictions. In this vein, atomistic simulations have proven to be a valuable tool for investigating interface behavior. However, before atomistic simulations can be utilized to model interface behavior, meaningful interface atomic structures must be generated. The generation of structures has historically been carried out disjointly by individual research groups, and thus, has constituted an overlap in effort across the broad research community. To address this overlap and to lower the barrier for new researchers to explore interface modeling, we introduce a web-based interface structure databank (www.isdb.cee.cornell.edu) where users can search, download and share interface structures. The databank is intended to grow via two mechanisms: (1) interface structure donations from individual research groups and (2) an automated structure generation algorithm which continuously creates equilibrium interface structures. In this paper, we describe the databank, the automated interface generation algorithm, and compare a subset of the autonomously generated structures to structures currently available in the literature. To date, the automated generation algorithm has been directed toward aluminum grain boundary structures, which can be compared with experimentally measured population densities of aluminum polycrystals.
An unbiased adaptive sampling algorithm for the exploration of RNA mutational landscapes under evolutionary pressure.

PubMed

Waldispühl, Jérôme; Ponty, Yann

2011-11-01

The analysis of the relationship between sequences and structures (i.e., how mutations affect structures and reciprocally how structures influence mutations) is essential to decipher the principles driving molecular evolution, to infer the origins of genetic diseases, and to develop bioengineering applications such as the design of artificial molecules. Because their structures can be predicted from the sequence data only, RNA molecules provide a good framework to study this sequence-structure relationship. We recently introduced a suite of algorithms called RNAmutants which allows a complete exploration of RNA sequence-structure maps in polynomial time and space. Formally, RNAmutants takes an input sequence (or seed) to compute the Boltzmann-weighted ensembles of mutants with exactly k mutations, and sample mutations from these ensembles. However, this approach suffers from major limitations. Indeed, since the Boltzmann probabilities of the mutations depend of the free energy of the structures, RNAmutants has difficulties to sample mutant sequences with low G+C-contents. In this article, we introduce an unbiased adaptive sampling algorithm that enables RNAmutants to sample regions of the mutational landscape poorly covered by classical algorithms. We applied these methods to sample mutations with low G+C-contents. These adaptive sampling techniques can be easily adapted to explore other regions of the sequence and structural landscapes which are difficult to sample. Importantly, these algorithms come at a minimal computational cost. We demonstrate the insights offered by these techniques on studies of complete RNA sequence structures maps of sizes up to 40 nucleotides. Our results indicate that the G+C-content has a strong influence on the size and shape of the evolutionary accessible sequence and structural spaces. In particular, we show that low G+C-contents favor the apparition of internal loops and thus possibly the synthesis of tertiary structure motifs. On the other hand, high G+C-contents significantly reduce the size of the evolutionary accessible mutational landscapes.
Network Community Detection based on the Physarum-inspired Computational Framework.

PubMed

Gao, Chao; Liang, Mingxin; Li, Xianghua; Zhang, Zili; Wang, Zhen; Zhou, Zhili

2016-12-13

Community detection is a crucial and essential problem in the structure analytics of complex networks, which can help us understand and predict the characteristics and functions of complex networks. Many methods, ranging from the optimization-based algorithms to the heuristic-based algorithms, have been proposed for solving such a problem. Due to the inherent complexity of identifying network structure, how to design an effective algorithm with a higher accuracy and a lower computational cost still remains an open problem. Inspired by the computational capability and positive feedback mechanism in the wake of foraging process of Physarum, which is a large amoeba-like cell consisting of a dendritic network of tube-like pseudopodia, a general Physarum-based computational framework for community detection is proposed in this paper. Based on the proposed framework, the inter-community edges can be identified from the intra-community edges in a network and the positive feedback of solving process in an algorithm can be further enhanced, which are used to improve the efficiency of original optimization-based and heuristic-based community detection algorithms, respectively. Some typical algorithms (e.g., genetic algorithm, ant colony optimization algorithm, and Markov clustering algorithm) and real-world datasets have been used to estimate the efficiency of our proposed computational framework. Experiments show that the algorithms optimized by Physarum-inspired computational framework perform better than the original ones, in terms of accuracy and computational cost. Moreover, a computational complexity analysis verifies the scalability of our framework.
Modeling side-chains using molecular dynamics improve recognition of binding region in CAPRI targets.

PubMed

Camacho, Carlos J

2005-08-01

The CAPRI-II experiment added an extra level of complexity to the problem of predicting protein-protein interactions by including 5 targets for which participants had to build or complete the 3-dimensional (3D) structure of either the receptor or ligand based on the structure of a close homolog. In this article, we describe how modeling key side-chains using molecular dynamics (MD) in explicit solvent improved the recognition of the binding region of a free energy- based computational docking method. In particular, we show that MD is able to predict with relatively high accuracy the rotamer conformation of the anchor side-chains important for molecular recognition as suggested by Rajamani et al. (Proc Natl Acad Sci USA 2004;101:11287-11292). As expected, the conformations are some of the most common rotamers for the given residue, while latch side-chains that undergo induced fit upon binding are forced into less common conformations. Using these models as starting conformations in conjunction with the rigid-body docking server ClusPro and the flexible docking algorithm SmoothDock, we produced valuable predictions for 6 of the 9 targets in CAPRI-II, missing only the 3 targets that underwent significant structural rearrangements upon binding. We also show that our free energy- based scoring function, consisting of the sum of van der Waals, Coulombic electrostatic with a distance-dependent dielectric, and desolvation free energy successfully discriminates the nativelike conformation of our submitted predictions. The latter emphasizes the critical role that thermodynamics plays on our methodology, and validates the generality of the algorithm to predict protein interactions.
Soft Computing Methods for Disulfide Connectivity Prediction.

PubMed

Márquez-Chamorro, Alfonso E; Aguilar-Ruiz, Jesús S

2015-01-01

The problem of protein structure prediction (PSP) is one of the main challenges in structural bioinformatics. To tackle this problem, PSP can be divided into several subproblems. One of these subproblems is the prediction of disulfide bonds. The disulfide connectivity prediction problem consists in identifying which nonadjacent cysteines would be cross-linked from all possible candidates. Determining the disulfide bond connectivity between the cysteines of a protein is desirable as a previous step of the 3D PSP, as the protein conformational search space is highly reduced. The most representative soft computing approaches for the disulfide bonds connectivity prediction problem of the last decade are summarized in this paper. Certain aspects, such as the different methodologies based on soft computing approaches (artificial neural network or support vector machine) or features of the algorithms, are used for the classification of these methods.
Comparison of the accuracy of three algorithms in predicting accessory pathways among adult Wolff-Parkinson-White syndrome patients.

PubMed

Maden, Orhan; Balci, Kevser Gülcihan; Selcuk, Mehmet Timur; Balci, Mustafa Mücahit; Açar, Burak; Unal, Sefa; Kara, Meryem; Selcuk, Hatice

2015-12-01

The aim of this study was to investigate the accuracy of three algorithms in predicting accessory pathway locations in adult patients with Wolff-Parkinson-White syndrome in Turkish population. A total of 207 adult patients with Wolff-Parkinson-White syndrome were retrospectively analyzed. The most preexcited 12-lead electrocardiogram in sinus rhythm was used for analysis. Two investigators blinded to the patient data used three algorithms for prediction of accessory pathway location. Among all locations, 48.5% were left-sided, 44% were right-sided, and 7.5% were located in the midseptum or anteroseptum. When only exact locations were accepted as match, predictive accuracy for Chiang was 71.5%, 72.4% for d'Avila, and 71.5% for Arruda. The percentage of predictive accuracy of all algorithms did not differ between the algorithms (p = 1.000; p = 0.875; p = 0.885, respectively). The best algorithm for prediction of right-sided, left-sided, and anteroseptal and midseptal accessory pathways was Arruda (p < 0.001). Arruda was significantly better than d'Avila in predicting adjacent sites (p = 0.035) and the percent of the contralateral site prediction was higher with d'Avila than Arruda (p = 0.013). All algorithms were similar in predicting accessory pathway location and the predicted accuracy was lower than previously reported by their authors. However, according to the accessory pathway site, the algorithm designed by Arruda et al. showed better predictions than the other algorithms and using this algorithm may provide advantages before a planned ablation.
Prediction of novel synthetic pathways for the production of desired chemicals.

PubMed

Cho, Ayoun; Yun, Hongseok; Park, Jin Hwan; Lee, Sang Yup; Park, Sunwon

2010-03-28

There have been several methods developed for the prediction of synthetic metabolic pathways leading to the production of desired chemicals. In these approaches, novel pathways were predicted based on chemical structure changes, enzymatic information, and/or reaction mechanisms, but the approaches generating a huge number of predicted results are difficult to be applied to real experiments. Also, some of these methods focus on specific pathways, and thus are limited to expansion to the whole metabolism. In the present study, we propose a system framework employing a retrosynthesis model with a prioritization scoring algorithm. This new strategy allows deducing the novel promising pathways for the synthesis of a desired chemical together with information on enzymes involved based on structural changes and reaction mechanisms present in the system database. The prioritization scoring algorithm employing Tanimoto coefficient and group contribution method allows examination of structurally qualified pathways to recognize which pathway is more appropriate. In addition, new concepts of binding site covalence, estimation of pathway distance and organism specificity were taken into account to identify the best synthetic pathway. Parameters of these factors can be evolutionarily optimized when a newly proven synthetic pathway is registered. As the proofs of concept, the novel synthetic pathways for the production of isobutanol, 3-hydroxypropionate, and butyryl-CoA were predicted. The prediction shows a high reliability, in which experimentally verified synthetic pathways were listed within the top 0.089% of the identified pathway candidates. It is expected that the system framework developed in this study would be useful for the in silico design of novel metabolic pathways to be employed for the efficient production of chemicals, fuels and materials.
PARTS: Probabilistic Alignment for RNA joinT Secondary structure prediction

PubMed Central

Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H.

2008-01-01

A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by the newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is superior for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu. PMID:18304945
D2N: Distance to the native.

PubMed

Mishra, Avinash; Rana, Prashant Singh; Mittal, Aditya; Jayaram, B

2014-10-01

Root-mean-square-deviation (RMSD), of computationally-derived protein structures from experimentally determined structures, is a critical index to assessing protein-structure-prediction-algorithms (PSPAs). The development of PSPAs to obtain 0Å RMSD from native structures is considered central to computational biology. However, till date it has been quite challenging to measure how far a predicted protein structure is from its native - in the absence of a known experimental/native structure. In this work, we report the development of a metric "D2N" (distance to the native) - that predicts the "RMSD" of any structure without actually knowing the native structure. By combining physico-chemical properties and known universalities in spatial organization of soluble proteins to develop D2N, we demonstrate the ability to predict the distance of a proposed structure to within ±1.5Ǻ error with a remarkable average accuracy of 93.6% for structures below 5Ǻ from the native. We believe that this work opens up a completely new avenue towards assigning reliable structures to whole proteomes even in the absence of experimentally determined native structures. The D2N tool is freely available at http://www.scfbio-iitd.res.in/software/d2n.jsp. Copyright © 2014 Elsevier B.V. All rights reserved.
Performance Analysis of Continuous Black-Box Optimization Algorithms via Footprints in Instance Space.

PubMed

Muñoz, Mario A; Smith-Miles, Kate A

2017-01-01

This article presents a method for the objective assessment of an algorithm's strengths and weaknesses. Instead of examining the performance of only one or more algorithms on a benchmark set, or generating custom problems that maximize the performance difference between two algorithms, our method quantifies both the nature of the test instances and the algorithm performance. Our aim is to gather information about possible phase transitions in performance, that is, the points in which a small change in problem structure produces algorithm failure. The method is based on the accurate estimation and characterization of the algorithm footprints, that is, the regions of instance space in which good or exceptional performance is expected from an algorithm. A footprint can be estimated for each algorithm and for the overall portfolio. Therefore, we select a set of features to generate a common instance space, which we validate by constructing a sufficiently accurate prediction model. We characterize the footprints by their area and density. Our method identifies complementary performance between algorithms, quantifies the common features of hard problems, and locates regions where a phase transition may lie.
NARMAX model identification of a palm oil biodiesel engine using multi-objective optimization differential evolution

NASA Astrophysics Data System (ADS)

Mansor, Zakwan; Zakaria, Mohd Zakimi; Nor, Azuwir Mohd; Saad, Mohd Sazli; Ahmad, Robiah; Jamaluddin, Hishamuddin

2017-09-01

This paper presents the black-box modelling of palm oil biodiesel engine (POB) using multi-objective optimization differential evolution (MOODE) algorithm. Two objective functions are considered in the algorithm for optimization; minimizing the number of term of a model structure and minimizing the mean square error between actual and predicted outputs. The mathematical model used in this study to represent the POB system is nonlinear auto-regressive moving average with exogenous input (NARMAX) model. Finally, model validity tests are applied in order to validate the possible models that was obtained from MOODE algorithm and lead to select an optimal model.
Efficient and optimized identification of generalized Maxwell viscoelastic relaxation spectra.

PubMed

Babaei, Behzad; Davarian, Ali; Pryse, Kenneth M; Elson, Elliot L; Genin, Guy M

2015-03-01

Viscoelastic relaxation spectra are essential for predicting and interpreting the mechanical responses of materials and structures. For biological tissues, these spectra must usually be estimated from viscoelastic relaxation tests. Interpreting viscoelastic relaxation tests is challenging because the inverse problem is expensive computationally. We present here an efficient algorithm that enables rapid identification of viscoelastic relaxation spectra. The algorithm was tested against trial data to characterize its robustness and identify its limitations and strengths. The algorithm was then applied to identify the viscoelastic response of reconstituted collagen, revealing an extensive distribution of viscoelastic time constants. Copyright © 2015 Elsevier Ltd. All rights reserved.

A Branch-and-Bound Approach for Tautomer Enumeration.

PubMed

Thalheim, Torsten; Wagner, Barbara; Kühne, Ralph; Middendorf, Martin; Schüürmann, Gerrit

2015-05-01

Knowledge about tautomer forms of a structure is important since, e.g., a property prediction for a molecule can yield to different results which depend on the individual tautomer. Tautomers are isomers that can be transformed to each other through chemical equilibrium reactions. In this paper the first exact Branch-and-Bound (B&B) algorithm to calculate tautomer structures is proposed. The algorithm is complete in the sense of tautomerism and generates all possible tautomers of a structure according to the tautomer definition, it is initialized with. To be efficient, the algorithm takes advantage of symmetric and formation properties. Some restrictions are used to enable an early pruning of some branches of the B&B tree. This is important, since a simple enumeration strategy would lead to number of candidate tautomers that is exponentially increasing with the number of hydrogen atoms and their attachment sites. The proposed implementation of the B&B algorithm covers the majority of the prototropic tautomer cases, but can be adapted to other kinds of tautomerism too. Furthermore, a computer processable definition of tautomerism is given in the form of the moving hydrogen atom problem. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Sensitivity Analysis for Probabilistic Neural Network Structure Reduction.

PubMed

Kowalski, Piotr A; Kusy, Maciej

2018-05-01

In this paper, we propose the use of local sensitivity analysis (LSA) for the structure simplification of the probabilistic neural network (PNN). Three algorithms are introduced. The first algorithm applies LSA to the PNN input layer reduction by selecting significant features of input patterns. The second algorithm utilizes LSA to remove redundant pattern neurons of the network. The third algorithm combines the proposed two and constitutes the solution of how they can work together. PNN with a product kernel estimator is used, where each multiplicand computes a one-dimensional Cauchy function. Therefore, the smoothing parameter is separately calculated for each dimension by means of the plug-in method. The classification qualities of the reduced and full structure PNN are compared. Furthermore, we evaluate the performance of PNN, for which global sensitivity analysis (GSA) and the common reduction methods are applied, both in the input layer and the pattern layer. The models are tested on the classification problems of eight repository data sets. A 10-fold cross validation procedure is used to determine the prediction ability of the networks. Based on the obtained results, it is shown that the LSA can be used as an alternative PNN reduction approach.
IFACEwat: the interfacial water-implemented re-ranking algorithm to improve the discrimination of near native structures for protein rigid docking

PubMed Central

2014-01-01

Background Protein-protein docking is an in silico method to predict the formation of protein complexes. Due to limited computational resources, the protein-protein docking approach has been developed under the assumption of rigid docking, in which one of the two protein partners remains rigid during the protein associations and water contribution is ignored or implicitly presented. Despite obtaining a number of acceptable complex predictions, it seems to-date that most initial rigid docking algorithms still find it difficult or even fail to discriminate successfully the correct predictions from the other incorrect or false positive ones. To improve the rigid docking results, re-ranking is one of the effective methods that help re-locate the correct predictions in top high ranks, discriminating them from the other incorrect ones. In this paper, we propose a new re-ranking technique using a new energy-based scoring function, namely IFACEwat - a combined Interface Atomic Contact Energy (IFACE) and water effect. The IFACEwat aims to further improve the discrimination of the near-native structures of the initial rigid docking algorithm ZDOCK3.0.2. Unlike other re-ranking techniques, the IFACEwat explicitly implements interfacial water into the protein interfaces to account for the water-mediated contacts during the protein interactions. Results Our results showed that the IFACEwat increased both the numbers of the near-native structures and improved their ranks as compared to the initial rigid docking ZDOCK3.0.2. In fact, the IFACEwat achieved a success rate of 83.8% for Antigen/Antibody complexes, which is 10% better than ZDOCK3.0.2. As compared to another re-ranking technique ZRANK, the IFACEwat obtains success rates of 92.3% (8% better) and 90% (5% better) respectively for medium and difficult cases. When comparing with the latest published re-ranking method F2Dock, the IFACEwat performed equivalently well or even better for several Antigen/Antibody complexes. Conclusions With the inclusion of interfacial water, the IFACEwat improves mostly results of the initial rigid docking, especially for Antigen/Antibody complexes. The improvement is achieved by explicitly taking into account the contribution of water during the protein interactions, which was ignored or not fully presented by the initial rigid docking and other re-ranking techniques. In addition, the IFACEwat maintains sufficient computational efficiency of the initial docking algorithm, yet improves the ranks as well as the number of the near native structures found. As our implementation so far targeted to improve the results of ZDOCK3.0.2, and particularly for the Antigen/Antibody complexes, it is expected in the near future that more implementations will be conducted to be applicable for other initial rigid docking algorithms. PMID:25521441
Link Prediction in Evolving Networks Based on Popularity of Nodes.

PubMed

Wang, Tong; He, Xing-Sheng; Zhou, Ming-Yang; Fu, Zhong-Qian

2017-08-02

Link prediction aims to uncover the underlying relationship behind networks, which could be utilized to predict missing edges or identify the spurious edges. The key issue of link prediction is to estimate the likelihood of potential links in networks. Most classical static-structure based methods ignore the temporal aspects of networks, limited by the time-varying features, such approaches perform poorly in evolving networks. In this paper, we propose a hypothesis that the ability of each node to attract links depends not only on its structural importance, but also on its current popularity (activeness), since active nodes have much more probability to attract future links. Then a novel approach named popularity based structural perturbation method (PBSPM) and its fast algorithm are proposed to characterize the likelihood of an edge from both existing connectivity structure and current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. Besides, visual results and statistical analysis reveal that the proposed methods are inclined to predict future edges between active nodes, rather than edges between inactive nodes.
Toward a structure determination method for biomineral-associated protein using combined solid- state NMR and computational structure prediction.

PubMed

Masica, David L; Ash, Jason T; Ndao, Moise; Drobny, Gary P; Gray, Jeffrey J

2010-12-08

Protein-biomineral interactions are paramount to materials production in biology, including the mineral phase of hard tissue. Unfortunately, the structure of biomineral-associated proteins cannot be determined by X-ray crystallography or solution nuclear magnetic resonance (NMR). Here we report a method for determining the structure of biomineral-associated proteins. The method combines solid-state NMR (ssNMR) and ssNMR-biased computational structure prediction. In addition, the algorithm is able to identify lattice geometries most compatible with ssNMR constraints, representing a quantitative, novel method for investigating crystal-face binding specificity. We use this method to determine most of the structure of human salivary statherin interacting with the mineral phase of tooth enamel. Computation and experiment converge on an ensemble of related structures and identify preferential binding at three crystal surfaces. The work represents a significant advance toward determining structure of biomineral-adsorbed protein using experimentally biased structure prediction. This method is generally applicable to proteins that can be chemically synthesized. Copyright © 2010 Elsevier Ltd. All rights reserved.
Anopheles gambiae genome reannotation through synthesis of ab initio and comparative gene prediction algorithms

PubMed Central

Li, Jun; Riehle, Michelle M; Zhang, Yan; Xu, Jiannong; Oduol, Frederick; Gomez, Shawn M; Eiglmeier, Karin; Ueberheide, Beatrix M; Shabanowitz, Jeffrey; Hunt, Donald F; Ribeiro, José MC; Vernick, Kenneth D

2006-01-01

Background Complete genome annotation is a necessary tool as Anopheles gambiae researchers probe the biology of this potent malaria vector. Results We reannotate the A. gambiae genome by synthesizing comparative and ab initio sets of predicted coding sequences (CDSs) into a single set using an exon-gene-union algorithm followed by an open-reading-frame-selection algorithm. The reannotation predicts 20,970 CDSs supported by at least two lines of evidence, and it lowers the proportion of CDSs lacking start and/or stop codons to only approximately 4%. The reannotated CDS set includes a set of 4,681 novel CDSs not represented in the Ensembl annotation but with EST support, and another set of 4,031 Ensembl-supported genes that undergo major structural and, therefore, probably functional changes in the reannotated set. The quality and accuracy of the reannotation was assessed by comparison with end sequences from 20,249 full-length cDNA clones, and evaluation of mass spectrometry peptide hit rates from an A. gambiae shotgun proteomic dataset confirms that the reannotated CDSs offer a high quality protein database for proteomics. We provide a functional proteomics annotation, ReAnoXcel, obtained by analysis of the new CDSs through the AnoXcel pipeline, which allows functional comparisons of the CDS sets within the same bioinformatic platform. CDS data are available for download. Conclusion Comprehensive A. gambiae genome reannotation is achieved through a combination of comparative and ab initio gene prediction algorithms. PMID:16569258
Predicting nucleic acid binding interfaces from structural models of proteins

PubMed Central

Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

2011-01-01

The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared to patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. PMID:22086767
Predicting Human Preferences Using the Block Structure of Complex Social Networks

PubMed Central

Guimerà, Roger; Llorente, Alejandro; Moro, Esteban; Sales-Pardo, Marta

2012-01-01

With ever-increasing available data, predicting individuals' preferences and helping them locate the most relevant information has become a pressing need. Understanding and predicting preferences is also important from a fundamental point of view, as part of what has been called a “new” computational social science. Here, we propose a novel approach based on stochastic block models, which have been developed by sociologists as plausible models of complex networks of social interactions. Our model is in the spirit of predicting individuals' preferences based on the preferences of others but, rather than fitting a particular model, we rely on a Bayesian approach that samples over the ensemble of all possible models. We show that our approach is considerably more accurate than leading recommender algorithms, with major relative improvements between 38% and 99% over industry-level algorithms. Besides, our approach sheds light on decision-making processes by identifying groups of individuals that have consistently similar preferences, and enabling the analysis of the characteristics of those groups. PMID:22984533
Prediction of chemical biodegradability using support vector classifier optimized with differential evolution.

PubMed

Cao, Qi; Leung, K M

2014-09-22

Reliable computer models for the prediction of chemical biodegradability from molecular descriptors and fingerprints are very important for making health and environmental decisions. Coupling of the differential evolution (DE) algorithm with the support vector classifier (SVC) in order to optimize the main parameters of the classifier resulted in an improved classifier called the DE-SVC, which is introduced in this paper for use in chemical biodegradability studies. The DE-SVC was applied to predict the biodegradation of chemicals on the basis of extensive sample data sets and known structural features of molecules. Our optimization experiments showed that DE can efficiently find the proper parameters of the SVC. The resulting classifier possesses strong robustness and reliability compared with grid search, genetic algorithm, and particle swarm optimization methods. The classification experiments conducted here showed that the DE-SVC exhibits better classification performance than models previously used for such studies. It is a more effective and efficient prediction model for chemical biodegradability.
The RNA Newton polytope and learnability of energy parameters.

PubMed

Forouzmand, Elmirasadat; Chitsaz, Hamidreza

2013-07-01

Computational RNA structure prediction is a mature important problem that has received a new wave of attention with the discovery of regulatory non-coding RNAs and the advent of high-throughput transcriptome sequencing. Despite nearly two score years of research on RNA secondary structure and RNA-RNA interaction prediction, the accuracy of the state-of-the-art algorithms are still far from satisfactory. So far, researchers have proposed increasingly complex energy models and improved parameter estimation methods, experimental and/or computational, in anticipation of endowing their methods with enough power to solve the problem. The output has disappointingly been only modest improvements, not matching the expectations. Even recent massively featured machine learning approaches were not able to break the barrier. Why is that? The first step toward high-accuracy structure prediction is to pick an energy model that is inherently capable of predicting each and every one of known structures to date. In this article, we introduce the notion of learnability of the parameters of an energy model as a measure of such an inherent capability. We say that the parameters of an energy model are learnable iff there exists at least one set of such parameters that renders every known RNA structure to date the minimum free energy structure. We derive a necessary condition for the learnability and give a dynamic programming algorithm to assess it. Our algorithm computes the convex hull of the feature vectors of all feasible structures in the ensemble of a given input sequence. Interestingly, that convex hull coincides with the Newton polytope of the partition function as a polynomial in energy parameters. To the best of our knowledge, this is the first approach toward computing the RNA Newton polytope and a systematic assessment of the inherent capabilities of an energy model. The worst case complexity of our algorithm is exponential in the number of features. However, dimensionality reduction techniques can provide approximate solutions to avoid the curse of dimensionality. We demonstrated the application of our theory to a simple energy model consisting of a weighted count of A-U, C-G and G-U base pairs. Our results show that this simple energy model satisfies the necessary condition for more than half of the input unpseudoknotted sequence-structure pairs (55%) chosen from the RNA STRAND v2.0 database and severely violates the condition for ~ 13%, which provide a set of hard cases that require further investigation. From 1350 RNA strands, the observed 3D feature vector for 749 strands is on the surface of the computed polytope. For 289 RNA strands, the observed feature vector is not on the boundary of the polytope but its distance from the boundary is not more than one. A distance of one essentially means one base pair difference between the observed structure and the closest point on the boundary of the polytope, which need not be the feature vector of a structure. For 171 sequences, this distance is larger than two, and for only 11 sequences, this distance is larger than five. The source code is available on http://compbio.cs.wayne.edu/software/rna-newton-polytope.
Acoustic emission localization based on FBG sensing network and SVR algorithm

NASA Astrophysics Data System (ADS)

Sai, Yaozhang; Zhao, Xiuxia; Hou, Dianli; Jiang, Mingshun

2017-03-01

In practical application, carbon fiber reinforced plastics (CFRP) structures are easy to appear all sorts of invisible damages. So the damages should be timely located and detected for the safety of CFPR structures. In this paper, an acoustic emission (AE) localization system based on fiber Bragg grating (FBG) sensing network and support vector regression (SVR) is proposed for damage localization. AE signals, which are caused by damage, are acquired by high speed FBG interrogation. According to the Shannon wavelet transform, time differences between AE signals are extracted for localization algorithm based on SVR. According to the SVR model, the coordinate of AE source can be accurately predicted without wave velocity. The FBG system and localization algorithm are verified on a 500 mm×500 mm×2 mm CFRP plate. The experimental results show that the average error of localization system is 2.8 mm and the training time is 0.07 s.
Prediction of distribution coefficient from structure. 1. Estimation method.

PubMed

Csizmadia, F; Tsantili-Kakoulidou, A; Panderi, I; Darvas, F

1997-07-01

A method has been developed for the estimation of the distribution coefficient (D), which considers the microspecies of a compound. D is calculated from the microscopic dissociation constants (microconstants), the partition coefficients of the microspecies, and the counterion concentration. A general equation for the calculation of D at a given pH is presented. The microconstants are calculated from the structure using Hammett and Taft equations. The partition coefficients of the ionic microspecies are predicted by empirical equations using the dissociation constants and the partition coefficient of the uncharged species, which are estimated from the structure by a Linear Free Energy Relationship method. The algorithm is implemented in a program module called PrologD.
Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure.

PubMed

Lustgarten, Jonathan Lyle; Balasubramanian, Jeya Balaji; Visweswaran, Shyam; Gopalakrishnan, Vanathi

2017-03-01

The comprehensibility of good predictive models learned from high-dimensional gene expression data is attractive because it can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their abilities to summarize the observed data. We extend a Bayesian Rule Learning (BRL-GSS) algorithm, previously shown to be a significantly better predictor than other classical approaches in this domain. It searches a space of Bayesian networks using a decision tree representation of its parameters with global constraints, and infers a set of IF-THEN rules. The number of parameters and therefore the number of rules are combinatorial to the number of predictor variables in the model. We relax these global constraints to a more generalizable local structure (BRL-LSS). BRL-LSS entails more parsimonious set of rules because it does not have to generate all combinatorial rules. The search space of local structures is much richer than the space of global structures. We design the BRL-LSS with the same worst-case time-complexity as BRL-GSS while exploring a richer and more complex model space. We measure predictive performance using Area Under the ROC curve (AUC) and Accuracy. We measure model parsimony performance by noting the average number of rules and variables needed to describe the observed data. We evaluate the predictive and parsimony performance of BRL-GSS, BRL-LSS and the state-of-the-art C4.5 decision tree algorithm, across 10-fold cross-validation using ten microarray gene-expression diagnostic datasets. In these experiments, we observe that BRL-LSS is similar to BRL-GSS in terms of predictive performance, while generating a much more parsimonious set of rules to explain the same observed data. BRL-LSS also needs fewer variables than C4.5 to explain the data with similar predictive performance. We also conduct a feasibility study to demonstrate the general applicability of our BRL methods on the newer RNA sequencing gene-expression data.
Powder diffraction and crystal structure prediction identify four new coumarin polymorphs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shtukenberg, Alexander G.; Zhu, Qiang; Carter, Damien J.

Coumarin, a simple, commodity chemical isolated from beans in 1820, has, to date, only yielded one solid state structure. Here, we report a rich polymorphism of coumarin grown from the melt. Four new metastable forms were identified and their crystal structures were solved using a combination of computational crystal structure prediction algorithms and X-ray powder diffraction. With five crystal structures, coumarin has become one of the few rigid molecules showing extensive polymorphism at ambient conditions. We demonstrate the crucial role of advanced electronic structure calculations including many-body dispersion effects for accurate ranking of the stability of coumarin polymorphs and themore » need to account for anharmonic vibrational contributions to their free energy. As such, coumarin is a model system for studying weak intermolecular interactions, crystallization mechanisms, and kinetic effects.« less
Powder diffraction and crystal structure prediction identify four new coumarin polymorphs

DOE PAGES

Shtukenberg, Alexander G.; Zhu, Qiang; Carter, Damien J.; ...

2017-05-15

Coumarin, a simple, commodity chemical isolated from beans in 1820, has, to date, only yielded one solid state structure. Here, we report a rich polymorphism of coumarin grown from the melt. Four new metastable forms were identified and their crystal structures were solved using a combination of computational crystal structure prediction algorithms and X-ray powder diffraction. With five crystal structures, coumarin has become one of the few rigid molecules showing extensive polymorphism at ambient conditions. We demonstrate the crucial role of advanced electronic structure calculations including many-body dispersion effects for accurate ranking of the stability of coumarin polymorphs and themore » need to account for anharmonic vibrational contributions to their free energy. As such, coumarin is a model system for studying weak intermolecular interactions, crystallization mechanisms, and kinetic effects.« less
Visual saliency-based fast intracoding algorithm for high efficiency video coding

NASA Astrophysics Data System (ADS)

Zhou, Xin; Shi, Guangming; Zhou, Wei; Duan, Zhemin

2017-01-01

Intraprediction has been significantly improved in high efficiency video coding over H.264/AVC with quad-tree-based coding unit (CU) structure from size 64×64 to 8×8 and more prediction modes. However, these techniques cause a dramatic increase in computational complexity. An intracoding algorithm is proposed that consists of perceptual fast CU size decision algorithm and fast intraprediction mode decision algorithm. First, based on the visual saliency detection, an adaptive and fast CU size decision method is proposed to alleviate intraencoding complexity. Furthermore, a fast intraprediction mode decision algorithm with step halving rough mode decision method and early modes pruning algorithm is presented to selectively check the potential modes and effectively reduce the complexity of computation. Experimental results show that our proposed fast method reduces the computational complexity of the current HM to about 57% in encoding time with only 0.37% increases in BD rate. Meanwhile, the proposed fast algorithm has reasonable peak signal-to-noise ratio losses and nearly the same subjective perceptual quality.
STITCHER: Dynamic assembly of likely amyloid and prion β-structures from secondary structure predictions

PubMed Central

Bryan, Allen W; O’Donnell, Charles W; Menke, Matthew; Cowen, Lenore J; Lindquist, Susan; Berger, Bonnie

2012-01-01

The supersecondary structure of amyloids and prions, proteins of intense clinical and biological interest, are difficult to determine by standard experimental or computational means. In addition, significant conformational heterogeneity is known or suspected to exist in many amyloid fibrils. Previous work has demonstrated that probability-based prediction of discrete β-strand pairs can offer insight into these structures. Here, we devise a system of energetic rules that can be used to dynamically assemble these discrete β-strand pairs into complete amyloid β-structures. The STITCHER algorithm progressively ‘stitches’ strand-pairs into full β-sheets based on a novel free-energy model, incorporating experimentally observed amino-acid side-chain stacking contributions, entropic estimates, and steric restrictions for amyloidal parallel β-sheet construction. A dynamic program computes the top 50 structures and returns both the highest scoring structure and a consensus structure taken by polling this list for common discrete elements. Putative structural heterogeneity can be inferred from sequence regions that compose poorly. Predictions show agreement with experimental models of Alzheimer’s amyloid beta peptide and the Podospora anserina Het-s prion. Predictions of the HET-s homolog HET-S also reflect experimental observations of poor amyloid formation. We put forward predicted structures for the yeast prion Sup35, suggesting N-terminal structural stability enabled by tyrosine ladders, and C-terminal heterogeneity. Predictions for the Rnq1 prion and alpha-synuclein are also given, identifying a similar mix of homogenous and heterogeneous secondary structure elements. STITCHER provides novel insight into the energetic basis of amyloid structure, provides accurate structure predictions, and can help guide future experimental studies. Proteins 2012. © 2011 Wiley Periodicals, Inc. PMID:22095906
STITCHER: Dynamic assembly of likely amyloid and prion β-structures from secondary structure predictions.

PubMed

Bryan, Allen W; O'Donnell, Charles W; Menke, Matthew; Cowen, Lenore J; Lindquist, Susan; Berger, Bonnie

2012-02-01

The supersecondary structure of amyloids and prions, proteins of intense clinical and biological interest, are difficult to determine by standard experimental or computational means. In addition, significant conformational heterogeneity is known or suspected to exist in many amyloid fibrils. Previous work has demonstrated that probability-based prediction of discrete β-strand pairs can offer insight into these structures. Here, we devise a system of energetic rules that can be used to dynamically assemble these discrete β-strand pairs into complete amyloid β-structures. The STITCHER algorithm progressively 'stitches' strand-pairs into full β-sheets based on a novel free-energy model, incorporating experimentally observed amino-acid side-chain stacking contributions, entropic estimates, and steric restrictions for amyloidal parallel β-sheet construction. A dynamic program computes the top 50 structures and returns both the highest scoring structure and a consensus structure taken by polling this list for common discrete elements. Putative structural heterogeneity can be inferred from sequence regions that compose poorly. Predictions show agreement with experimental models of Alzheimer's amyloid beta peptide and the Podospora anserina Het-s prion. Predictions of the HET-s homolog HET-S also reflect experimental observations of poor amyloid formation. We put forward predicted structures for the yeast prion Sup35, suggesting N-terminal structural stability enabled by tyrosine ladders, and C-terminal heterogeneity. Predictions for the Rnq1 prion and alpha-synuclein are also given, identifying a similar mix of homogenous and heterogeneous secondary structure elements. STITCHER provides novel insight into the energetic basis of amyloid structure, provides accurate structure predictions, and can help guide future experimental studies. Copyright © 2011 Wiley Periodicals, Inc.
MultiMiTar: a novel multi objective optimization based miRNA-target prediction method.

PubMed

Mitra, Ramkrishna; Bandyopadhyay, Sanghamitra

2011-01-01

Machine learning based miRNA-target prediction algorithms often fail to obtain a balanced prediction accuracy in terms of both sensitivity and specificity due to lack of the gold standard of negative examples, miRNA-targeting site context specific relevant features and efficient feature selection process. Moreover, all the sequence, structure and machine learning based algorithms are unable to distribute the true positive predictions preferentially at the top of the ranked list; hence the algorithms become unreliable to the biologists. In addition, these algorithms fail to obtain considerable combination of precision and recall for the target transcripts that are translationally repressed at protein level. In the proposed article, we introduce an efficient miRNA-target prediction system MultiMiTar, a Support Vector Machine (SVM) based classifier integrated with a multiobjective metaheuristic based feature selection technique. The robust performance of the proposed method is mainly the result of using high quality negative examples and selection of biologically relevant miRNA-targeting site context specific features. The features are selected by using a novel feature selection technique AMOSA-SVM, that integrates the multi objective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) and SVM. MultiMiTar is found to achieve much higher Matthew's correlation coefficient (MCC) of 0.583 and average class-wise accuracy (ACA) of 0.8 compared to the others target prediction methods for a completely independent test data set. The obtained MCC and ACA values of these algorithms range from -0.269 to 0.155 and 0.321 to 0.582, respectively. Moreover, it shows a more balanced result in terms of precision and sensitivity (recall) for the translationally repressed data set as compared to all the other existing methods. An important aspect is that the true positive predictions are distributed preferentially at the top of the ranked list that makes MultiMiTar reliable for the biologists. MultiMiTar is now available as an online tool at www.isical.ac.in/~bioinfo_miu/multimitar.htm. MultiMiTar software can be downloaded from www.isical.ac.in/~bioinfo_miu/multimitar-download.htm.
Improving GPU-accelerated adaptive IDW interpolation algorithm using fast kNN search.

PubMed

Mei, Gang; Xu, Nengxiong; Xu, Liangliang

2016-01-01

This paper presents an efficient parallel Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm on modern Graphics Processing Unit (GPU). The presented algorithm is an improvement of our previous GPU-accelerated AIDW algorithm by adopting fast k-nearest neighbors (kNN) search. In AIDW, it needs to find several nearest neighboring data points for each interpolated point to adaptively determine the power parameter; and then the desired prediction value of the interpolated point is obtained by weighted interpolating using the power parameter. In this work, we develop a fast kNN search approach based on the space-partitioning data structure, even grid, to improve the previous GPU-accelerated AIDW algorithm. The improved algorithm is composed of the stages of kNN search and weighted interpolating. To evaluate the performance of the improved algorithm, we perform five groups of experimental tests. The experimental results indicate: (1) the improved algorithm can achieve a speedup of up to 1017 over the corresponding serial algorithm; (2) the improved algorithm is at least two times faster than our previous GPU-accelerated AIDW algorithm; and (3) the utilization of fast kNN search can significantly improve the computational efficiency of the entire GPU-accelerated AIDW algorithm.

bpRNA: large-scale automated annotation and analysis of RNA secondary structure.

PubMed

Danaee, Padideh; Rouches, Mason; Wiley, Michelle; Deng, Dezhong; Huang, Liang; Hendrix, David

2018-05-09

While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, 'bpRNA-1m', of over 100 000 single-molecule, known secondary structures; this is both more fully and accurately annotated and over 20-times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. Both the bpRNA method and the bpRNA-1m database will be valuable resources both for specific analysis of individual RNA molecules and large-scale analyses such as are useful for updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and for benchmarking structure-prediction algorithms.
Predictive power of theoretical modelling of the nuclear mean field: examples of improving predictive capacities

NASA Astrophysics Data System (ADS)

Dedes, I.; Dudek, J.

2018-03-01

We examine the effects of the parametric correlations on the predictive capacities of the theoretical modelling keeping in mind the nuclear structure applications. The main purpose of this work is to illustrate the method of establishing the presence and determining the form of parametric correlations within a model as well as an algorithm of elimination by substitution (see text) of parametric correlations. We examine the effects of the elimination of the parametric correlations on the stabilisation of the model predictions further and further away from the fitting zone. It follows that the choice of the physics case and the selection of the associated model are of secondary importance in this case. Under these circumstances we give priority to the relative simplicity of the underlying mathematical algorithm, provided the model is realistic. Following such criteria, we focus specifically on an important but relatively simple case of doubly magic spherical nuclei. To profit from the algorithmic simplicity we chose working with the phenomenological spherically symmetric Woods–Saxon mean-field. We employ two variants of the underlying Hamiltonian, the traditional one involving both the central and the spin orbit potential in the Woods–Saxon form and the more advanced version with the self-consistent density-dependent spin–orbit interaction. We compare the effects of eliminating of various types of correlations and discuss the improvement of the quality of predictions (‘predictive power’) under realistic parameter adjustment conditions.
Automated 3D structure composition for large RNAs

PubMed Central

Popenda, Mariusz; Szachniuk, Marta; Antczak, Maciej; Purzycka, Katarzyna J.; Lukasiak, Piotr; Bartol, Natalia; Blazewicz, Jacek; Adamiak, Ryszard W.

2012-01-01

Understanding the numerous functions that RNAs play in living cells depends critically on knowledge of their three-dimensional structure. Due to the difficulties in experimentally assessing structures of large RNAs, there is currently great demand for new high-resolution structure prediction methods. We present the novel method for the fully automated prediction of RNA 3D structures from a user-defined secondary structure. The concept is founded on the machine translation system. The translation engine operates on the RNA FRABASE database tailored to the dictionary relating the RNA secondary structure and tertiary structure elements. The translation algorithm is very fast. Initial 3D structure is composed in a range of seconds on a single processor. The method assures the prediction of large RNA 3D structures of high quality. Our approach needs neither structural templates nor RNA sequence alignment, required for comparative methods. This enables the building of unresolved yet native and artificial RNA structures. The method is implemented in a publicly available, user-friendly server RNAComposer. It works in an interactive mode and a batch mode. The batch mode is designed for large-scale modelling and accepts atomic distance restraints. Presently, the server is set to build RNA structures of up to 500 residues. PMID:22539264
Large-scale prediction of adverse drug reactions using chemical, biological, and phenotypic properties of drugs.

PubMed

Liu, Mei; Wu, Yonghui; Chen, Yukun; Sun, Jingchun; Zhao, Zhongming; Chen, Xue-wen; Matheny, Michael Edwin; Xu, Hua

2012-06-01

Adverse drug reaction (ADR) is one of the major causes of failure in drug development. Severe ADRs that go undetected until the post-marketing phase of a drug often lead to patient morbidity. Accurate prediction of potential ADRs is required in the entire life cycle of a drug, including early stages of drug design, different phases of clinical trials, and post-marketing surveillance. Many studies have utilized either chemical structures or molecular pathways of the drugs to predict ADRs. Here, the authors propose a machine-learning-based approach for ADR prediction by integrating the phenotypic characteristics of a drug, including indications and other known ADRs, with the drug's chemical structures and biological properties, including protein targets and pathway information. A large-scale study was conducted to predict 1385 known ADRs of 832 approved drugs, and five machine-learning algorithms for this task were compared. This evaluation, based on a fivefold cross-validation, showed that the support vector machine algorithm outperformed the others. Of the three types of information, phenotypic data were the most informative for ADR prediction. When biological and phenotypic features were added to the baseline chemical information, the ADR prediction model achieved significant improvements in area under the curve (from 0.9054 to 0.9524), precision (from 43.37% to 66.17%), and recall (from 49.25% to 63.06%). Most importantly, the proposed model successfully predicted the ADRs associated with withdrawal of rofecoxib and cerivastatin. The results suggest that phenotypic information on drugs is valuable for ADR prediction. Moreover, they demonstrate that different models that combine chemical, biological, or phenotypic information can be built from approved drugs, and they have the potential to detect clinically important ADRs in both preclinical and post-marketing phases.
Actinic imaging and evaluation of phase structures on EUV lithography masks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mochi, Iacopo; Goldberg, Kenneth; Huh, Sungmin

2010-09-28

The authors describe the implementation of a phase-retrieval algorithm to reconstruct phase and complex amplitude of structures on EUV lithography masks. Many native defects commonly found on EUV reticles are difficult to detect and review accurately because they have a strong phase component. Understanding the complex amplitude of mask features is essential for predictive modeling of defect printability and defect repair. Besides printing in a stepper, the most accurate way to characterize such defects is with actinic inspection, performed at the design, EUV wavelength. Phase defect and phase structures show a distinct through-focus behavior that enables qualitative evaluation of themore » object phase from two or more high-resolution intensity measurements. For the first time, phase of structures and defects on EUV masks were quantitatively reconstructed based on aerial image measurements, using a modified version of a phase-retrieval algorithm developed to test optical phase shifting reticles.« less
Comparison of RF spectrum prediction methods for dynamic spectrum access

NASA Astrophysics Data System (ADS)

Kovarskiy, Jacob A.; Martone, Anthony F.; Gallagher, Kyle A.; Sherbondy, Kelly D.; Narayanan, Ram M.

2017-05-01

Dynamic spectrum access (DSA) refers to the adaptive utilization of today's busy electromagnetic spectrum. Cognitive radio/radar technologies require DSA to intelligently transmit and receive information in changing environments. Predicting radio frequency (RF) activity reduces sensing time and energy consumption for identifying usable spectrum. Typical spectrum prediction methods involve modeling spectral statistics with Hidden Markov Models (HMM) or various neural network structures. HMMs describe the time-varying state probabilities of Markov processes as a dynamic Bayesian network. Neural Networks model biological brain neuron connections to perform a wide range of complex and often non-linear computations. This work compares HMM, Multilayer Perceptron (MLP), and Recurrent Neural Network (RNN) algorithms and their ability to perform RF channel state prediction. Monte Carlo simulations on both measured and simulated spectrum data evaluate the performance of these algorithms. Generalizing spectrum occupancy as an alternating renewal process allows Poisson random variables to generate simulated data while energy detection determines the occupancy state of measured RF spectrum data for testing. The results suggest that neural networks achieve better prediction accuracy and prove more adaptable to changing spectral statistics than HMMs given sufficient training data.
Critical analysis of dual-chamber implantable cardioverter-defibrillator arrhythmia detection : results and technical considerations.

PubMed

Wilkoff, B L; Kühlkamp, V; Volosin, K; Ellenbogen, K; Waldecker, B; Kacet, S; Gillberg, J M; DeSouza, C M

2001-01-23

One of the perceived benefits of dual-chamber implantable cardioverter-defibrillators (ICDs) is the reduction in inappropriate therapy due to new detection algorithms. It was the purpose of the present investigation to propose methods to minimize bias during such comparisons and to report the arrhythmia detection clinical results of the PR Logic dual-chamber detection algorithm in the GEM DR ICD in the context of these methods. Between November 1997 and October 1998, 933 patients received the GEM DR ICD in this prospective multicenter study. A total of 4856 sustained arrhythmia episodes (n=311) with stored electrogram and marker channel were classified by the investigators; 3488 episodes (n=232) were ventricular tachycardia (VT)/ventricular fibrillation (VF), and 1368 episodes (n=149) were supraventricular tachycardia (SVT). The overall detection results were corrected for multiple episodes within a patient with the generalized estimating equations (GEE) method with an exchangeable correlation structure between episodes. The relative sensitivity for detection of sustained VT and/or VF was 100.0% (3488 of 3488, n=232; 95% CI 98.3% to 100%), the VT/VF positive predictivity was 88.4% uncorrected (3488 of 3945, n=278) and 78.1% corrected (95% CI 73.3% to 82.3%) with the GEE method, and the SVT positive predictivity was 100.0% (911 of 911, n=101; 95% CI 96% to 100%). A structured approach to analysis limits the bias inherent in the evaluation of tachycardia discrimination algorithms through the use of relative VT/VF sensitivity, VT/VF positive predictivity, and SVT positive predictivity along with corrections for multiple tachycardia episodes in a single patient.
GAtor: A First-Principles Genetic Algorithm for Molecular Crystal Structure Prediction.

PubMed

Curtis, Farren; Li, Xiayue; Rose, Timothy; Vázquez-Mayagoitia, Álvaro; Bhattacharya, Saswata; Ghiringhelli, Luca M; Marom, Noa

2018-04-10

We present the implementation of GAtor, a massively parallel, first-principles genetic algorithm (GA) for molecular crystal structure prediction. GAtor is written in Python and currently interfaces with the FHI-aims code to perform local optimizations and energy evaluations using dispersion-inclusive density functional theory (DFT). GAtor offers a variety of fitness evaluation, selection, crossover, and mutation schemes. Breeding operators designed specifically for molecular crystals provide a balance between exploration and exploitation. Evolutionary niching is implemented in GAtor by using machine learning to cluster the dynamically updated population by structural similarity and then employing a cluster-based fitness function. Evolutionary niching promotes uniform sampling of the potential energy surface by evolving several subpopulations, which helps overcome initial pool biases and selection biases (genetic drift). The various settings offered by GAtor increase the likelihood of locating numerous low-energy minima, including those located in disconnected, hard to reach regions of the potential energy landscape. The best structures generated are re-relaxed and re-ranked using a hierarchy of increasingly accurate DFT functionals and dispersion methods. GAtor is applied to a chemically diverse set of four past blind test targets, characterized by different types of intermolecular interactions. The experimentally observed structures and other low-energy structures are found for all four targets. In particular, for Target II, 5-cyano-3-hydroxythiophene, the top ranked putative crystal structure is a Z' = 2 structure with P1̅ symmetry and a scaffold packing motif, which has not been reported previously.
Mixed time integration methods for transient thermal analysis of structures

NASA Technical Reports Server (NTRS)

Liu, W. K.

1982-01-01

The computational methods used to predict and optimize the thermal structural behavior of aerospace vehicle structures are reviewed. In general, two classes of algorithms, implicit and explicit, are used in transient thermal analysis of structures. Each of these two methods has its own merits. Due to the different time scales of the mechanical and thermal responses, the selection of a time integration method can be a different yet critical factor in the efficient solution of such problems. Therefore mixed time integration methods for transient thermal analysis of structures are being developed. The computer implementation aspects and numerical evaluation of these mixed time implicit-explicit algorithms in thermal analysis of structures are presented. A computationally useful method of estimating the critical time step for linear quadrilateral element is also given. Numerical tests confirm the stability criterion and accuracy characteristics of the methods. The superiority of these mixed time methods to the fully implicit method or the fully explicit method is also demonstrated.
Mixed time integration methods for transient thermal analysis of structures

NASA Technical Reports Server (NTRS)

Liu, W. K.

1983-01-01

The computational methods used to predict and optimize the thermal-structural behavior of aerospace vehicle structures are reviewed. In general, two classes of algorithms, implicit and explicit, are used in transient thermal analysis of structures. Each of these two methods has its own merits. Due to the different time scales of the mechanical and thermal responses, the selection of a time integration method can be a difficult yet critical factor in the efficient solution of such problems. Therefore mixed time integration methods for transient thermal analysis of structures are being developed. The computer implementation aspects and numerical evaluation of these mixed time implicit-explicit algorithms in thermal analysis of structures are presented. A computationally-useful method of estimating the critical time step for linear quadrilateral element is also given. Numerical tests confirm the stability criterion and accuracy characteristics of the methods. The superiority of these mixed time methods to the fully implicit method or the fully explicit method is also demonstrated.
Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit

PubMed Central

Xu, Liangliang; Xu, Nengxiong

2017-01-01

This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points’ spatial distribution pattern and achieve more accurate predictions than those predicted by IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version without profiting from the shared memory and the tiled version taking advantage of the shared memory. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, on both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in the computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) on single precision the achieved speed-up can be up to 763 (on the GPU M5000), while on double precision the obtained highest speed-up is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available. PMID:28989754
Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit.

PubMed

Mei, Gang; Xu, Liangliang; Xu, Nengxiong

2017-09-01

This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points' spatial distribution pattern and achieve more accurate predictions than those predicted by IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version without profiting from the shared memory and the tiled version taking advantage of the shared memory. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, on both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in the computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) on single precision the achieved speed-up can be up to 763 (on the GPU M5000), while on double precision the obtained highest speed-up is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available.
Advanced methods in NDE using machine learning approaches

NASA Astrophysics Data System (ADS)

Wunderlich, Christian; Tschöpe, Constanze; Duckhorn, Frank

2018-04-01

Machine learning (ML) methods and algorithms have been applied recently with great success in quality control and predictive maintenance. Its goal to build new and/or leverage existing algorithms to learn from training data and give accurate predictions, or to find patterns, particularly with new and unseen similar data, fits perfectly to Non-Destructive Evaluation. The advantages of ML in NDE are obvious in such tasks as pattern recognition in acoustic signals or automated processing of images from X-ray, Ultrasonics or optical methods. Fraunhofer IKTS is using machine learning algorithms in acoustic signal analysis. The approach had been applied to such a variety of tasks in quality assessment. The principal approach is based on acoustic signal processing with a primary and secondary analysis step followed by a cognitive system to create model data. Already in the second analysis steps unsupervised learning algorithms as principal component analysis are used to simplify data structures. In the cognitive part of the software further unsupervised and supervised learning algorithms will be trained. Later the sensor signals from unknown samples can be recognized and classified automatically by the algorithms trained before. Recently the IKTS team was able to transfer the software for signal processing and pattern recognition to a small printed circuit board (PCB). Still, algorithms will be trained on an ordinary PC; however, trained algorithms run on the Digital Signal Processor and the FPGA chip. The identical approach will be used for pattern recognition in image analysis of OCT pictures. Some key requirements have to be fulfilled, however. A sufficiently large set of training data, a high signal-to-noise ratio, and an optimized and exact fixation of components are required. The automated testing can be done subsequently by the machine. By integrating the test data of many components along the value chain further optimization including lifetime and durability prediction based on big data becomes possible, even if components are used in different versions or configurations. This is the promise behind German Industry 4.0.
Family-Based Benchmarking of Copy Number Variation Detection Software.

PubMed

Nutsua, Marcel Elie; Fischer, Annegret; Nebel, Almut; Hofmann, Sylvia; Schreiber, Stefan; Krawczak, Michael; Nothnagel, Michael

2015-01-01

The analysis of structural variants, in particular of copy-number variations (CNVs), has proven valuable in unraveling the genetic basis of human diseases. Hence, a large number of algorithms have been developed for the detection of CNVs in SNP array signal intensity data. Using the European and African HapMap trio data, we undertook a comparative evaluation of six commonly used CNV detection software tools, namely Affymetrix Power Tools (APT), QuantiSNP, PennCNV, GLAD, R-gada and VEGA, and assessed their level of pair-wise prediction concordance. The tool-specific CNV prediction accuracy was assessed in silico by way of intra-familial validation. Software tools differed greatly in terms of the number and length of the CNVs predicted as well as the number of markers included in a CNV. All software tools predicted substantially more deletions than duplications. Intra-familial validation revealed consistently low levels of prediction accuracy as measured by the proportion of validated CNVs (34-60%). Moreover, up to 20% of apparent family-based validations were found to be due to chance alone. Software using Hidden Markov models (HMM) showed a trend to predict fewer CNVs than segmentation-based algorithms albeit with greater validity. PennCNV yielded the highest prediction accuracy (60.9%). Finally, the pairwise concordance of CNV prediction was found to vary widely with the software tools involved. We recommend HMM-based software, in particular PennCNV, rather than segmentation-based algorithms when validity is the primary concern of CNV detection. QuantiSNP may be used as an additional tool to detect sets of CNVs not detectable by the other tools. Our study also reemphasizes the need for laboratory-based validation, such as qPCR, of CNVs predicted in silico.
Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions

PubMed Central

Tataru, Paula; Sand, Andreas; Hobolth, Asger; Mailund, Thomas; Pedersen, Christian N. S.

2013-01-01

Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper, we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences, and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model. PMID:24833225
Sensitivity of Latent Heating Profiles to Environmental Conditions: Implications for TRMM and Climate Research

NASA Technical Reports Server (NTRS)

Shepherd, J. Marshall; Einaudi, Franco (Technical Monitor)

2000-01-01

The Tropical Rainfall Measuring Mission (TRMM) as a part of NASA's Earth System Enterprise is the first mission dedicated to measuring tropical rainfall through microwave and visible sensors, and includes the first spaceborne rain radar. Tropical rainfall comprises two-thirds of global rainfall. It is also the primary distributor of heat through the atmosphere's circulation. It is this circulation that defines Earth's weather and climate. Understanding rainfall and its variability is crucial to understanding and predicting global climate change. Weather and climate models need an accurate assessment of the latent heating released as tropical rainfall occurs. Currently, cloud model-based algorithms are used to derive latent heating based on rainfall structure. Ultimately, these algorithms can be applied to actual data from TRMM. This study investigates key underlying assumptions used in developing the latent heating algorithms. For example, the standard algorithm is highly dependent on a system's rainfall amount and structure. It also depends on an a priori database of model-derived latent heating profiles based on the aforementioned rainfall characteristics. Unanswered questions remain concerning the sensitivity of latent heating profiles to environmental conditions (both thermodynamic and kinematic), regionality, and seasonality. This study investigates and quantifies such sensitivities and seeks to determine the optimal latent heating profile database based on the results. Ultimately, the study seeks to produce an optimized latent heating algorithm based not only on rainfall structure but also hydrometeor profiles.
RBind: computational network method to predict RNA binding sites.

PubMed

Wang, Kaili; Jian, Yiren; Wang, Huiwen; Zeng, Chen; Zhao, Yunjie

2018-04-26

Non-coding RNA molecules play essential roles by interacting with other molecules to perform various biological functions. However, it is difficult to determine RNA structures due to their flexibility. At present, the number of experimentally solved RNA-ligand and RNA-protein structures is still insufficient. Therefore, binding sites prediction of non-coding RNA is required to understand their functions. Current RNA binding site prediction algorithms produce many false positive nucleotides that are distance away from the binding sites. Here, we present a network approach, RBind, to predict the RNA binding sites. We benchmarked RBind in RNA-ligand and RNA-protein datasets. The average accuracy of 0.82 in RNA-ligand and 0.63 in RNA-protein testing showed that this network strategy has a reliable accuracy for binding sites prediction. The codes and datasets are available at https://zhaolab.com.cn/RBind. yjzhaowh@mail.ccnu.edu.cn. Supplementary data are available at Bioinformatics online.
A new neuro-fuzzy training algorithm for identifying dynamic characteristics of smart dampers

NASA Astrophysics Data System (ADS)

Dzung Nguyen, Sy; Choi, Seung-Bok

2012-08-01

This paper proposes a new algorithm, named establishing neuro-fuzzy system (ENFS), to identify dynamic characteristics of smart dampers such as magnetorheological (MR) and electrorheological (ER) dampers. In the ENFS, data clustering is performed based on the proposed algorithm named partitioning data space (PDS). Firstly, the PDS builds data clusters in joint input-output data space with appropriate constraints. The role of these constraints is to create reasonable data distribution in clusters. The ENFS then uses these clusters to perform the following tasks. Firstly, the fuzzy sets expressing characteristics of data clusters are established. The structure of the fuzzy sets is adjusted to be suitable for features of the data set. Secondly, an appropriate structure of neuro-fuzzy (NF) expressed by an optimal number of labeled data clusters and the fuzzy-set groups is determined. After the ENFS is introduced, its effectiveness is evaluated by a prediction-error-comparative work between the proposed method and some other methods in identifying numerical data sets such as ‘daily data of stock A’, or in identifying a function. The ENFS is then applied to identify damping force characteristics of the smart dampers. In order to evaluate the effectiveness of the ENFS in identifying the damping forces of the smart dampers, the prediction errors are presented by comparing with experimental results.
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition

PubMed Central

Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina

2007-01-01

Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
Model-Based Fatigue Prognosis of Fiber-Reinforced Laminates Exhibiting Concurrent Damage Mechanisms

NASA Technical Reports Server (NTRS)

Corbetta, M.; Sbarufatti, C.; Saxena, A.; Giglio, M.; Goebel, K.

2016-01-01

Prognostics of large composite structures is a topic of increasing interest in the field of structural health monitoring for aerospace, civil, and mechanical systems. Along with recent advancements in real-time structural health data acquisition and processing for damage detection and characterization, model-based stochastic methods for life prediction are showing promising results in the literature. Among various model-based approaches, particle-filtering algorithms are particularly capable in coping with uncertainties associated with the process. These include uncertainties about information on the damage extent and the inherent uncertainties of the damage propagation process. Some efforts have shown successful applications of particle filtering-based frameworks for predicting the matrix crack evolution and structural stiffness degradation caused by repetitive fatigue loads. Effects of other damage modes such as delamination, however, are not incorporated in these works. It is well established that delamination and matrix cracks not only co-exist in most laminate structures during the fatigue degradation process but also affect each other's progression. Furthermore, delamination significantly alters the stress-state in the laminates and accelerates the material degradation leading to catastrophic failure. Therefore, the work presented herein proposes a particle filtering-based framework for predicting a structure's remaining useful life with consideration of multiple co-existing damage-mechanisms. The framework uses an energy-based model from the composite modeling literature. The multiple damage-mode model has been shown to suitably estimate the energy release rate of cross-ply laminates as affected by matrix cracks and delamination modes. The model is also able to estimate the reduction in stiffness of the damaged laminate. This information is then used in the algorithms for life prediction capabilities. First, a brief summary of the energy-based damage model is provided. Then, the paper describes how the model is embedded within the prognostic framework and how the prognostics performance is assessed using observations from run-to-failure experiments

Stargate GTM: Bridging Descriptor and Activity Spaces.

PubMed

Gaspar, Héléna A; Baskin, Igor I; Marcou, Gilles; Horvath, Dragos; Varnek, Alexandre

2015-11-23

Predicting the activity profile of a molecule or discovering structures possessing a specific activity profile are two important goals in chemoinformatics, which could be achieved by bridging activity and molecular descriptor spaces. In this paper, we introduce the "Stargate" version of the Generative Topographic Mapping approach (S-GTM) in which two different multidimensional spaces (e.g., structural descriptor space and activity space) are linked through a common 2D latent space. In the S-GTM algorithm, the manifolds are trained simultaneously in two initial spaces using the probabilities in the 2D latent space calculated as a weighted geometric mean of probability distributions in both spaces. S-GTM has the following interesting features: (1) activities are involved during the training procedure; therefore, the method is supervised, unlike conventional GTM; (2) using molecular descriptors of a given compound as input, the model predicts a whole activity profile, and (3) using an activity profile as input, areas populated by relevant chemical structures can be detected. To assess the performance of S-GTM prediction models, a descriptor space (ISIDA descriptors) of a set of 1325 GPCR ligands was related to a B-dimensional (B = 1 or 8) activity space corresponding to pKi values for eight different targets. S-GTM outperforms conventional GTM for individual activities and performs similarly to the Lasso multitask learning algorithm, although it is still slightly less accurate than the Random Forest method.
Virtual screening on an α-helix to β-strand switchable region of the FGFR2 extracellular domain revealed positive and negative modulators.

PubMed

Diaz, Constantino; Corentin, Herbert; Thierry, Vermat; Chantal, Alcouffe; Tanguy, Bozec; David, Sibrac; Jean-Marc, Herbert; Pascual, Ferrara; Françoise, Bono; Edgardo, Ferran

2014-11-01

The secondary structure of some protein segments may vary between α-helix and β-strand. To predict these switchable segments, we have developed an algorithm, Switch-P, based solely on the protein sequence. This algorithm was used on the extracellular parts of FGF receptors. For FGFR2, it predicted that β4 and β5 strands of the third Ig-like domain were highly switchable. These two strands possess a high number of somatic mutations associated with cancer. Analysis of PDB structures of FGF receptors confirmed the switchability prediction for β5. We thus evaluated if compound-driven α-helix/β-strand switching of β5 could modulate FGFR2 signaling. We performed the virtual screening of a library containing 1.4 million of chemical compounds with two models of the third Ig-like domain of FGFR2 showing different secondary structures for β5, and we selected 32 compounds. Experimental testing using proliferation assays with FGF7-stimulated SNU-16 cells and a FGFR2-dependent Erk1/2 phosphorylation assay with FGFR2-transfected L6 cells, revealed activators and inhibitors of FGFR2. Our method for the identification of switchable proteinic regions, associated with our virtual screening approach, provides an opportunity to discover new generation of drugs with under-explored mechanism of action. © 2014 Wiley Periodicals, Inc.
IFACEwat: the interfacial water-implemented re-ranking algorithm to improve the discrimination of near native structures for protein rigid docking.

PubMed

Su, Chinh; Nguyen, Thuy-Diem; Zheng, Jie; Kwoh, Chee-Keong

2014-01-01

Protein-protein docking is an in silico method to predict the formation of protein complexes. Due to limited computational resources, the protein-protein docking approach has been developed under the assumption of rigid docking, in which one of the two protein partners remains rigid during the protein associations and water contribution is ignored or implicitly presented. Despite obtaining a number of acceptable complex predictions, it seems to-date that most initial rigid docking algorithms still find it difficult or even fail to discriminate successfully the correct predictions from the other incorrect or false positive ones. To improve the rigid docking results, re-ranking is one of the effective methods that help re-locate the correct predictions in top high ranks, discriminating them from the other incorrect ones. Our results showed that the IFACEwat increased both the numbers of the near-native structures and improved their ranks as compared to the initial rigid docking ZDOCK3.0.2. In fact, the IFACEwat achieved a success rate of 83.8% for Antigen/Antibody complexes, which is 10% better than ZDOCK3.0.2. As compared to another re-ranking technique ZRANK, the IFACEwat obtains success rates of 92.3% (8% better) and 90% (5% better) respectively for medium and difficult cases. When comparing with the latest published re-ranking method F2Dock, the IFACEwat performed equivalently well or even better for several Antigen/Antibody complexes. With the inclusion of interfacial water, the IFACEwat improves mostly results of the initial rigid docking, especially for Antigen/Antibody complexes. The improvement is achieved by explicitly taking into account the contribution of water during the protein interactions, which was ignored or not fully presented by the initial rigid docking and other re-ranking techniques. In addition, the IFACEwat maintains sufficient computational efficiency of the initial docking algorithm, yet improves the ranks as well as the number of the near native structures found. As our implementation so far targeted to improve the results of ZDOCK3.0.2, and particularly for the Antigen/Antibody complexes, it is expected in the near future that more implementations will be conducted to be applicable for other initial rigid docking algorithms.
Analysis of Air Traffic Track Data with the AutoBayes Synthesis System

NASA Technical Reports Server (NTRS)

Schumann, Johann Martin Philip; Cate, Karen; Lee, Alan G.

2010-01-01

The Next Generation Air Traffic System (NGATS) is aiming to provide substantial computer support for the air traffic controllers. Algorithms for the accurate prediction of aircraft movements are of central importance for such software systems but trajectory prediction has to work reliably in the presence of unknown parameters and uncertainties. We are using the AutoBayes program synthesis system to generate customized data analysis algorithms that process large sets of aircraft radar track data in order to estimate parameters and uncertainties. In this paper, we present, how the tasks of finding structure in track data, estimation of important parameters in climb trajectories, and the detection of continuous descent approaches can be accomplished with compact task-specific AutoBayes specifications. We present an overview of the AutoBayes architecture and describe, how its schema-based approach generates customized analysis algorithms, documented C/C++ code, and detailed mathematical derivations. Results of experiments with actual air traffic control data are discussed.
A high order accurate finite element algorithm for high Reynolds number flow prediction

NASA Technical Reports Server (NTRS)

Baker, A. J.

1978-01-01

A Galerkin-weighted residuals formulation is employed to establish an implicit finite element solution algorithm for generally nonlinear initial-boundary value problems. Solution accuracy, and convergence rate with discretization refinement, are quantized in several error norms, by a systematic study of numerical solutions to several nonlinear parabolic and a hyperbolic partial differential equation characteristic of the equations governing fluid flows. Solutions are generated using selective linear, quadratic and cubic basis functions. Richardson extrapolation is employed to generate a higher-order accurate solution to facilitate isolation of truncation error in all norms. Extension of the mathematical theory underlying accuracy and convergence concepts for linear elliptic equations is predicted for equations characteristic of laminar and turbulent fluid flows at nonmodest Reynolds number. The nondiagonal initial-value matrix structure introduced by the finite element theory is determined intrinsic to improved solution accuracy and convergence. A factored Jacobian iteration algorithm is derived and evaluated to yield a consequential reduction in both computer storage and execution CPU requirements while retaining solution accuracy.
Prediction of missing links and reconstruction of complex networks

NASA Astrophysics Data System (ADS)

Zhang, Cheng-Jun; Zeng, An

2016-04-01

Predicting missing links in complex networks is of great significance from both theoretical and practical point of view, which not only helps us understand the evolution of real systems but also relates to many applications in social, biological and online systems. In this paper, we study the features of different simple link prediction methods, revealing that they may lead to the distortion of networks’ structural and dynamical properties. Moreover, we find that high prediction accuracy is not definitely corresponding to a high performance in preserving the network properties when using link prediction methods to reconstruct networks. Our work highlights the importance of considering the feedback effect of the link prediction methods on network properties when designing the algorithms.
Ab initio crystal structure prediction of magnesium (poly)sulfides and calculation of their NMR parameters.

PubMed

Mali, Gregor

2017-03-01

Ab initio prediction of sensible crystal structures can be regarded as a crucial task in the quickly-developing methodology of NMR crystallography. In this contribution, an evolutionary algorithm was used for the prediction of magnesium (poly)sulfide crystal structures with various compositions. The employed approach successfully identified all three experimentally detected forms of MgS, i.e. the stable rocksalt form and the metastable wurtzite and zincblende forms. Among magnesium polysulfides with a higher content of sulfur, the most probable structure with the lowest formation energy was found to be MgS 2 , exhibiting a modified rocksalt structure, in which S 2- anions were replaced by S 2 2- dianions. Magnesium polysulfides with even larger fractions of sulfur were not predicted to be stable. For the lowest-energy structures, 25 Mg quadrupolar coupling constants and chemical shift parameters were calculated using the density functional theory approach. The calculated NMR parameters could be well rationalized by the symmetries of the local magnesium environments, by the coordination of magnesium cations and by the nature of the surrounding anions. In the future, these parameters could serve as a reference for the experimentally determined 25 Mg NMR parameters of magnesium sulfide species.
SHARPEN-systematic hierarchical algorithms for rotamers and proteins on an extended network.

PubMed

Loksha, Ilya V; Maiolo, James R; Hong, Cheng W; Ng, Albert; Snow, Christopher D

2009-04-30

Algorithms for discrete optimization of proteins play a central role in recent advances in protein structure prediction and design. We wish to improve the resources available for computational biologists to rapidly prototype such algorithms and to easily scale these algorithms to many processors. To that end, we describe the implementation and use of two new open source resources, citing potential benefits over existing software. We discuss CHOMP, a new object-oriented library for macromolecular optimization, and SHARPEN, a framework for scaling CHOMP scripts to many computers. These tools allow users to develop new algorithms for a variety of applications including protein repacking, protein-protein docking, loop rebuilding, or homology model remediation. Particular care was taken to allow modular energy function design; protein conformations may currently be scored using either the OPLSaa molecular mechanical energy function or an all-atom semiempirical energy function employed by Rosetta. (c) 2009 Wiley Periodicals, Inc.
NMRDSP: an accurate prediction of protein shape strings from NMR chemical shifts and sequence data.

PubMed

Mao, Wusong; Cong, Peisheng; Wang, Zhiheng; Lu, Longjian; Zhu, Zhongliang; Li, Tonghua

2013-01-01

Shape string is structural sequence and is an extremely important structure representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts give a strong correlation with the local protein structure, and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. The NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as a classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight states accuracy) and 87.8% for S3 (the three states accuracy). This is higher than only using chemical shifts or sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and their combination prominently improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.
Expediting topology data gathering for the TOPDB database.

PubMed

Dobson, László; Langó, Tamás; Reményi, István; Tusnády, Gábor E

2015-01-01

The Topology Data Bank of Transmembrane Proteins (TOPDB, http://topdb.enzim.ttk.mta.hu) contains experimentally determined topology data of transmembrane proteins. Recently, we have updated TOPDB from several sources and utilized a newly developed topology prediction algorithm to determine the most reliable topology using the results of experiments as constraints. In addition to collecting the experimentally determined topology data published in the last couple of years, we gathered topographies defined by the TMDET algorithm using 3D structures from the PDBTM. Results of global topology analysis of various organisms as well as topology data generated by high throughput techniques, like the sequential positions of N- or O-glycosylations were incorporated into the TOPDB database. Moreover, a new algorithm was developed to integrate scattered topology data from various publicly available databases and a new method was introduced to measure the reliability of predicted topologies. We show that reliability values highly correlate with the per protein topology accuracy of the utilized prediction method. Altogether, more than 52,000 new topology data and more than 2600 new transmembrane proteins have been collected since the last public release of the TOPDB database. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Application of XGBoost algorithm in hourly PM2.5 concentration prediction

NASA Astrophysics Data System (ADS)

Pan, Bingyue

2018-02-01

In view of prediction techniques of hourly PM2.5 concentration in China, this paper applied the XGBoost(Extreme Gradient Boosting) algorithm to predict hourly PM2.5 concentration. The monitoring data of air quality in Tianjin city was analyzed by using XGBoost algorithm. The prediction performance of the XGBoost method is evaluated by comparing observed and predicted PM2.5 concentration using three measures of forecast accuracy. The XGBoost method is also compared with the random forest algorithm, multiple linear regression, decision tree regression and support vector machines for regression models using computational results. The results demonstrate that the XGBoost algorithm outperforms other data mining methods.
Fe-Cluster Compounds of Chalcogenides: Candidates for Rare-Earth-Free Permanent Magnet and Magnetic Nodal-Line Topological Material.

PubMed

Zhao, Xin; Wang, Cai-Zhuang; Kim, Minsung; Ho, Kai-Ming

2017-12-04

Fe-cluster-based crystal structures are predicted for chalcogenides Fe 3 X 4 (X = S, Se, Te) using an adaptive genetic algorithm. Topologically different from the well-studied layered structures of iron chalcogenides, the newly predicted structures consist of Fe clusters that are either separated by the chalcogen atoms or connected via sharing of the vertex Fe atoms. Using first-principles calculations, we demonstrate that these structures have competitive or even lower formation energies than the experimentally synthesized Fe 3 X 4 compounds and exhibit interesting magnetic and electronic properties. In particular, we show that Fe 3 Te 4 can be a good candidate as a rare-earth-free permanent magnet and Fe 3 S 4 can be a magnetic nodal-line topological material.
An Integrated Framework Advancing Membrane Protein Modeling and Design

PubMed Central

Weitzner, Brian D.; Duran, Amanda M.; Tilley, Drew C.; Elazar, Assaf; Gray, Jeffrey J.

2015-01-01

Membrane proteins are critical functional molecules in the human body, constituting more than 30% of open reading frames in the human genome. Unfortunately, a myriad of difficulties in overexpression and reconstitution into membrane mimetics severely limit our ability to determine their structures. Computational tools are therefore instrumental to membrane protein structure prediction, consequently increasing our understanding of membrane protein function and their role in disease. Here, we describe a general framework facilitating membrane protein modeling and design that combines the scientific principles for membrane protein modeling with the flexible software architecture of Rosetta3. This new framework, called RosettaMP, provides a general membrane representation that interfaces with scoring, conformational sampling, and mutation routines that can be easily combined to create new protocols. To demonstrate the capabilities of this implementation, we developed four proof-of-concept applications for (1) prediction of free energy changes upon mutation; (2) high-resolution structural refinement; (3) protein-protein docking; and (4) assembly of symmetric protein complexes, all in the membrane environment. Preliminary data show that these algorithms can produce meaningful scores and structures. The data also suggest needed improvements to both sampling routines and score functions. Importantly, the applications collectively demonstrate the potential of combining the flexible nature of RosettaMP with the power of Rosetta algorithms to facilitate membrane protein modeling and design. PMID:26325167
NegGOA: negative GO annotations selection using ontology structure.

PubMed

Fu, Guangyuan; Wang, Jun; Yang, Bo; Yu, Guoxian

2016-10-01

Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples-proteins that are known not carrying out a particular function. However, Gene Ontology (GO) almost always only provides the knowledge that proteins carry out a particular function, and functional annotations of proteins are incomplete. GO structurally organizes more than tens of thousands GO terms and a protein is annotated with several (or dozens) of these terms. For these reasons, the negative examples of a protein can greatly help distinguishing true positive examples of the protein from such a large candidate GO space. In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, available annotations and potentiality of additional annotations of a protein to choose negative examples of the protein. We compare NegGOA with other negative examples selection algorithms and find that NegGOA produces much fewer false negatives than them. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in Yeast, Human, Mouse and Fly. NegGOA also demonstrates improved accuracy than these comparing algorithms across various evaluation metrics. In addition, NegGOA is less suffered from incomplete annotations of proteins than these comparing methods. The Matlab and R codes are available at https://sites.google.com/site/guoxian85/neggoa gxyu@swu.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Molecular beacon sequence design algorithm.

PubMed

Monroe, W Todd; Haselton, Frederick R

2003-01-01

A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.
PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta.

PubMed

Chaudhury, Sidhartha; Lyskov, Sergey; Gray, Jeffrey J

2010-03-01

PyRosetta is a stand-alone Python-based implementation of the Rosetta molecular modeling package that allows users to write custom structure prediction and design algorithms using the major Rosetta sampling and scoring functions. PyRosetta contains Python bindings to libraries that define Rosetta functions including those for accessing and manipulating protein structure, calculating energies and running Monte Carlo-based simulations. PyRosetta can be used in two ways: (i) interactively, using iPython and (ii) script-based, using Python scripting. Interactive mode contains a number of help features and is ideal for beginners while script-mode is best suited for algorithm development. PyRosetta has similar computational performance to Rosetta, can be easily scaled up for cluster applications and has been implemented for algorithms demonstrating protein docking, protein folding, loop modeling and design. PyRosetta is a stand-alone package available at http://www.pyrosetta.org under the Rosetta license which is free for academic and non-profit users. A tutorial, user's manual and sample scripts demonstrating usage are also available on the web site.
PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta

PubMed Central

Chaudhury, Sidhartha; Lyskov, Sergey; Gray, Jeffrey J.

2010-01-01

Summary: PyRosetta is a stand-alone Python-based implementation of the Rosetta molecular modeling package that allows users to write custom structure prediction and design algorithms using the major Rosetta sampling and scoring functions. PyRosetta contains Python bindings to libraries that define Rosetta functions including those for accessing and manipulating protein structure, calculating energies and running Monte Carlo-based simulations. PyRosetta can be used in two ways: (i) interactively, using iPython and (ii) script-based, using Python scripting. Interactive mode contains a number of help features and is ideal for beginners while script-mode is best suited for algorithm development. PyRosetta has similar computational performance to Rosetta, can be easily scaled up for cluster applications and has been implemented for algorithms demonstrating protein docking, protein folding, loop modeling and design. Availability: PyRosetta is a stand-alone package available at http://www.pyrosetta.org under the Rosetta license which is free for academic and non-profit users. A tutorial, user's manual and sample scripts demonstrating usage are also available on the web site. Contact: pyrosetta@graylab.jhu.edu PMID:20061306
Development and Testing of Data Mining Algorithms for Earth Observation

NASA Technical Reports Server (NTRS)

Glymour, Clark

2005-01-01

The new algorithms developed under this project included a principled procedure for classification of objects, events or circumstances according to a target variable when a very large number of potential predictor variables is available but the number of cases that can be used for training a classifier is relatively small. These "high dimensional" problems require finding a minimal set of variables -called the Markov Blanket-- sufficient for predicting the value of the target variable. An algorithm, the Markov Blanket Fan Search, was developed, implemented and tested on both simulated and real data in conjunction with a graphical model classifier, which was also implemented. Another algorithm developed and implemented in TETRAD IV for time series elaborated on work by C. Granger and N. Swanson, which in turn exploited some of our earlier work. The algorithms in question learn a linear time series model from data. Given such a time series, the simultaneous residual covariances, after factoring out time dependencies, may provide information about causal processes that occur more rapidly than the time series representation allow, so called simultaneous or contemporaneous causal processes. Working with A. Monetta, a graduate student from Italy, we produced the correct statistics for estimating the contemporaneous causal structure from time series data using the TETRAD IV suite of algorithms. Two economists, David Bessler and Kevin Hoover, have independently published applications using TETRAD style algorithms to the same purpose. These implementations and algorithmic developments were separately used in two kinds of studies of climate data: Short time series of geographically proximate climate variables predicting agricultural effects in California, and longer duration climate measurements of temperature teleconnections.
Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming

DOE Office of Scientific and Technical Information (OSTI.GOV)

King, R.D.; Srinivasan, A.

1996-10-01

The machine learning program Progol was applied to the problem of forming the structure-activity relationship (SAR) for a set of compounds tested for carcinogenicity in rodent bioassays by the U.S. National Toxicology Program (NTP). Progol is the first inductive logic programming (ILP) algorithm to use a fully relational method for describing chemical structure in SARs, based on using atoms and their bond connectivities. Progol is well suited to forming SARs for carcinogenicity as it is designed to produce easily understandable rules (structural alerts) for sets of noncongeneric compounds. The Progol SAR method was tested by prediction of a set ofmore » compounds that have been widely predicted by other SAR methods (the compounds used in the NTP`s first round of carcinogenesis predictions). For these compounds no method (human or machine) was significantly more accurate than Progol. Progol was the most accurate method that did not use data from biological tests on rodents (however, the difference in accuracy is not significant). The Progol predictions were based solely on chemical structure and the results of tests for Salmonella mutagenicity. Using the full NTP database, the prediction accuracy of Progol was estimated to be 63% ({+-}3%) using 5-fold cross validation. A set of structural alerts for carcinogenesis was automatically generated and the chemical rationale for them investigated-these structural alerts are statistically independent of the Salmonella mutagenicity. Carcinogenicity is predicted for the compounds used in the NTP`s second round of carcinogenesis predictions. The results for prediction of carcinogenesis, taken together with the previous successful applications of predicting mutagenicity in nitroaromatic compounds, and inhibition of angiogenesis by suramin analogues, show that Progol has a role to play in understanding the SARs of cancer-related compounds. 29 refs., 2 figs., 4 tabs.« less
Protein asparagine deamidation prediction based on structures with machine learning methods.

PubMed

Jia, Lei; Sun, Yaxiong

2017-01-01

Chemical stability is a major concern in the development of protein therapeutics due to its impact on both efficacy and safety. Protein "hotspots" are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, oxidation etc. A more accurate prediction method for potential hotspot residues would allow their elimination or reduction as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. Sequence-based prediction method simply identifies the NG motif (amino acid asparagine followed by a glycine) to be liable to deamidation. It still dominates deamidation evaluation process in most pharmaceutical setup due to its convenience. However, the simple sequence-based method is less accurate and often causes over-engineering a protein. We introduce structure-based prediction models by mining available experimental and structural data of deamidated proteins. Our training set contains 194 Asn residues from 25 proteins that all have available high-resolution crystal structures. Experimentally measured deamidation half-life of Asn in penta-peptides as well as 3D structure-based properties, such as solvent exposure, crystallographic B-factors, local secondary structure and dihedral angles etc., were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated as well as tested with an external test data set. The random forest model had high enrichment in ranking deamidated residues higher than non-deamidated residues while effectively eliminated false positive predictions. It is possible that such quantitative protein structure-function relationship tools can also be applied to other protein hotspot predictions. In addition, we extensively discussed metrics being used to evaluate the performance of predicting unbalanced data sets such as the deamidation case.

Quantifying model-structure- and parameter-driven uncertainties in spring wheat phenology prediction with Bayesian analysis

DOE PAGES

Alderman, Phillip D.; Stanfill, Bryan

2016-10-06

Recent international efforts have brought renewed emphasis on the comparison of different agricultural systems models. Thus far, analysis of model-ensemble simulated results has not clearly differentiated between ensemble prediction uncertainties due to model structural differences per se and those due to parameter value uncertainties. Additionally, despite increasing use of Bayesian parameter estimation approaches with field-scale crop models, inadequate attention has been given to the full posterior distributions for estimated parameters. The objectives of this study were to quantify the impact of parameter value uncertainty on prediction uncertainty for modeling spring wheat phenology using Bayesian analysis and to assess the relativemore » contributions of model-structure-driven and parameter-value-driven uncertainty to overall prediction uncertainty. This study used a random walk Metropolis algorithm to estimate parameters for 30 spring wheat genotypes using nine phenology models based on multi-location trial data for days to heading and days to maturity. Across all cases, parameter-driven uncertainty accounted for between 19 and 52% of predictive uncertainty, while model-structure-driven uncertainty accounted for between 12 and 64%. Here, this study demonstrated the importance of quantifying both model-structure- and parameter-value-driven uncertainty when assessing overall prediction uncertainty in modeling spring wheat phenology. More generally, Bayesian parameter estimation provided a useful framework for quantifying and analyzing sources of prediction uncertainty.« less
Mining the key predictors for event outbreaks in social networks

NASA Astrophysics Data System (ADS)

Yi, Chengqi; Bao, Yuanyuan; Xue, Yibo

2016-04-01

It will be beneficial to devise a method to predict a so-called event outbreak. Existing works mainly focus on exploring effective methods for improving the accuracy of predictions, while ignoring the underlying causes: What makes event go viral? What factors that significantly influence the prediction of an event outbreak in social networks? In this paper, we proposed a novel definition for an event outbreak, taking into account the structural changes to a network during the propagation of content. In addition, we investigated features that were sensitive to predicting an event outbreak. In order to investigate the universality of these features at different stages of an event, we split the entire lifecycle of an event into 20 equal segments according to the proportion of the propagation time. We extracted 44 features, including features related to content, users, structure, and time, from each segment of the event. Based on these features, we proposed a prediction method using supervised classification algorithms to predict event outbreaks. Experimental results indicate that, as time goes by, our method is highly accurate, with a precision rate ranging from 79% to 97% and a recall rate ranging from 74% to 97%. In addition, after applying a feature-selection algorithm, the top five selected features can considerably improve the accuracy of the prediction. Data-driven experimental results show that the entropy of the eigenvector centrality, the entropy of the PageRank, the standard deviation of the betweenness centrality, the proportion of re-shares without content, and the average path length are the key predictors for an event outbreak. Our findings are especially useful for further exploring the intrinsic characteristics of outbreak prediction.
Self-consistency test reveals systematic bias in programs for prediction change of stability upon mutation.

PubMed

Usmanova, Dinara R; Bogatyreva, Natalya S; Ariño Bernad, Joan; Eremina, Aleksandra A; Gorshkova, Anastasiya A; Kanevskiy, German M; Lonishin, Lyubov R; Meister, Alexander V; Yakupova, Alisa G; Kondrashov, Fyodor A; Ivankov, Dmitry N

2018-05-02

Computational prediction of the effect of mutations on protein stability is used by researchers in many fields. The utility of the prediction methods is affected by their accuracy and bias. Bias, a systematic shift of the predicted change of stability, has been noted as an issue for several methods, but has not been investigated systematically. Presence of the bias may lead to misleading results especially when exploring the effects of combination of different mutations. Here we use a protocol to measure the bias as a function of the number of introduced mutations. It is based on a self-consistency test of the reciprocity the effect of a mutation. An advantage of the used approach is that it relies solely on crystal structures without experimentally measured stability values. We applied the protocol to four popular algorithms predicting change of protein stability upon mutation, FoldX, Eris, Rosetta, and I-Mutant, and found an inherent bias. For one program, FoldX, we manage to substantially reduce the bias using additional relaxation by Modeller. Authors using algorithms for predicting effects of mutations should be aware of the bias described here. ivankov13@gmail.com. Supplementary data are available at Bioinformatics online.
Structured Light-Based 3D Reconstruction System for Plants.

PubMed

Nguyen, Thuy Tuong; Slaughter, David C; Max, Nelson; Maloof, Julin N; Sinha, Neelima

2015-07-29

Camera-based 3D reconstruction of physical objects is one of the most popular computer vision trends in recent years. Many systems have been built to model different real-world subjects, but there is lack of a completely robust system for plants. This paper presents a full 3D reconstruction system that incorporates both hardware structures (including the proposed structured light system to enhance textures on object surfaces) and software algorithms (including the proposed 3D point cloud registration and plant feature measurement). This paper demonstrates the ability to produce 3D models of whole plants created from multiple pairs of stereo images taken at different viewing angles, without the need to destructively cut away any parts of a plant. The ability to accurately predict phenotyping features, such as the number of leaves, plant height, leaf size and internode distances, is also demonstrated. Experimental results show that, for plants having a range of leaf sizes and a distance between leaves appropriate for the hardware design, the algorithms successfully predict phenotyping features in the target crops, with a recall of 0.97 and a precision of 0.89 for leaf detection and less than a 13-mm error for plant size, leaf size and internode distance.
Molecular classification of pesticides including persistent organic pollutants, phenylurea and sulphonylurea herbicides.

PubMed

Torrens, Francisco; Castellano, Gloria

2014-06-05

Pesticide residues in wine were analyzed by liquid chromatography-tandem mass spectrometry. Retentions are modelled by structure-property relationships. Bioplastic evolution is an evolutionary perspective conjugating effect of acquired characters and evolutionary indeterminacy-morphological determination-natural selection principles; its application to design co-ordination index barely improves correlations. Fractal dimensions and partition coefficient differentiate pesticides. Classification algorithms are based on information entropy and its production. Pesticides allow a structural classification by nonplanarity, and number of O, S, N and Cl atoms and cycles; different behaviours depend on number of cycles. The novelty of the approach is that the structural parameters are related to retentions. Classification algorithms are based on information entropy. When applying procedures to moderate-sized sets, excessive results appear compatible with data suffering a combinatorial explosion. However, equipartition conjecture selects criterion resulting from classification between hierarchical trees. Information entropy permits classifying compounds agreeing with principal component analyses. Periodic classification shows that pesticides in the same group present similar properties; those also in equal period, maximum resemblance. The advantage of the classification is to predict the retentions for molecules not included in the categorization. Classification extends to phenyl/sulphonylureas and the application will be to predict their retentions.
Accounting for receptor flexibility and enhanced sampling methods in computer-aided drug design.

PubMed

Sinko, William; Lindert, Steffen; McCammon, J Andrew

2013-01-01

Protein flexibility plays a major role in biomolecular recognition. In many cases, it is not obvious how molecular structure will change upon association with other molecules. In proteins, these changes can be major, with large deviations in overall backbone structure, or they can be more subtle as in a side-chain rotation. Either way the algorithms that predict the favorability of biomolecular association require relatively accurate predictions of the bound structure to give an accurate assessment of the energy involved in association. Here, we review a number of techniques that have been proposed to accommodate receptor flexibility in the simulation of small molecules binding to protein receptors. We investigate modifications to standard rigid receptor docking algorithms and also explore enhanced sampling techniques, and the combination of free energy calculations and enhanced sampling techniques. The understanding and allowance for receptor flexibility are helping to make computer simulations of ligand protein binding more accurate. These developments may help improve the efficiency of drug discovery and development. Efficiency will be essential as we begin to see personalized medicine tailored to individual patients, which means specific drugs are needed for each patient's genetic makeup. © 2012 John Wiley & Sons A/S.
Structural health monitoring approach for detecting ice accretion on bridge cable using the Haar Wavelet Transform

NASA Astrophysics Data System (ADS)

Andre, Julia; Kiremidjian, Anne; Liao, Yizheng; Georgakis, Christos; Rajagopal, Ram

2016-04-01

Ice accretion on cables of bridge structures poses serious risk to the structure as well as to vehicular traffic when the ice falls onto the road. Detection of ice formation, quantification of the amount of ice accumulated, and prediction of icefalls will increase the safety and serviceability of the structure. In this paper, an ice accretion detection algorithm is presented based on the Continuous Wavelet Transform (CWT). In the proposed algorithm, the acceleration signals obtained from bridge cables are transformed using wavelet method. The damage sensitive features (DSFs) are defined as a function of the wavelet energy at specific wavelet scales. It is found that as ice accretes on the cables, the mass of cable increases, thus changing the wavelet energies. Hence, the DSFs can be used to track the change of cables mass. To validate the proposed algorithm, we use the data collected from a laboratory experiment conducted at the Technical University of Denmark (DTU). In this experiment, a cable was placed in a wind tunnel as ice volume grew progressively. Several accelerometers were installed at various locations along the testing cable to collect vibration signals.
Learning Instance-Specific Predictive Models

PubMed Central

Visweswaran, Shyam; Cooper, Gregory F.

2013-01-01

This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including nave Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. PMID:25045325
Training set optimization under population structure in genomic selection

USDA-ARS?s Scientific Manuscript database

The optimization of the training set (TRS) in genomic selection (GS) has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the Coefficient of D...
Systematically evaluating read-across prediction and performance using a local validity approach characterized by chemical structure and bioactivity information

EPA Science Inventory

Read-across is a popular data gap filling technique within category and analogue approaches for regulatory purposes. Acceptance of read-across remains an ongoing challenge with several efforts underway for identifying and addressing uncertainties. Here we demonstrate an algorithm...
Predict Dem Bones!

ERIC Educational Resources Information Center

Gray, John S.

1994-01-01

A detailed analysis and computer-based solution to a puzzle addressing the arrangement of dominoes on a grid is presented. The problem is one used in a college-level data structures or algorithms course. The solution uses backtracking to generate all possible answers. Details of the use of backtracking and techniques for mapping abstract problems…
Real time coarse orientation detection in MR scans using multi-planar deep convolutional neural networks

NASA Astrophysics Data System (ADS)

Bhatia, Parmeet S.; Reda, Fitsum; Harder, Martin; Zhan, Yiqiang; Zhou, Xiang Sean

2017-02-01

Automatically detecting anatomy orientation is an important task in medical image analysis. Specifically, the ability to automatically detect coarse orientation of structures is useful to minimize the effort of fine/accurate orientation detection algorithms, to initialize non-rigid deformable registration algorithms or to align models to target structures in model-based segmentation algorithms. In this work, we present a deep convolution neural network (DCNN)-based method for fast and robust detection of the coarse structure orientation, i.e., the hemi-sphere where the principal axis of a structure lies. That is, our algorithm predicts whether the principal orientation of a structure is in the northern hemisphere or southern hemisphere, which we will refer to as UP and DOWN, respectively, in the remainder of this manuscript. The only assumption of our method is that the entire structure is located within the scan's field-of-view (FOV). To efficiently solve the problem in 3D space, we formulated it as a multi-planar 2D deep learning problem. In the training stage, a large number coronal-sagittal slice pairs are constructed as 2-channel images to train a DCNN to classify whether a scan is UP or DOWN. During testing, we randomly sample a small number of coronal-sagittal 2-channel images and pass them through our trained network. Finally, coarse structure orientation is determined using majority voting. We tested our method on 114 Elbow MR Scans. Experimental results suggest that only five 2-channel images are sufficient to achieve a high success rate of 97.39%. Our method is also extremely fast and takes approximately 50 milliseconds per 3D MR scan. Our method is insensitive to the location of the structure in the FOV.
Recent progress and future directions in protein-protein docking.

PubMed

Ritchie, David W

2008-02-01

This article gives an overview of recent progress in protein-protein docking and it identifies several directions for future research. Recent results from the CAPRI blind docking experiments show that docking algorithms are steadily improving in both reliability and accuracy. Current docking algorithms employ a range of efficient search and scoring strategies, including e.g. fast Fourier transform correlations, geometric hashing, and Monte Carlo techniques. These approaches can often produce a relatively small list of up to a few thousand orientations, amongst which a near-native binding mode is often observed. However, despite the use of improved scoring functions which typically include models of desolvation, hydrophobicity, and electrostatics, current algorithms still have difficulty in identifying the correct solution from the list of false positives, or decoys. Nonetheless, significant progress is being made through better use of bioinformatics, biochemical, and biophysical information such as e.g. sequence conservation analysis, protein interaction databases, alanine scanning, and NMR residual dipolar coupling restraints to help identify key binding residues. Promising new approaches to incorporate models of protein flexibility during docking are being developed, including the use of molecular dynamics snapshots, rotameric and off-rotamer searches, internal coordinate mechanics, and principal component analysis based techniques. Some investigators now use explicit solvent models in their docking protocols. Many of these approaches can be computationally intensive, although new silicon chip technologies such as programmable graphics processor units are beginning to offer competitive alternatives to conventional high performance computer systems. As cryo-EM techniques improve apace, docking NMR and X-ray protein structures into low resolution EM density maps is helping to bridge the resolution gap between these complementary techniques. The use of symmetry and fragment assembly constraints are also helping to make possible docking-based predictions of large multimeric protein complexes. In the near future, the closer integration of docking algorithms with protein interface prediction software, structural databases, and sequence analysis techniques should help produce better predictions of protein interaction networks and more accurate structural models of the fundamental molecular interactions within the cell.
Simple geometric algorithms to aid in clearance management for robotic mechanisms

NASA Technical Reports Server (NTRS)

Copeland, E. L.; Ray, L. D.; Peticolas, J. D.

1981-01-01

Global geometric shapes such as lines, planes, circles, spheres, cylinders, and the associated computational algorithms which provide relatively inexpensive estimates of minimum spatial clearance for safe operations were selected. The Space Shuttle, remote manipulator system, and the Power Extension Package are used as an example. Robotic mechanisms operate in quarters limited by external structures and the problem of clearance is often of considerable interest. Safe clearance management is simple and suited to real time calculation, whereas contact prediction requires more precision, sophistication, and computational overhead.
A traveling salesman approach for predicting protein functions.

PubMed

Johnson, Olin; Liu, Jing

2006-10-12

Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Here we present a new approach utilizing the classic Traveling Salesman Problem to study the protein-protein interactions and to predict protein functions in budding yeast Saccharomyces cerevisiae. We apply the global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm 1 on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Our method is a promising one to be used as a general tool to predict functions of uncharacterized proteins and a successful sample of using computer science knowledge and algorithms to study biological problems.
A traveling salesman approach for predicting protein functions

PubMed Central

Johnson, Olin; Liu, Jing

2006-01-01

Background Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Results Here we present a new approach utilizing the classic Traveling Salesman Problem to study the protein-protein interactions and to predict protein functions in budding yeast Saccharomyces cerevisiae. We apply the global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm [1] on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Conclusion Our method is a promising one to be used as a general tool to predict functions of uncharacterized proteins and a successful sample of using computer science knowledge and algorithms to study biological problems. PMID:17147783
On the accuracy of modelling the dynamics of large space structures

NASA Technical Reports Server (NTRS)

Diarra, C. M.; Bainum, P. M.

1985-01-01

Proposed space missions will require large scale, light weight, space based structural systems. Large space structure technology (LSST) systems will have to accommodate (among others): ocean data systems; electronic mail systems; large multibeam antenna systems; and, space based solar power systems. The structures are to be delivered into orbit by the space shuttle. Because of their inherent size, modelling techniques and scaling algorithms must be developed so that system performance can be predicted accurately prior to launch and assembly. When the size and weight-to-area ratio of proposed LSST systems dictate that the entire system be considered flexible, there are two basic modeling methods which can be used. The first is a continuum approach, a mathematical formulation for predicting the motion of a general orbiting flexible body, in which elastic deformations are considered small compared with characteristic body dimensions. This approach is based on an a priori knowledge of the frequencies and shape functions of all modes included within the system model. Alternatively, finite element techniques can be used to model the entire structure as a system of lumped masses connected by a series of (restoring) springs and possibly dampers. In addition, a computational algorithm was developed to evaluate the coefficients of the various coupling terms in the equations of motion as applied to the finite element model of the Hoop/Column.
PROPER: global protein interaction network alignment through percolation matching.

PubMed

Kazemi, Ehsan; Hassani, Hamed; Grossglauser, Matthias; Pezeshgi Modarres, Hassan

2016-12-12

The alignment of protein-protein interaction (PPI) networks enables us to uncover the relationships between different species, which leads to a deeper understanding of biological systems. Network alignment can be used to transfer biological knowledge between species. Although different PPI-network alignment algorithms were introduced during the last decade, developing an accurate and scalable algorithm that can find alignments with high biological and structural similarities among PPI networks is still challenging. In this paper, we introduce a new global network alignment algorithm for PPI networks called PROPER. Compared to other global network alignment methods, our algorithm shows higher accuracy and speed over real PPI datasets and synthetic networks. We show that the PROPER algorithm can detect large portions of conserved biological pathways between species. Also, using a simple parsimonious evolutionary model, we explain why PROPER performs well based on several different comparison criteria. We highlight that PROPER has high potential in further applications such as detecting biological pathways, finding protein complexes and PPI prediction. The PROPER algorithm is available at http://proper.epfl.ch .
Predicting nucleic acid binding interfaces from structural models of proteins.

PubMed

Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

2012-02-01

The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.
Application of ant colony optimization in development of models for prediction of anti-HIV-1 activity of HEPT derivatives.

PubMed

Zare-Shahabadi, Vali; Abbasitabar, Fatemeh

2010-09-01

Quantitative structure-activity relationship models were derived for 107 analogs of 1-[(2-hydroxyethoxy) methyl]-6-(phenylthio)thymine, a potent inhibitor of the HIV-1 reverse transcriptase. The activities of these compounds were investigated by means of multiple linear regression (MLR) technique. An ant colony optimization algorithm, called Memorized_ACS, was applied for selecting relevant descriptors and detecting outliers. This algorithm uses an external memory based upon knowledge incorporation from previous iterations. At first, the memory is empty, and then it is filled by running several ACS algorithms. In this respect, after each ACS run, the elite ant is stored in the memory and the process is continued to fill the memory. Here, pheromone updating is performed by all elite ants collected in the memory; this results in improvements in both exploration and exploitation behaviors of the ACS algorithm. The memory is then made empty and is filled again by performing several ACS algorithms using updated pheromone trails. This process is repeated for several iterations. At the end, the memory contains several top solutions for the problem. Number of appearance of each descriptor in the external memory is a good criterion for its importance. Finally, prediction is performed by the elitist ant, and interpretation is carried out by considering the importance of each descriptor. The best MLR model has a training error of 0.47 log (1/EC(50)) units (R(2) = 0.90) and a prediction error of 0.76 log (1/EC(50)) units (R(2) = 0.88). Copyright 2010 Wiley Periodicals, Inc.

Freiburg RNA tools: a central online resource for RNA-focused research and teaching.

PubMed

Raden, Martin; Ali, Syed M; Alkhnbashi, Omer S; Busch, Anke; Costa, Fabrizio; Davis, Jason A; Eggenhofer, Florian; Gelhausen, Rick; Georg, Jens; Heyne, Steffen; Hiller, Michael; Kundu, Kousik; Kleinkauf, Robert; Lott, Steffen C; Mohamed, Mostafa M; Mattheis, Alexander; Miladi, Milad; Richter, Andreas S; Will, Sebastian; Wolff, Joachim; Wright, Patrick R; Backofen, Rolf

2018-05-21

The Freiburg RNA tools webserver is a well established online resource for RNA-focused research. It provides a unified user interface and comprehensive result visualization for efficient command line tools. The webserver includes RNA-RNA interaction prediction (IntaRNA, CopraRNA, metaMIR), sRNA homology search (GLASSgo), sequence-structure alignments (LocARNA, MARNA, CARNA, ExpaRNA), CRISPR repeat classification (CRISPRmap), sequence design (antaRNA, INFO-RNA, SECISDesign), structure aberration evaluation of point mutations (RaSE), and RNA/protein-family models visualization (CMV), and other methods. Open education resources offer interactive visualizations of RNA structure and RNA-RNA interaction prediction as well as basic and advanced sequence alignment algorithms. The services are freely available at http://rna.informatik.uni-freiburg.de.
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

PubMed

Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

2012-01-01

RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate the DNA sequence or user can upload the sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various application for the sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides the tools for routinely performed sequence analysis by the biological scientists such as DNA replication, reverse compliment generation, transcription, translation, sequence specific information as total number of nucleotide bases, ATGC base contents along with their respective percentages and sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides user friendly environment for sequence analysis. It is freely available. http://www.cemb.edu.pk/sw.html RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
Identification of Extracellular Segments by Mass Spectrometry Improves Topology Prediction of Transmembrane Proteins.

PubMed

Langó, Tamás; Róna, Gergely; Hunyadi-Gulyás, Éva; Turiák, Lilla; Varga, Julia; Dobson, László; Várady, György; Drahos, László; Vértessy, Beáta G; Medzihradszky, Katalin F; Szakács, Gergely; Tusnády, Gábor E

2017-02-13

Transmembrane proteins play crucial role in signaling, ion transport, nutrient uptake, as well as in maintaining the dynamic equilibrium between the internal and external environment of cells. Despite their important biological functions and abundance, less than 2% of all determined structures are transmembrane proteins. Given the persisting technical difficulties associated with high resolution structure determination of transmembrane proteins, additional methods, including computational and experimental techniques remain vital in promoting our understanding of their topologies, 3D structures, functions and interactions. Here we report a method for the high-throughput determination of extracellular segments of transmembrane proteins based on the identification of surface labeled and biotin captured peptide fragments by LC/MS/MS. We show that reliable identification of extracellular protein segments increases the accuracy and reliability of existing topology prediction algorithms. Using the experimental topology data as constraints, our improved prediction tool provides accurate and reliable topology models for hundreds of human transmembrane proteins.
The cost of quality: Implementing generalization and suppression for anonymizing biomedical data with minimal information loss.

PubMed

Kohlmayer, Florian; Prasser, Fabian; Kuhn, Klaus A

2015-12-01

With the ARX data anonymization tool structured biomedical data can be de-identified using syntactic privacy models, such as k-anonymity. Data is transformed with two methods: (a) generalization of attribute values, followed by (b) suppression of data records. The former method results in data that is well suited for analyses by epidemiologists, while the latter method significantly reduces loss of information. Our tool uses an optimal anonymization algorithm that maximizes output utility according to a given measure. To achieve scalability, existing optimal anonymization algorithms exclude parts of the search space by predicting the outcome of data transformations regarding privacy and utility without explicitly applying them to the input dataset. These optimizations cannot be used if data is transformed with generalization and suppression. As optimal data utility and scalability are important for anonymizing biomedical data, we had to develop a novel method. In this article, we first confirm experimentally that combining generalization with suppression significantly increases data utility. Next, we proof that, within this coding model, the outcome of data transformations regarding privacy and utility cannot be predicted. As a consequence, existing algorithms fail to deliver optimal data utility. We confirm this finding experimentally. The limitation of previous work can be overcome at the cost of increased computational complexity. However, scalability is important for anonymizing data with user feedback. Consequently, we identify properties of datasets that may be predicted in our context and propose a novel and efficient algorithm. Finally, we evaluate our solution with multiple datasets and privacy models. This work presents the first thorough investigation of which properties of datasets can be predicted when data is anonymized with generalization and suppression. Our novel approach adopts existing optimization strategies to our context and combines different search methods. The experiments show that our method is able to efficiently solve a broad spectrum of anonymization problems. Our work shows that implementing syntactic privacy models is challenging and that existing algorithms are not well suited for anonymizing data with transformation models which are more complex than generalization alone. As such models have been recommended for use in the biomedical domain, our results are of general relevance for de-identifying structured biomedical data. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Correlation of MFOLD-predicted DNA secondary structures with separation patterns obtained by capillary electrophoresis single-strand conformation polymorphism (CE-SSCP) analysis.

PubMed

Glavac, Damjan; Potocnik, Uros; Podpecnik, Darja; Zizek, Teofil; Smerkolj, Sava; Ravnik-Glavac, Metka

2002-04-01

We have studied 57 different mutations within three beta-globin gene promoter fragments with sizes 52 bp, 77 bp, and 193 bp by fluorescent capillary electrophoresis CE-SSCP analysis. For each mutation and wild type, energetically most-favorable predicted secondary structures were calculated for sense and antisense strands using the MFOLD DNA-folding algorithm in order to investigate if any correlation exists between predicted DNA structures and actual CE migration time shifts. The overall CE-SSCP detection rate was 100% for all mutations in three studied DNA fragments. For shorter 52 bp and 77 bp DNA fragments we obtained a positive correlation between the migration time shifts and difference in free energy values of predicted secondary structures at all temperatures. For longer 193 bp beta-globin gene fragments with 46 mutations MFOLD predicted different secondary structures for 89% of mutated strands at 25 degrees C and 40 degrees C. However, the magnitude of the mobility shifts did not necessarily correlate with their secondary structures and free energy values except for the sense strand at 40 degrees C where this correlation was statistically significant (r = 0.312, p = 0.033). Results of this study provided more direct insight into the mechanism of CE-SSCP and showed that MFOLD prediction could be helpful in making decisions about the running temperatures and in prediction of CE-SSCP data patterns, especially for shorter (50-100 bp) DNA fragments. Copyright 2002 Wiley-Liss, Inc.
Algorithm for selection of optimized EPR distance restraints for de novo protein structure determination

PubMed Central

Kazmier, Kelli; Alexander, Nathan S.; Meiler, Jens; Mchaourab, Hassane S.

2010-01-01

A hybrid protein structure determination approach combining sparse Electron Paramagnetic Resonance (EPR) distance restraints and Rosetta de novo protein folding has been previously demonstrated to yield high quality models (Alexander et al., 2008). However, widespread application of this methodology to proteins of unknown structures is hindered by the lack of a general strategy to place spin label pairs in the primary sequence. In this work, we report the development of an algorithm that optimally selects spin labeling positions for the purpose of distance measurements by EPR. For the α-helical subdomain of T4 lysozyme (T4L), simulated restraints that maximize sequence separation between the two spin labels while simultaneously ensuring pairwise connectivity of secondary structure elements yielded vastly improved models by Rosetta folding. 50% of all these models have the correct fold compared to only 21% and 8% correctly folded models when randomly placed restraints or no restraints are used, respectively. Moreover, the improvements in model quality require a limited number of optimized restraints, the number of which is determined by the pairwise connectivities of T4L α-helices. The predicted improvement in Rosetta model quality was verified by experimental determination of distances between spin labels pairs selected by the algorithm. Overall, our results reinforce the rationale for the combined use of sparse EPR distance restraints and de novo folding. By alleviating the experimental bottleneck associated with restraint selection, this algorithm sets the stage for extending computational structure determination to larger, traditionally elusive protein topologies of critical structural and biochemical importance. PMID:21074624
External validation of the international risk prediction algorithm for major depressive episode in the US general population: the PredictD-US study.

PubMed

Nigatu, Yeshambel T; Liu, Yan; Wang, JianLi

2016-07-22

Multivariable risk prediction algorithms are useful for making clinical decisions and for health planning. While prediction algorithms for new onset of major depression in the primary care attendees in Europe and elsewhere have been developed, the performance of these algorithms in different populations is not known. The objective of this study was to validate the PredictD algorithm for new onset of major depressive episode (MDE) in the US general population. Longitudinal study design was conducted with approximate 3-year follow-up data from a nationally representative sample of the US general population. A total of 29,621 individuals who participated in Wave 1 and 2 of the US National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) and who did not have an MDE in the past year at Wave 1 were included. The PredictD algorithm was directly applied to the selected participants. MDE was assessed by the Alcohol Use Disorder and Associated Disabilities Interview Schedule, based on the DSM-IV criteria. Among the participants, 8 % developed an MDE over three years. The PredictD algorithm had acceptable discriminative power (C-statistics = 0.708, 95 % CI: 0.696, 0.720), but poor calibration (p < 0.001) with the NESARC data. In the European primary care attendees, the algorithm had a C-statistics of 0.790 (95 % CI: 0.767, 0.813) with a perfect calibration. The PredictD algorithm has acceptable discrimination, but the calibration capacity was poor in the US general population despite of re-calibration. Therefore, based on the results, at current stage, the use of PredictD in the US general population for predicting individual risk of MDE is not encouraged. More independent validation research is needed.
OSPREY: protein design with ensembles, flexibility, and provable algorithms.

PubMed

Gainza, Pablo; Roberts, Kyle E; Georgiev, Ivelin; Lilien, Ryan H; Keedy, Daniel A; Chen, Cheng-Yu; Reza, Faisal; Anderson, Amy C; Richardson, David C; Richardson, Jane S; Donald, Bruce R

2013-01-01

We have developed a suite of protein redesign algorithms that improves realistic in silico modeling of proteins. These algorithms are based on three characteristics that make them unique: (1) improved flexibility of the protein backbone, protein side-chains, and ligand to accurately capture the conformational changes that are induced by mutations to the protein sequence; (2) modeling of proteins and ligands as ensembles of low-energy structures to better approximate binding affinity; and (3) a globally optimal protein design search, guaranteeing that the computational predictions are optimal with respect to the input model. Here, we illustrate the importance of these three characteristics. We then describe OSPREY, a protein redesign suite that implements our protein design algorithms. OSPREY has been used prospectively, with experimental validation, in several biomedically relevant settings. We show in detail how OSPREY has been used to predict resistance mutations and explain why improved flexibility, ensembles, and provability are essential for this application. OSPREY is free and open source under a Lesser GPL license. The latest version is OSPREY 2.0. The program, user manual, and source code are available at www.cs.duke.edu/donaldlab/software.php. osprey@cs.duke.edu. Copyright © 2013 Elsevier Inc. All rights reserved.
Research on wind field algorithm of wind lidar based on BP neural network and grey prediction

NASA Astrophysics Data System (ADS)

Chen, Yong; Chen, Chun-Li; Luo, Xiong; Zhang, Yan; Yang, Ze-hou; Zhou, Jie; Shi, Xiao-ding; Wang, Lei

2018-01-01

This paper uses the BP neural network and grey algorithm to forecast and study radar wind field. In order to reduce the residual error in the wind field prediction which uses BP neural network and grey algorithm, calculating the minimum value of residual error function, adopting the residuals of the gray algorithm trained by BP neural network, using the trained network model to forecast the residual sequence, using the predicted residual error sequence to modify the forecast sequence of the grey algorithm. The test data show that using the grey algorithm modified by BP neural network can effectively reduce the residual value and improve the prediction precision.
StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase.

PubMed

Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L

2011-06-02

Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
Deformable structure registration of bladder through surface mapping.

PubMed

Xiong, Li; Viswanathan, Akila; Stewart, Alexandra J; Haker, Steven; Tempany, Clare M; Chin, Lee M; Cormack, Robert A

2006-06-01

Cumulative dose distributions in fractionated radiation therapy depict the dose to normal tissues and therefore may permit an estimation of the risk of normal tissue complications. However, calculation of these distributions is highly challenging because of interfractional changes in the geometry of patient anatomy. This work presents an algorithm for deformable structure registration of the bladder and the verification of the accuracy of the algorithm using phantom and patient data. In this algorithm, the registration process involves conformal mapping of genus zero surfaces using finite element analysis, and guided by three control landmarks. The registration produces a correspondence between fractions of the triangular meshes used to describe the bladder surface. For validation of the algorithm, two types of balloons were inflated gradually to three times their original size, and several computerized tomography (CT) scans were taken during the process. The registration algorithm yielded a local accuracy of 4 mm along the balloon surface. The algorithm was then applied to CT data of patients receiving fractionated high-dose-rate brachytherapy to the vaginal cuff, with the vaginal cylinder in situ. The patients' bladder filling status was intentionally different for each fraction. The three required control landmark points were identified for the bladder based on anatomy. Out of an Institutional Review Board (IRB) approved study of 20 patients, 3 had radiographically identifiable points near the bladder surface that were used for verification of the accuracy of the registration. The verification point as seen in each fraction was compared with its predicted location based on affine as well as deformable registration. Despite the variation in bladder shape and volume, the deformable registration was accurate to 5 mm, consistently outperforming the affine registration. We conclude that the structure registration algorithm presented works with reasonable accuracy and provides a means of calculating cumulative dose distributions.
Systematically Differentiating Functions for Alternatively Spliced Isoforms through Integrating RNA-seq Data

PubMed Central

Menon, Rajasree; Wen, Yuchen; Omenn, Gilbert S.; Kretzler, Matthias; Guan, Yuanfang

2013-01-01

Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially the ones with multiple isoforms, and robust to gene expression levels and removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extendable to any base machine learner and other species with alternatively spliced isoforms, and shifts the current gene-centered function prediction to isoform-level predictions. PMID:24244129
Predicting β-Turns in Protein Using Kernel Logistic Regression

PubMed Central

Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, FangXiang; Li, Min

2013-01-01

A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average 25% of amino acids in protein structures are located in β-turns. It is very important to develope an accurate and efficient method for β-turns prediction. Most of the current successful β-turns prediction methods use support vector machines (SVMs) or neural networks (NNs). The kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully in many classification problems. However, it is often not found in β-turns classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turns prediction in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved Q total of 80.7% and MCC of 50% on BT426 dataset. These results show that KLR method with the right algorithm can yield performance equivalent to or even better than NNs and SVMs in β-turns prediction. In addition, KLR yields probabilistic outcome and has a well-defined extension to multiclass case. PMID:23509793
Predicting β-turns in protein using kernel logistic regression.

PubMed

Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, Fangxiang; Li, Min

2013-01-01

A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average 25% of amino acids in protein structures are located in β-turns. It is very important to develope an accurate and efficient method for β-turns prediction. Most of the current successful β-turns prediction methods use support vector machines (SVMs) or neural networks (NNs). The kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully in many classification problems. However, it is often not found in β-turns classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turns prediction in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved Q total of 80.7% and MCC of 50% on BT426 dataset. These results show that KLR method with the right algorithm can yield performance equivalent to or even better than NNs and SVMs in β-turns prediction. In addition, KLR yields probabilistic outcome and has a well-defined extension to multiclass case.
Link prediction with node clustering coefficient

NASA Astrophysics Data System (ADS)

Wu, Zhihao; Lin, Youfang; Wang, Jing; Gregory, Steve

2016-06-01

Predicting missing links in incomplete complex networks efficiently and accurately is still a challenging problem. The recently proposed Cannistrai-Alanis-Ravai (CAR) index shows the power of local link/triangle information in improving link-prediction accuracy. Inspired by the idea of employing local link/triangle information, we propose a new similarity index with more local structure information. In our method, local link/triangle structure information can be conveyed by clustering coefficient of common-neighbors directly. The reason why clustering coefficient has good effectiveness in estimating the contribution of a common-neighbor is that it employs links existing between neighbors of a common-neighbor and these links have the same structural position with the candidate link to this common-neighbor. In our experiments, three estimators: precision, AUP and AUC are used to evaluate the accuracy of link prediction algorithms. Experimental results on ten tested networks drawn from various fields show that our new index is more effective in predicting missing links than CAR index, especially for networks with low correlation between number of common-neighbors and number of links between common-neighbors.
Integrated identification, modeling and control with applications

NASA Astrophysics Data System (ADS)

Shi, Guojun

This thesis deals with the integration of system design, identification, modeling and control. In particular, six interdisciplinary engineering problems are addressed and investigated. Theoretical results are established and applied to structural vibration reduction and engine control problems. First, the data-based LQG control problem is formulated and solved. It is shown that a state space model is not necessary to solve this problem; rather a finite sequence from the impulse response is the only model data required to synthesize an optimal controller. The new theory avoids unnecessary reliance on a model, required in the conventional design procedure. The infinite horizon model predictive control problem is addressed for multivariable systems. The basic properties of the receding horizon implementation strategy is investigated and the complete framework for solving the problem is established. The new theory allows the accommodation of hard input constraints and time delays. The developed control algorithms guarantee the closed loop stability. A closed loop identification and infinite horizon model predictive control design procedure is established for engine speed regulation. The developed algorithms are tested on the Cummins Engine Simulator and desired results are obtained. A finite signal-to-noise ratio model is considered for noise signals. An information quality index is introduced which measures the essential information precision required for stabilization. The problems of minimum variance control and covariance control are formulated and investigated. Convergent algorithms are developed for solving the problems of interest. The problem of the integrated passive and active control design is addressed in order to improve the overall system performance. A design algorithm is developed, which simultaneously finds: (i) the optimal values of the stiffness and damping ratios for the structure, and (ii) an optimal output variance constrained stabilizing controller such that the active control energy is minimized. A weighted q-Markov COVER method is introduced for identification with measurement noise. The result is use to develop an iterative closed loop identification/control design algorithm. The effectiveness of the algorithm is illustrated by experimental results.
Fast Demand Forecast of Electric Vehicle Charging Stations for Cell Phone Application

DOE Office of Scientific and Technical Information (OSTI.GOV)

Majidpour, Mostafa; Qiu, Charlie; Chung, Ching-Yen

This paper describes the core cellphone application algorithm which has been implemented for the prediction of energy consumption at Electric Vehicle (EV) Charging Stations at UCLA. For this interactive user application, the total time of accessing database, processing the data and making the prediction, needs to be within a few seconds. We analyze four relatively fast Machine Learning based time series prediction algorithms for our prediction engine: Historical Average, kNearest Neighbor, Weighted k-Nearest Neighbor, and Lazy Learning. The Nearest Neighbor algorithm (k Nearest Neighbor with k=1) shows better performance and is selected to be the prediction algorithm implemented for themore » cellphone application. Two applications have been designed on top of the prediction algorithm: one predicts the expected available energy at the station and the other one predicts the expected charging finishing time. The total time, including accessing the database, data processing, and prediction is about one second for both applications.« less
A Method to Predict the Structure and Stability of RNA/RNA Complexes.

PubMed

Xu, Xiaojun; Chen, Shi-Jie

2016-01-01

RNA/RNA interactions are essential for genomic RNA dimerization and regulation of gene expression. Intermolecular loop-loop base pairing is a widespread and functionally important tertiary structure motif in RNA machinery. However, computational prediction of intermolecular loop-loop base pairing is challenged by the entropy and free energy calculation due to the conformational constraint and the intermolecular interactions. In this chapter, we describe a recently developed statistical mechanics-based method for the prediction of RNA/RNA complex structures and stabilities. The method is based on the virtual bond RNA folding model (Vfold). The main emphasis in the method is placed on the evaluation of the entropy and free energy for the loops, especially tertiary kissing loops. The method also uses recursive partition function calculations and two-step screening algorithm for large, complicated structures of RNA/RNA complexes. As case studies, we use the HIV-1 Mal dimer and the siRNA/HIV-1 mutant (T4) to illustrate the method.
A Comparison of Supervised Machine Learning Algorithms and Feature Vectors for MS Lesion Segmentation Using Multimodal Structural MRI

PubMed Central

Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.

2014-01-01

Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953
A comparison of supervised machine learning algorithms and feature vectors for MS lesion segmentation using multimodal structural MRI.

PubMed

Sweeney, Elizabeth M; Vogelstein, Joshua T; Cuzzocreo, Jennifer L; Calabresi, Peter A; Reich, Daniel S; Crainiceanu, Ciprian M; Shinohara, Russell T

2014-01-01

Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance.

Lessons from making the Structural Classification of Proteins (SCOP) and their implications for protein structure modelling.

PubMed

Andreeva, Antonina

2016-06-15

The Structural Classification of Proteins (SCOP) database has facilitated the development of many tools and algorithms and it has been successfully used in protein structure prediction and large-scale genome annotations. During the development of SCOP, numerous exceptions were found to topological rules, along with complex evolutionary scenarios and peculiarities in proteins including the ability to fold into alternative structures. This article reviews cases of structural variations observed for individual proteins and among groups of homologues, knowledge of which is essential for protein structure modelling. © 2016 The Author(s). published by Portland Press Limited on behalf of the Biochemical Society.
Prediction of Compressional, Shear, and Stoneley Wave Velocities from Conventional Well Log Data Using a Committee Machine with Intelligent Systems

NASA Astrophysics Data System (ADS)

Asoodeh, Mojtaba; Bagheripour, Parisa

2012-01-01

Measurement of compressional, shear, and Stoneley wave velocities, carried out by dipole sonic imager (DSI) logs, provides invaluable data in geophysical interpretation, geomechanical studies and hydrocarbon reservoir characterization. The presented study proposes an improved methodology for making a quantitative formulation between conventional well logs and sonic wave velocities. First, sonic wave velocities were predicted from conventional well logs using artificial neural network, fuzzy logic, and neuro-fuzzy algorithms. Subsequently, a committee machine with intelligent systems was constructed by virtue of hybrid genetic algorithm-pattern search technique while outputs of artificial neural network, fuzzy logic and neuro-fuzzy models were used as inputs of the committee machine. It is capable of improving the accuracy of final prediction through integrating the outputs of aforementioned intelligent systems. The hybrid genetic algorithm-pattern search tool, embodied in the structure of committee machine, assigns a weight factor to each individual intelligent system, indicating its involvement in overall prediction of DSI parameters. This methodology was implemented in Asmari formation, which is the major carbonate reservoir rock of Iranian oil field. A group of 1,640 data points was used to construct the intelligent model, and a group of 800 data points was employed to assess the reliability of the proposed model. The results showed that the committee machine with intelligent systems performed more effectively compared with individual intelligent systems performing alone.
StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zemla, A; Lang, D; Kostova, T

2010-11-29

Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory - still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitatemore » the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that shared structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.« less
A review of predictive coding algorithms.

PubMed

Spratling, M W

2017-03-01

Predictive coding is a leading theory of how the brain performs probabilistic inference. However, there are a number of distinct algorithms which are described by the term "predictive coding". This article provides a concise review of these different predictive coding algorithms, highlighting their similarities and differences. Five algorithms are covered: linear predictive coding which has a long and influential history in the signal processing literature; the first neuroscience-related application of predictive coding to explaining the function of the retina; and three versions of predictive coding that have been proposed to model cortical function. While all these algorithms aim to fit a generative model to sensory data, they differ in the type of generative model they employ, in the process used to optimise the fit between the model and sensory data, and in the way that they are related to neurobiology. Copyright © 2016 Elsevier Inc. All rights reserved.
PRISM 3: expanded prediction of natural product chemical structures from microbial genomes

PubMed Central

Skinnider, Michael A.; Merwin, Nishanth J.; Johnston, Chad W.

2017-01-01

Abstract Microbial natural products represent a rich resource of pharmaceutically and industrially important compounds. Genome sequencing has revealed that the majority of natural products remain undiscovered, and computational methods to connect biosynthetic gene clusters to their corresponding natural products therefore have the potential to revitalize natural product discovery. Previously, we described PRediction Informatics for Secondary Metabolomes (PRISM), a combinatorial approach to chemical structure prediction for genetically encoded nonribosomal peptides and type I and II polyketides. Here, we present a ground-up rewrite of the PRISM structure prediction algorithm to derive prediction of natural products arising from non-modular biosynthetic paradigms. Within this new version, PRISM 3, natural product scaffolds are modeled as chemical graphs, permitting structure prediction for aminocoumarins, antimetabolites, bisindoles and phosphonate natural products, and building upon the addition of ribosomally synthesized and post-translationally modified peptides. Further, with the addition of cluster detection for 11 new cluster types, PRISM 3 expands to detect 22 distinct natural product cluster types. Other major modifications to PRISM include improved sequence input and ORF detection, user-friendliness and output. Distribution of PRISM 3 over a 300-core server grid improves the speed and capacity of the web application. PRISM 3 is available at http://magarveylab.ca/prism/. PMID:28460067
Ancestry prediction in Singapore population samples using the Illumina ForenSeq kit.

PubMed

Ramani, Anantharaman; Wong, Yongxun; Tan, Si Zhen; Shue, Bing Hong; Syn, Christopher

2017-11-01

The ability to predict bio-geographic ancestry can be valuable to generate investigative leads towards solving crimes. Ancestry informative marker (AIM) sets include large numbers of SNPs to predict an ancestral population. Massively parallel sequencing has enabled forensic laboratories to genotype a large number of such markers in a single assay. Illumina's ForenSeq DNA Signature Kit includes the ancestry informative SNPs reported by Kidd et al. In this study, the ancestry prediction capabilities of the ForenSeq kit through sequencing on the MiSeq FGx were evaluated in 1030 unrelated Singapore population samples of Chinese, Malay and Indian origin. A total of 59 ancestry SNPs and phenotypic SNPs with AIM properties were selected. The bio-geographic ancestry of the 1030 samples, as predicted by Illumina's ForenSeq Universal Analysis Software (UAS), was determined. 712 of the genotyped samples were used as a training sample set for the generation of an ancestry prediction model using STRUCTURE and Snipper. The performance of the prediction model was tested by both methods with the remaining 318 samples. Ancestry prediction in UAS was able to correctly classify the Singapore Chinese as part of the East Asian cluster, while Indians clustered with Ad-mixed Americans and Malays clustered in-between these two reference populations. Principal component analyses showed that the 59 SNPs were only able to account for 26% of the variation between the Singapore sub-populations. Their discriminatory potential was also found to be lower (G ST =0.085) than that reported in ALFRED (F ST =0.357). The Snipper algorithm was able to correctly predict bio-geographic ancestry in 91% of Chinese and Indian, and 88% of Malay individuals, while the success rates for the STRUCTURE algorithm were 94% in Chinese, 80% in Malay, and 91% in Indian individuals. Both these algorithms were able to provide admixture proportions when present. Ancestry prediction accuracy (in terms of likelihood ratio) was generally high in the absence of admixture. Misclassification occurred in admixed individuals, who were likely offspring of inter-ethnic marriages, and hence whose self-reported bio-geographic ancestries were dependent on that of their fathers, and in individuals of minority sub-populations with inter-ethnic beliefs. The ancestry prediction capabilities of the 59 SNPs on the ForenSeq kit were reasonably effective in differentiating the Singapore Chinese, Malay and Indian sub-populations, and will be of use for investigative purposes. However, there is potential for more accurate prediction through the evaluation of other AIM sets. Copyright © 2017 Elsevier B.V. All rights reserved.
A Robustly Stabilizing Model Predictive Control Algorithm

NASA Technical Reports Server (NTRS)

Ackmece, A. Behcet; Carson, John M., III

2007-01-01

A model predictive control (MPC) algorithm that differs from prior MPC algorithms has been developed for controlling an uncertain nonlinear system. This algorithm guarantees the resolvability of an associated finite-horizon optimal-control problem in a receding-horizon implementation.
A Novel Approach for Blast-Induced Flyrock Prediction Based on Imperialist Competitive Algorithm and Artificial Neural Network

PubMed Central

Marto, Aminaton; Jahed Armaghani, Danial; Tonnizam Mohamad, Edy; Makhtar, Ahmad Mahir

2014-01-01

Flyrock is one of the major disturbances induced by blasting which may cause severe damage to nearby structures. This phenomenon has to be precisely predicted and subsequently controlled through the changing in the blast design to minimize potential risk of blasting. The scope of this study is to predict flyrock induced by blasting through a novel approach based on the combination of imperialist competitive algorithm (ICA) and artificial neural network (ANN). For this purpose, the parameters of 113 blasting operations were accurately recorded and flyrock distances were measured for each operation. By applying the sensitivity analysis, maximum charge per delay and powder factor were determined as the most influential parameters on flyrock. In the light of this analysis, two new empirical predictors were developed to predict flyrock distance. For a comparison purpose, a predeveloped backpropagation (BP) ANN was developed and the results were compared with those of the proposed ICA-ANN model and empirical predictors. The results clearly showed the superiority of the proposed ICA-ANN model in comparison with the proposed BP-ANN model and empirical approaches. PMID:25147856
HomoTarget: a new algorithm for prediction of microRNA targets in Homo sapiens.

PubMed

Ahmadi, Hamed; Ahmadi, Ali; Azimzadeh-Jamalkandi, Sadegh; Shoorehdeli, Mahdi Aliyari; Salehzadeh-Yazdi, Ali; Bidkhori, Gholamreza; Masoudi-Nejad, Ali

2013-02-01

MiRNAs play an essential role in the networks of gene regulation by inhibiting the translation of target mRNAs. Several computational approaches have been proposed for the prediction of miRNA target-genes. Reports reveal a large fraction of under-predicted or falsely predicted target genes. Thus, there is an imperative need to develop a computational method by which the target mRNAs of existing miRNAs can be correctly identified. In this study, combined pattern recognition neural network (PRNN) and principle component analysis (PCA) architecture has been proposed in order to model the complicated relationship between miRNAs and their target mRNAs in humans. The results of several types of intelligent classifiers and our proposed model were compared, showing that our algorithm outperformed them with higher sensitivity and specificity. Using the recent release of the mirBase database to find potential targets of miRNAs, this model incorporated twelve structural, thermodynamic and positional features of miRNA:mRNA binding sites to select target candidates. Copyright © 2012 Elsevier Inc. All rights reserved.
A novel approach for blast-induced flyrock prediction based on imperialist competitive algorithm and artificial neural network.

PubMed

Marto, Aminaton; Hajihassani, Mohsen; Armaghani, Danial Jahed; Mohamad, Edy Tonnizam; Makhtar, Ahmad Mahir

2014-01-01

Flyrock is one of the major disturbances induced by blasting which may cause severe damage to nearby structures. This phenomenon has to be precisely predicted and subsequently controlled through the changing in the blast design to minimize potential risk of blasting. The scope of this study is to predict flyrock induced by blasting through a novel approach based on the combination of imperialist competitive algorithm (ICA) and artificial neural network (ANN). For this purpose, the parameters of 113 blasting operations were accurately recorded and flyrock distances were measured for each operation. By applying the sensitivity analysis, maximum charge per delay and powder factor were determined as the most influential parameters on flyrock. In the light of this analysis, two new empirical predictors were developed to predict flyrock distance. For a comparison purpose, a predeveloped backpropagation (BP) ANN was developed and the results were compared with those of the proposed ICA-ANN model and empirical predictors. The results clearly showed the superiority of the proposed ICA-ANN model in comparison with the proposed BP-ANN model and empirical approaches.
GARN: Sampling RNA 3D Structure Space with Game Theory and Knowledge-Based Scoring Strategies.

PubMed

Boudard, Mélanie; Bernauer, Julie; Barth, Dominique; Cohen, Johanne; Denise, Alain

2015-01-01

Cellular processes involve large numbers of RNA molecules. The functions of these RNA molecules and their binding to molecular machines are highly dependent on their 3D structures. One of the key challenges in RNA structure prediction and modeling is predicting the spatial arrangement of the various structural elements of RNA. As RNA folding is generally hierarchical, methods involving coarse-grained models hold great promise for this purpose. We present here a novel coarse-grained method for sampling, based on game theory and knowledge-based potentials. This strategy, GARN (Game Algorithm for RNa sampling), is often much faster than previously described techniques and generates large sets of solutions closely resembling the native structure. GARN is thus a suitable starting point for the molecular modeling of large RNAs, particularly those with experimental constraints. GARN is available from: http://garn.lri.fr/.
Detection of functionally important regions in "hypothetical proteins" of known structure.

PubMed

Nimrod, Guy; Schushan, Maya; Steinberg, David M; Ben-Tal, Nir

2008-12-10

Structural genomics initiatives provide ample structures of "hypothetical proteins" (i.e., proteins of unknown function) at an ever increasing rate. However, without function annotation, this structural goldmine is of little use to biologists who are interested in particular molecular systems. To this end, we used (an improved version of) the PatchFinder algorithm for the detection of functional regions on the protein surface, which could mediate its interactions with, e.g., substrates, ligands, and other proteins. Examination, using a data set of annotated proteins, showed that PatchFinder outperforms similar methods. We collected 757 structures of hypothetical proteins and their predicted functional regions in the N-Func database. Inspection of several of these regions demonstrated that they are useful for function prediction. For example, we suggested an interprotein interface and a putative nucleotide-binding site. A web-server implementation of PatchFinder and the N-Func database are available at http://patchfinder.tau.ac.il/.
Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

PubMed Central

2007-01-01

Background Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction. Results The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign while simultaneously providing a small improvement in the structural prediction accuracy. Savings are also realized in memory. In experiments over a 5S RNA dataset with average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method performs favorably in comparison to other programs for pairwise RNA structure prediction: yielding better accuracy, on average, and requiring significantly lesser computational resources. Conclusion Probabilistic analysis can be utilized in order to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their accuracy of structural prediction. This extends the practical reach of these methods to longer length sequences. The revised Dynalign code is freely available for download. PMID:17445273
Monthly prediction of air temperature in Australia and New Zealand with machine learning algorithms

NASA Astrophysics Data System (ADS)

Salcedo-Sanz, S.; Deo, R. C.; Carro-Calvo, L.; Saavedra-Moreno, B.

2016-07-01

Long-term air temperature prediction is of major importance in a large number of applications, including climate-related studies, energy, agricultural, or medical. This paper examines the performance of two Machine Learning algorithms (Support Vector Regression (SVR) and Multi-layer Perceptron (MLP)) in a problem of monthly mean air temperature prediction, from the previous measured values in observational stations of Australia and New Zealand, and climate indices of importance in the region. The performance of the two considered algorithms is discussed in the paper and compared to alternative approaches. The results indicate that the SVR algorithm is able to obtain the best prediction performance among all the algorithms compared in the paper. Moreover, the results obtained have shown that the mean absolute error made by the two algorithms considered is significantly larger for the last 20 years than in the previous decades, in what can be interpreted as a change in the relationship among the prediction variables involved in the training of the algorithms.
Fe-Cluster Compounds of Chalcogenides: Candidates for Rare-Earth-Free Permanent Magnet and Magnetic Nodal-Line Topological Material

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhao, Xin; Wang, Cai-Zhuang; Kim, Minsung

Here, Fe-cluster-based crystal structures are predicted for chalcogenides Fe 3X 4 (X = S, Se, Te) using an adaptive genetic algorithm. Topologically different from the well-studied layered structures of iron chalcogenides, the newly predicted structures consist of Fe clusters that are either separated by the chalcogen atoms or connected via sharing of the vertex Fe atoms. Additionally, using first-principles calculations, we demonstrate that these structures have competitive or even lower formation energies than the experimentally synthesized Fe 3X 4 compounds and exhibit interesting magnetic and electronic properties. In particular, we show that Fe 3X 4 can be a good candidatemore » as a rare-earth-free permanent magnet and Fe 3X 4 can be a magnetic nodal-line topological material.« less
Fe-Cluster Compounds of Chalcogenides: Candidates for Rare-Earth-Free Permanent Magnet and Magnetic Nodal-Line Topological Material

DOE PAGES

Zhao, Xin; Wang, Cai-Zhuang; Kim, Minsung; ...

2017-11-13

Here, Fe-cluster-based crystal structures are predicted for chalcogenides Fe 3X 4 (X = S, Se, Te) using an adaptive genetic algorithm. Topologically different from the well-studied layered structures of iron chalcogenides, the newly predicted structures consist of Fe clusters that are either separated by the chalcogen atoms or connected via sharing of the vertex Fe atoms. Additionally, using first-principles calculations, we demonstrate that these structures have competitive or even lower formation energies than the experimentally synthesized Fe 3X 4 compounds and exhibit interesting magnetic and electronic properties. In particular, we show that Fe 3X 4 can be a good candidatemore » as a rare-earth-free permanent magnet and Fe 3X 4 can be a magnetic nodal-line topological material.« less
Convenient QSAR model for predicting the complexation of structurally diverse compounds with beta-cyclodextrins.

PubMed

Pérez-Garrido, Alfonso; Morales Helguera, Aliuska; Abellán Guillén, Adela; Cordeiro, M Natália D S; Garrido Escudero, Amalio

2009-01-15

This paper reports a QSAR study for predicting the complexation of a large and heterogeneous variety of substances (233 organic compounds) with beta-cyclodextrins (beta-CDs). Several different theoretical molecular descriptors, calculated solely from the molecular structure of the compounds under investigation, and an efficient variable selection procedure, like the Genetic Algorithm, led to models with satisfactory global accuracy and predictivity. But the best-final QSAR model is based on Topological descriptors meanwhile offering a reasonable interpretation. This QSAR model was able to explain ca. 84% of the variance in the experimental activity, and displayed very good internal cross-validation statistics and predictivity on external data. It shows that the driving forces for CD complexation are mainly hydrophobic and steric (van der Waals) interactions. Thus, the results of our study provide a valuable tool for future screening and priority testing of beta-CDs guest molecules.
Binding ligand prediction for proteins using partial matching of local surface patches.

PubMed

Sael, Lee; Kihara, Daisuke

2010-01-01

Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group.
Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches

PubMed Central

Sael, Lee; Kihara, Daisuke

2010-01-01

Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group. PMID:21614188
A fast and robust iterative algorithm for prediction of RNA pseudoknotted secondary structures

PubMed Central

2014-01-01

Background Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information. Results We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0. Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure. Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets. Conclusions Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster. Iterative HFold and all data used in this work are freely available at http://www.cs.ubc.ca/~hjabbari/software.php. PMID:24884954

Deadbeat Predictive Controllers

NASA Technical Reports Server (NTRS)

Juang, Jer-Nan; Phan, Minh

1997-01-01

Several new computational algorithms are presented to compute the deadbeat predictive control law. The first algorithm makes use of a multi-step-ahead output prediction to compute the control law without explicitly calculating the controllability matrix. The system identification must be performed first and then the predictive control law is designed. The second algorithm uses the input and output data directly to compute the feedback law. It combines the system identification and the predictive control law into one formulation. The third algorithm uses an observable-canonical form realization to design the predictive controller. The relationship between all three algorithms is established through the use of the state-space representation. All algorithms are applicable to multi-input, multi-output systems with disturbance inputs. In addition to the feedback terms, feed forward terms may also be added for disturbance inputs if they are measurable. Although the feedforward terms do not influence the stability of the closed-loop feedback law, they enhance the performance of the controlled system.
Protein-protein docking using region-based 3D Zernike descriptors

PubMed Central

2009-01-01

Background Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. Results We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface C-αRMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. Conclusion We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods. PMID:20003235
Protein-protein docking using region-based 3D Zernike descriptors.

PubMed

Venkatraman, Vishwesh; Yang, Yifeng D; Sael, Lee; Kihara, Daisuke

2009-12-09

Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface C-alphaRMSD < or = 2.5 A) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.
Functional Classification of Immune Regulatory Proteins

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rubinstein, Rotem; Ramagopal, Udupi A.; Nathenson, Stanley G.

2013-05-01

Members of the immunoglobulin superfamily (IgSF) control innate and adaptive immunity and are prime targets for the treatment of autoimmune diseases, infectious diseases, and malignancies. We describe a computational method, termed the Brotherhood algorithm, which utilizes intermediate sequence information to classify proteins into functionally related families. This approach identifies functional relationships within the IgSF and predicts additional receptor-ligand interactions. As a specific example, we examine the nectin/nectin-like family of cell adhesion and signaling proteins and propose receptor-ligand interactions within this family. We were guided by the Brotherhood approach and present the high-resolution structural characterization of a homophilic interaction involving themore » class-I MHC-restricted T-cell-associated molecule, which we now classify as a nectin-like family member. The Brotherhood algorithm is likely to have a significant impact on structural immunology by identifying those proteins and complexes for which structural characterization will be particularly informative.« less
A Deep Learning Algorithm of Neural Network for the Parameterization of Typhoon-Ocean Feedback in Typhoon Forecast Models

NASA Astrophysics Data System (ADS)

Jiang, Guo-Qing; Xu, Jing; Wei, Jun

2018-04-01

Two algorithms based on machine learning neural networks are proposed—the shallow learning (S-L) and deep learning (D-L) algorithms—that can potentially be used in atmosphere-only typhoon forecast models to provide flow-dependent typhoon-induced sea surface temperature cooling (SSTC) for improving typhoon predictions. The major challenge of existing SSTC algorithms in forecast models is how to accurately predict SSTC induced by an upcoming typhoon, which requires information not only from historical data but more importantly also from the target typhoon itself. The S-L algorithm composes of a single layer of neurons with mixed atmospheric and oceanic factors. Such a structure is found to be unable to represent correctly the physical typhoon-ocean interaction. It tends to produce an unstable SSTC distribution, for which any perturbations may lead to changes in both SSTC pattern and strength. The D-L algorithm extends the neural network to a 4 × 5 neuron matrix with atmospheric and oceanic factors being separated in different layers of neurons, so that the machine learning can determine the roles of atmospheric and oceanic factors in shaping the SSTC. Therefore, it produces a stable crescent-shaped SSTC distribution, with its large-scale pattern determined mainly by atmospheric factors (e.g., winds) and small-scale features by oceanic factors (e.g., eddies). Sensitivity experiments reveal that the D-L algorithms improve maximum wind intensity errors by 60-70% for four case study simulations, compared to their atmosphere-only model runs.
Multiscale finite element modeling of sheet molding compound (SMC) composite structure based on stochastic mesostructure reconstruction

DOE PAGES

Chen, Zhangxing; Huang, Tianyu; Shao, Yimin; ...

2018-03-15

Predicting the mechanical behavior of the chopped carbon fiber Sheet Molding Compound (SMC) due to spatial variations in local material properties is critical for the structural performance analysis but is computationally challenging. Such spatial variations are induced by the material flow in the compression molding process. In this work, a new multiscale SMC modeling framework and the associated computational techniques are developed to provide accurate and efficient predictions of SMC mechanical performance. The proposed multiscale modeling framework contains three modules. First, a stochastic algorithm for 3D chip-packing reconstruction is developed to efficiently generate the SMC mesoscale Representative Volume Element (RVE)more » model for Finite Element Analysis (FEA). A new fiber orientation tensor recovery function is embedded in the reconstruction algorithm to match reconstructions with the target characteristics of fiber orientation distribution. Second, a metamodeling module is established to improve the computational efficiency by creating the surrogates of mesoscale analyses. Third, the macroscale behaviors are predicted by an efficient multiscale model, in which the spatially varying material properties are obtained based on the local fiber orientation tensors. Our approach is further validated through experiments at both meso- and macro-scales, such as tensile tests assisted by Digital Image Correlation (DIC) and mesostructure imaging.« less
Multiscale finite element modeling of sheet molding compound (SMC) composite structure based on stochastic mesostructure reconstruction

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Zhangxing; Huang, Tianyu; Shao, Yimin

Predicting the mechanical behavior of the chopped carbon fiber Sheet Molding Compound (SMC) due to spatial variations in local material properties is critical for the structural performance analysis but is computationally challenging. Such spatial variations are induced by the material flow in the compression molding process. In this work, a new multiscale SMC modeling framework and the associated computational techniques are developed to provide accurate and efficient predictions of SMC mechanical performance. The proposed multiscale modeling framework contains three modules. First, a stochastic algorithm for 3D chip-packing reconstruction is developed to efficiently generate the SMC mesoscale Representative Volume Element (RVE)more » model for Finite Element Analysis (FEA). A new fiber orientation tensor recovery function is embedded in the reconstruction algorithm to match reconstructions with the target characteristics of fiber orientation distribution. Second, a metamodeling module is established to improve the computational efficiency by creating the surrogates of mesoscale analyses. Third, the macroscale behaviors are predicted by an efficient multiscale model, in which the spatially varying material properties are obtained based on the local fiber orientation tensors. Our approach is further validated through experiments at both meso- and macro-scales, such as tensile tests assisted by Digital Image Correlation (DIC) and mesostructure imaging.« less
Planning for robust reserve networks using uncertainty analysis

USGS Publications Warehouse

Moilanen, A.; Runge, M.C.; Elith, Jane; Tyre, A.; Carmel, Y.; Fegraus, E.; Wintle, B.A.; Burgman, M.; Ben-Haim, Y.

2006-01-01

Planning land-use for biodiversity conservation frequently involves computer-assisted reserve selection algorithms. Typically such algorithms operate on matrices of species presence?absence in sites, or on species-specific distributions of model predicted probabilities of occurrence in grid cells. There are practically always errors in input data?erroneous species presence?absence data, structural and parametric uncertainty in predictive habitat models, and lack of correspondence between temporal presence and long-run persistence. Despite these uncertainties, typical reserve selection methods proceed as if there is no uncertainty in the data or models. Having two conservation options of apparently equal biological value, one would prefer the option whose value is relatively insensitive to errors in planning inputs. In this work we show how uncertainty analysis for reserve planning can be implemented within a framework of information-gap decision theory, generating reserve designs that are robust to uncertainty. Consideration of uncertainty involves modifications to the typical objective functions used in reserve selection. Search for robust-optimal reserve structures can still be implemented via typical reserve selection optimization techniques, including stepwise heuristics, integer-programming and stochastic global search.
Predicting Gene Structures from Multiple RT-PCR Tests

NASA Astrophysics Data System (ADS)

Kováč, Jakub; Vinař, Tomáš; Brejová, Broňa

It has been demonstrated that the use of additional information such as ESTs and protein homology can significantly improve accuracy of gene prediction. However, many sources of external information are still being omitted from consideration. Here, we investigate the use of product lengths from RT-PCR experiments in gene finding. We present hardness results and practical algorithms for several variants of the problem and apply our methods to a real RT-PCR data set in the Drosophila genome. We conclude that the use of RT-PCR data can improve the sensitivity of gene prediction and locate novel splicing variants.
Detection and Tracking of Dynamic Objects by Using a Multirobot System: Application to Critical Infrastructures Surveillance

PubMed Central

Rodríguez-Canosa, Gonzalo; Giner, Jaime del Cerro; Barrientos, Antonio

2014-01-01

The detection and tracking of mobile objects (DATMO) is progressively gaining importance for security and surveillance applications. This article proposes a set of new algorithms and procedures for detecting and tracking mobile objects by robots that work collaboratively as part of a multirobot system. These surveillance algorithms are conceived of to work with data provided by long distance range sensors and are intended for highly reliable object detection in wide outdoor environments. Contrary to most common approaches, in which detection and tracking are done by an integrated procedure, the approach proposed here relies on a modular structure, in which detection and tracking are carried out independently, and the latter might accept input data from different detection algorithms. Two movement detection algorithms have been developed for the detection of dynamic objects by using both static and/or mobile robots. The solution to the overall problem is based on the use of a Kalman filter to predict the next state of each tracked object. Additionally, new tracking algorithms capable of combining dynamic objects lists coming from either one or various sources complete the solution. The complementary performance of the separated modular structure for detection and identification is evaluated and, finally, a selection of test examples discussed. PMID:24526305
Can human experts predict solubility better than computers?

PubMed

Boobier, Samuel; Osbourn, Anne; Mitchell, John B O

2017-12-13

In this study, we design and carry out a survey, asking human experts to predict the aqueous solubility of druglike organic compounds. We investigate whether these experts, drawn largely from the pharmaceutical industry and academia, can match or exceed the predictive power of algorithms. Alongside this, we implement 10 typical machine learning algorithms on the same dataset. The best algorithm, a variety of neural network known as a multi-layer perceptron, gave an RMSE of 0.985 log S units and an R 2 of 0.706. We would not have predicted the relative success of this particular algorithm in advance. We found that the best individual human predictor generated an almost identical prediction quality with an RMSE of 0.942 log S units and an R 2 of 0.723. The collection of algorithms contained a higher proportion of reasonably good predictors, nine out of ten compared with around half of the humans. We found that, for either humans or algorithms, combining individual predictions into a consensus predictor by taking their median generated excellent predictivity. While our consensus human predictor achieved very slightly better headline figures on various statistical measures, the difference between it and the consensus machine learning predictor was both small and statistically insignificant. We conclude that human experts can predict the aqueous solubility of druglike molecules essentially equally well as machine learning algorithms. We find that, for either humans or algorithms, combining individual predictions into a consensus predictor by taking their median is a powerful way of benefitting from the wisdom of crowds.
A Prize-Collecting Steiner Tree Approach for Transduction Network Inference

NASA Astrophysics Data System (ADS)

Bailly-Bechet, Marc; Braunstein, Alfredo; Zecchina, Riccardo

Into the cell, information from the environment is mainly propagated via signaling pathways which form a transduction network. Here we propose a new algorithm to infer transduction networks from heterogeneous data, using both the protein interaction network and expression datasets. We formulate the inference problem as an optimization task, and develop a message-passing, probabilistic and distributed formalism to solve it. We apply our algorithm to the pheromone response in the baker’s yeast S. cerevisiae. We are able to find the backbone of the known structure of the MAPK cascade of pheromone response, validating our algorithm. More importantly, we make biological predictions about some proteins whose role could be at the interface between pheromone response and other cellular functions.
Discrete sequence prediction and its applications

NASA Technical Reports Server (NTRS)

Laird, Philip

1992-01-01

Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply sequence prediction using a simple and practical sequence-prediction algorithm, called TDAG. The TDAG algorithm is first tested by comparing its performance with some common data compression algorithms. Then it is adapted to the detailed requirements of dynamic program optimization, with excellent results.
The effect of machine learning regression algorithms and sample size on individualized behavioral prediction with functional connectivity features.

PubMed

Cui, Zaixu; Gong, Gaolang

2018-06-02

Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranged from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations. Copyright © 2018 Elsevier Inc. All rights reserved.
A prediction algorithm for first onset of major depression in the general population: development and validation.

PubMed

Wang, JianLi; Sareen, Jitender; Patten, Scott; Bolton, James; Schmitz, Norbert; Birney, Arden

2014-05-01

Prediction algorithms are useful for making clinical decisions and for population health planning. However, such prediction algorithms for first onset of major depression do not exist. The objective of this study was to develop and validate a prediction algorithm for first onset of major depression in the general population. Longitudinal study design with approximate 3-year follow-up. The study was based on data from a nationally representative sample of the US general population. A total of 28 059 individuals who participated in Waves 1 and 2 of the US National Epidemiologic Survey on Alcohol and Related Conditions and who had not had major depression at Wave 1 were included. The prediction algorithm was developed using logistic regression modelling in 21 813 participants from three census regions. The algorithm was validated in participants from the 4th census region (n=6246). Major depression occurred since Wave 1 of the National Epidemiologic Survey on Alcohol and Related Conditions, assessed by the Alcohol Use Disorder and Associated Disabilities Interview Schedule-diagnostic and statistical manual for mental disorders IV. A prediction algorithm containing 17 unique risk factors was developed. The algorithm had good discriminative power (C statistics=0.7538, 95% CI 0.7378 to 0.7699) and excellent calibration (F-adjusted test=1.00, p=0.448) with the weighted data. In the validation sample, the algorithm had a C statistic of 0.7259 and excellent calibration (Hosmer-Lemeshow χ(2)=3.41, p=0.906). The developed prediction algorithm has good discrimination and calibration capacity. It can be used by clinicians, mental health policy-makers and service planners and the general public to predict future risk of having major depression. The application of the algorithm may lead to increased personalisation of treatment, better clinical decisions and more optimal mental health service planning.
Forecasting impact injuries of unrestrained occupants in railway vehicle passenger compartments.

PubMed

Xie, Suchao; Zhou, Hui

2014-01-01

In order to predict the injury parameters of the occupants corresponding to different experimental parameters and to determine impact injury indices conveniently and efficiently, a model forecasting occupant impact injury was established in this work. The work was based on finite experimental observation values obtained by numerical simulation. First, the various factors influencing the impact injuries caused by the interaction between unrestrained occupants and the compartment's internal structures were collated and the most vulnerable regions of the occupant's body were analyzed. Then, the forecast model was set up based on a genetic algorithm-back propagation (GA-BP) hybrid algorithm, which unified the individual characteristics of the back propagation-artificial neural network (BP-ANN) model and the genetic algorithm (GA). The model was well suited to studies of occupant impact injuries and allowed multiple-parameter forecasts of the occupant impact injuries to be realized assuming values for various influencing factors. Finally, the forecast results for three types of secondary collision were analyzed using forecasting accuracy evaluation methods. All of the results showed the ideal accuracy of the forecast model. When an occupant faced a table, the relative errors between the predicted and experimental values of the respective injury parameters were kept within ± 6.0 percent and the average relative error (ARE) values did not exceed 3.0 percent. When an occupant faced a seat, the relative errors between the predicted and experimental values of the respective injury parameters were kept within ± 5.2 percent and the ARE values did not exceed 3.1 percent. When the occupant faced another occupant, the relative errors between the predicted and experimental values of the respective injury parameters were kept within ± 6.3 percent and the ARE values did not exceed 3.8 percent. The injury forecast model established in this article reduced repeat experiment times and improved the design efficiency of the internal compartment's structure parameters, and it provided a new way for assessing the safety performance of the interior structural parameters in existing, and newly designed, railway vehicle compartments.
A refined methodology for modeling volume quantification performance in CT

NASA Astrophysics Data System (ADS)

Chen, Baiyu; Wilson, Joshua; Samei, Ehsan

2014-03-01

The utility of CT lung nodule volume quantification technique depends on the precision of the quantification. To enable the evaluation of quantification precision, we previously developed a mathematical model that related precision to image resolution and noise properties in uniform backgrounds in terms of an estimability index (e'). The e' was shown to predict empirical precision across 54 imaging and reconstruction protocols, but with different correlation qualities for FBP and iterative reconstruction (IR) due to the non-linearity of IR impacted by anatomical structure. To better account for the non-linearity of IR, this study aimed to refine the noise characterization of the model in the presence of textured backgrounds. Repeated scans of an anthropomorphic lung phantom were acquired. Subtracted images were used to measure the image quantum noise, which was then used to adjust the noise component of the e' calculation measured from a uniform region. In addition to the model refinement, the validation of the model was further extended to 2 nodule sizes (5 and 10 mm) and 2 segmentation algorithms. Results showed that the magnitude of IR's quantum noise was significantly higher in structured backgrounds than in uniform backgrounds (ASiR, 30-50%; MBIR, 100-200%). With the refined model, the correlation between e' values and empirical precision no longer depended on reconstruction algorithm. In conclusion, the model with refined noise characterization relfected the nonlinearity of iterative reconstruction in structured background, and further showed successful prediction of quantification precision across a variety of nodule sizes, dose levels, slice thickness, reconstruction algorithms, and segmentation software.
De Novo Protein Structure Prediction

NASA Astrophysics Data System (ADS)

Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram

An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.
PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics.

PubMed

von Grotthuss, Marcin; Plewczynski, Dariusz; Ginalski, Krzysztof; Rychlewski, Leszek; Shakhnovich, Eugene I

2006-02-06

The number of protein structures from structural genomics centers dramatically increases in the Protein Data Bank (PDB). Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity. Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties using structure-function relationship. The assignments were conducted for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes. We suggest that the structure-based prediction of an EC number should be conducted having the different similarity score cutoff for different protein folds. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe, that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file. http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF.
High skill in low-frequency climate response through fluctuation dissipation theorems despite structural instability.

PubMed

Majda, Andrew J; Abramov, Rafail; Gershgorin, Boris

2010-01-12

Climate change science focuses on predicting the coarse-grained, planetary-scale, longtime changes in the climate system due to either changes in external forcing or internal variability, such as the impact of increased carbon dioxide. The predictions of climate change science are carried out through comprehensive, computational atmospheric, and oceanic simulation models, which necessarily parameterize physical features such as clouds, sea ice cover, etc. Recently, it has been suggested that there is irreducible imprecision in such climate models that manifests itself as structural instability in climate statistics and which can significantly hamper the skill of computer models for climate change. A systematic approach to deal with this irreducible imprecision is advocated through algorithms based on the Fluctuation Dissipation Theorem (FDT). There are important practical and computational advantages for climate change science when a skillful FDT algorithm is established. The FDT response operator can be utilized directly for multiple climate change scenarios, multiple changes in forcing, and other parameters, such as damping and inverse modelling directly without the need of running the complex climate model in each individual case. The high skill of FDT in predicting climate change, despite structural instability, is developed in an unambiguous fashion using mathematical theory as guidelines in three different test models: a generic class of analytical models mimicking the dynamical core of the computer climate models, reduced stochastic models for low-frequency variability, and models with a significant new type of irreducible imprecision involving many fast, unstable modes.

StrBioLib: a Java library for development of custom computational structural biology applications.

PubMed

Chandonia, John-Marc

2007-08-01

StrBioLib is a library of Java classes useful for developing software for computational structural biology research. StrBioLib contains classes to represent and manipulate protein structures, biopolymer sequences, sets of biopolymer sequences, and alignments between biopolymers based on either sequence or structure. Interfaces are provided to interact with commonly used bioinformatics applications, including (psi)-blast, modeller, muscle and Primer3, and tools are provided to read and write many file formats used to represent bioinformatic data. The library includes a general-purpose neural network object with multiple training algorithms, the Hooke and Jeeves non-linear optimization algorithm, and tools for efficient C-style string parsing and formatting. StrBioLib is the basis for the Pred2ary secondary structure prediction program, is used to build the astral compendium for sequence and structure analysis, and has been extensively tested through use in many smaller projects. Examples and documentation are available at the site below. StrBioLib may be obtained under the terms of the GNU LGPL license from http://strbio.sourceforge.net/
Automated Quality Assessment of Structural Magnetic Resonance Brain Images Based on a Supervised Machine Learning Algorithm.

PubMed

Pizarro, Ricardo A; Cheng, Xi; Barnett, Alan; Lemaitre, Herve; Verchinski, Beth A; Goldman, Aaron L; Xiao, Ena; Luo, Qian; Berman, Karen F; Callicott, Joseph H; Weinberger, Daniel R; Mattay, Venkata S

2016-01-01

High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm in the quality assessment of structural brain images, using global and region of interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.
Optimal design of the first stage of the plate-fin heat exchanger for the EAST cryogenic system

NASA Astrophysics Data System (ADS)

Qingfeng, JIANG; Zhigang, ZHU; Qiyong, ZHANG; Ming, ZHUANG; Xiaofei, LU

2018-03-01

The size of the heat exchanger is an important factor determining the dimensions of the cold box in helium cryogenic systems. In this paper, a counter-flow multi-stream plate-fin heat exchanger is optimized by means of a spatial interpolation method coupled with a hybrid genetic algorithm. Compared with empirical correlations, this spatial interpolation algorithm based on a kriging model can be adopted to more precisely predict the Colburn heat transfer factors and Fanning friction factors of offset-strip fins. Moreover, strict computational fluid dynamics simulations can be carried out to predict the heat transfer and friction performance in the absence of reliable experimental data. Within the constraints of heat exchange requirements, maximum allowable pressure drop, existing manufacturing techniques and structural strength, a mathematical model of an optimized design with discrete and continuous variables based on a hybrid genetic algorithm is established in order to minimize the volume. The results show that for the first-stage heat exchanger in the EAST refrigerator, the structural size could be decreased from the original 2.200 × 0.600 × 0.627 (m3) to the optimized 1.854 × 0.420 × 0.340 (m3), with a large reduction in volume. The current work demonstrates that the proposed method could be a useful tool to achieve optimization in an actual engineering project during the practical design process.
PRESS-based EFOR algorithm for the dynamic parametrical modeling of nonlinear MDOF systems

NASA Astrophysics Data System (ADS)

Liu, Haopeng; Zhu, Yunpeng; Luo, Zhong; Han, Qingkai

2017-09-01

In response to the identification problem concerning multi-degree of freedom (MDOF) nonlinear systems, this study presents the extended forward orthogonal regression (EFOR) based on predicted residual sums of squares (PRESS) to construct a nonlinear dynamic parametrical model. The proposed parametrical model is based on the non-linear autoregressive with exogenous inputs (NARX) model and aims to explicitly reveal the physical design parameters of the system. The PRESS-based EFOR algorithm is proposed to identify such a model for MDOF systems. By using the algorithm, we built a common-structured model based on the fundamental concept of evaluating its generalization capability through cross-validation. The resulting model aims to prevent over-fitting with poor generalization performance caused by the average error reduction ratio (AERR)-based EFOR algorithm. Then, a functional relationship is established between the coefficients of the terms and the design parameters of the unified model. Moreover, a 5-DOF nonlinear system is taken as a case to illustrate the modeling of the proposed algorithm. Finally, a dynamic parametrical model of a cantilever beam is constructed from experimental data. Results indicate that the dynamic parametrical model of nonlinear systems, which depends on the PRESS-based EFOR, can accurately predict the output response, thus providing a theoretical basis for the optimal design of modeling methods for MDOF nonlinear systems.
Changes in quantitative 3D shape features of the optic nerve head associated with age

NASA Astrophysics Data System (ADS)

Christopher, Mark; Tang, Li; Fingert, John H.; Scheetz, Todd E.; Abramoff, Michael D.

2013-02-01

Optic nerve head (ONH) structure is an important biological feature of the eye used by clinicians to diagnose and monitor progression of diseases such as glaucoma. ONH structure is commonly examined using stereo fundus imaging or optical coherence tomography. Stereo fundus imaging provides stereo views of the ONH that retain 3D information useful for characterizing structure. In order to quantify 3D ONH structure, we applied a stereo correspondence algorithm to a set of stereo fundus images. Using these quantitative 3D ONH structure measurements, eigen structures were derived using principal component analysis from stereo images of 565 subjects from the Ocular Hypertension Treatment Study (OHTS). To evaluate the usefulness of the eigen structures, we explored associations with the demographic variables age, gender, and race. Using regression analysis, the eigen structures were found to have significant (p < 0.05) associations with both age and race after Bonferroni correction. In addition, classifiers were constructed to predict the demographic variables based solely on the eigen structures. These classifiers achieved an area under receiver operating characteristic curve of 0.62 in predicting a binary age variable, 0.52 in predicting gender, and 0.67 in predicting race. The use of objective, quantitative features or eigen structures can reveal hidden relationships between ONH structure and demographics. The use of these features could similarly allow specific aspects of ONH structure to be isolated and associated with the diagnosis of glaucoma, disease progression and outcomes, and genetic factors.
Prediction of pork quality parameters by applying fractals and data mining on MRI.

PubMed

Caballero, Daniel; Pérez-Palacios, Trinidad; Caro, Andrés; Amigo, José Manuel; Dahl, Anders B; ErsbØll, Bjarne K; Antequera, Teresa

2017-09-01

This work firstly investigates the use of MRI, fractal algorithms and data mining techniques to determine pork quality parameters non-destructively. The main objective was to evaluate the capability of fractal algorithms (Classical Fractal algorithm, CFA; Fractal Texture Algorithm, FTA and One Point Fractal Texture Algorithm, OPFTA) to analyse MRI in order to predict quality parameters of loin. In addition, the effect of the sequence acquisition of MRI (Gradient echo, GE; Spin echo, SE and Turbo 3D, T3D) and the predictive technique of data mining (Isotonic regression, IR and Multiple linear regression, MLR) were analysed. Both fractal algorithm, FTA and OPFTA are appropriate to analyse MRI of loins. The sequence acquisition, the fractal algorithm and the data mining technique seems to influence on the prediction results. For most physico-chemical parameters, prediction equations with moderate to excellent correlation coefficients were achieved by using the following combinations of acquisition sequences of MRI, fractal algorithms and data mining techniques: SE-FTA-MLR, SE-OPFTA-IR, GE-OPFTA-MLR, SE-OPFTA-MLR, with the last one offering the best prediction results. Thus, SE-OPFTA-MLR could be proposed as an alternative technique to determine physico-chemical traits of fresh and dry-cured loins in a non-destructive way with high accuracy. Copyright © 2017. Published by Elsevier Ltd.
Phase Transition and Physical Properties of InS

NASA Astrophysics Data System (ADS)

Wang, Hai-Yan; Li, Xiao-Feng; Xu, Lei; Li, Xu-Sheng; Hu, Qian-Ku

2018-02-01

Using the crystal structure prediction method based on particle swarm optimization algorithm, three phases (Pnnm, C2/m and Pm-3m) for InS are predicted. The new phase Pm-3m of InS under high pressure is firstly reported in the work. The structural features and electronic structure under high pressure of InS are fully investigated. We predicted the stable ground-state structure of InS was the Pnnm phase and phase transformation of InS from Pnnm phase to Pm-3m phase is firstly found at the pressure of about 29.5 GPa. According to the calculated enthalpies of InS with four structures in the pressure range from 20 GPa to 45 GPa, we find the C2/m phase is a metastable phase. The calculated band gap value of about 2.08 eV for InS with Pnnm structure at 0 GPa agrees well with the experimental value. Moreover, the electronic structure suggests that the C2/m and Pm-3m phase are metallic phases. Supported by the National Natural Science Foundation of China under Grant Nos. 11404099, 11304140, 11147167 and Funds of Outstanding Youth of Henan Polytechnic University, China under Grant No. J2014-05
Improving consensus structure by eliminating averaging artifacts

PubMed Central

KC, Dukka B

2009-01-01

Background Common structural biology methods (i.e., NMR and molecular dynamics) often produce ensembles of molecular structures. Consequently, averaging of 3D coordinates of molecular structures (proteins and RNA) is a frequent approach to obtain a consensus structure that is representative of the ensemble. However, when the structures are averaged, artifacts can result in unrealistic local geometries, including unphysical bond lengths and angles. Results Herein, we describe a method to derive representative structures while limiting the number of artifacts. Our approach is based on a Monte Carlo simulation technique that drives a starting structure (an extended or a 'close-by' structure) towards the 'averaged structure' using a harmonic pseudo energy function. To assess the performance of the algorithm, we applied our approach to Cα models of 1364 proteins generated by the TASSER structure prediction algorithm. The average RMSD of the refined model from the native structure for the set becomes worse by a mere 0.08 Å compared to the average RMSD of the averaged structures from the native structure (3.28 Å for refined structures and 3.36 A for the averaged structures). However, the percentage of atoms involved in clashes is greatly reduced (from 63% to 1%); in fact, the majority of the refined proteins had zero clashes. Moreover, a small number (38) of refined structures resulted in lower RMSD to the native protein versus the averaged structure. Finally, compared to PULCHRA [1], our approach produces representative structure of similar RMSD quality, but with much fewer clashes. Conclusion The benchmarking results demonstrate that our approach for removing averaging artifacts can be very beneficial for the structural biology community. Furthermore, the same approach can be applied to almost any problem where averaging of 3D coordinates is performed. Namely, structure averaging is also commonly performed in RNA secondary prediction [2], which could also benefit from our approach. PMID:19267905
Crystal Structure Prediction and its Application in Earth and Materials Sciences

NASA Astrophysics Data System (ADS)

Zhu, Qiang

First of all, we describe how to predict crystal structure by evolutionary approach, and extend this method to study the packing of organic molecules, by our specially designed constrained evolutionary algorithm. The main feature of this new approach is that each unit or molecule is treated as a whole body, which drastically reduces the search space and improves the efficiency. The improved method is possibly to be applied in the fields of (1) high pressure phase of simple molecules (H2O, NH3, CH4, etc); (2) pharmaceutical molecules (glycine, aspirin, etc); (3) complex inorganic crystals containing cluster or molecular unit, (Mg(BH4)2, Ca(BH4)2, etc). One application of the constrained evolutionary algorithm is given by the study of (Mg(BH4)2, which is a promising materials for hydrogen storage. Our prediction does not only reproduce the previous work on Mg(BH4)2 at ambient condition, but also yields two new tetragonal structures at high pressure, with space groups P4 and I41/acd are predicted to be lower in enthalpy, by 15.4 kJ/mol and 21.2 kJ/mol, respectively, than the earlier proposed P42nm phase. We have simulated X-ray diffraction spectra, lattice dynamics, and equations of state of these phases. The density, volume contraction, bulk modulus, and the simulated XRD patterns of P4 and I41/acd structures are in excellent agreement with the experimental results. Two kinds of oxides (Xe-O and Mg-O) have been studied under megabar pressures. For XeO, we predict the existence of thermodynamically stable Xe-O compounds at high pressures (XeO, XeO2 and XeO3 become stable at pressures of 83, 102 and 114 GPa, respectively). For Mg-O, our calculations find that two extraordinary compounds MgO2 and Mg3O 2 become thermodynamically stable at 116 GPa and 500 GPa, respectively. Our calculations indicate large charge transfer in these oxides for both systems, suggesting that large electronegativity difference and pressure are the key factors favouring their formations. We also discuss if these oxides might exist at earth and planetary conditions. If the target properties are set as the global fitness functions while structure relaxations are energy/enthalpy minimization, such hybrid optimization technique could effectively explore the landscape of properties for the given systems. Here we illustrate this function by the case of searching for superdense carbon allotropes. We find three structures (hP3, tI12, and tP12) that have significantly greater density. Furthermore, we find a collection of other superdense structures based on different ways of packing carbon tetrahedral. Superdense carbon allotropes are predicted to have remarkably high refractive indices and strong dispersion of light. Apart from evolutionary approach, there also exist some other methods for structural prediction. One can also combine the features from different methods. We develop a novel method for crystal structure prediction, based on metadynamics and evolutionary algorithms. This technique can be used to produce efficiently both the ground state and metastable states easily reachable from a reasonable initial structure. We use the cell shape as collective variable and evolutionary variation operators developed in the context of the USPEX method to equilibrate the system as a function of the collective variables. We illustrate how this approach helps one to find stable and metastable states for Al2SiO5, SiO2, MgSiO3. Apart from predicting crystal structures, the new method can also provide insight into mechanisms of phase transitions. This method is especially powerful in sampling the metastable structures from a given configuration. Experiments on cold compression indicated the existence of a new superhard carbon allotrope. Numerous metastable candidate structures featuring different topologies have been proposed for this allotrope. We use evolutionary metadynamics to systematically search for possible candidates which could be accessible from graphite. (Abstract shortened by UMI.)
Statistics based sampling for controller and estimator design

NASA Astrophysics Data System (ADS)

Tenne, Dirk

The purpose of this research is the development of statistical design tools for robust feed-forward/feedback controllers and nonlinear estimators. This dissertation is threefold and addresses the aforementioned topics nonlinear estimation, target tracking and robust control. To develop statistically robust controllers and nonlinear estimation algorithms, research has been performed to extend existing techniques, which propagate the statistics of the state, to achieve higher order accuracy. The so-called unscented transformation has been extended to capture higher order moments. Furthermore, higher order moment update algorithms based on a truncated power series have been developed. The proposed techniques are tested on various benchmark examples. Furthermore, the unscented transformation has been utilized to develop a three dimensional geometrically constrained target tracker. The proposed planar circular prediction algorithm has been developed in a local coordinate framework, which is amenable to extension of the tracking algorithm to three dimensional space. This tracker combines the predictions of a circular prediction algorithm and a constant velocity filter by utilizing the Covariance Intersection. This combined prediction can be updated with the subsequent measurement using a linear estimator. The proposed technique is illustrated on a 3D benchmark trajectory, which includes coordinated turns and straight line maneuvers. The third part of this dissertation addresses the design of controller which include knowledge of parametric uncertainties and their distributions. The parameter distributions are approximated by a finite set of points which are calculated by the unscented transformation. This set of points is used to design robust controllers which minimize a statistical performance of the plant over the domain of uncertainty consisting of a combination of the mean and variance. The proposed technique is illustrated on three benchmark problems. The first relates to the design of prefilters for a linear and nonlinear spring-mass-dashpot system and the second applies a feedback controller to a hovering helicopter. Lastly, the statistical robust controller design is devoted to a concurrent feed-forward/feedback controller structure for a high-speed low tension tape drive.
WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation

PubMed Central

2013-01-01

Background SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and for a given protein, its input comprises: the sequence and/or its three-dimensional structure (when available), a set of target variations and its functional Gene Ontology (GO) terms. The output of the server provides, for each protein variation, the probabilities to be associated to human diseases. Results The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d programs. Sequence and structure based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. Selecting a balanced dataset with more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method scores with 84% overall accuracy, 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the results of the server are 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions WS-SNPs&GO is a valuable tool that includes in a unique framework information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go. PMID:23819482
Novel Automated Approach to Predict the Outcome of Laser Peripheral Iridotomy for Primary Angle Closure Suspect Eyes Using Anterior Segment Optical Coherence Tomography.

PubMed

Koh, Victor; Swamidoss, Issac Niwas; Aquino, Maria Cecilia D; Chew, Paul T; Sng, Chelvin

2018-04-27

Develop an algorithm to predict the success of laser peripheral iridotomy (LPI) in primary angle closure suspect (PACS), using pre-treatment anterior segment optical coherence tomography (ASOCT) scans. A total of 116 eyes with PACS underwent LPI and time-domain ASOCT scans (temporal and nasal cuts) were performed before and 1 month after LPI. All the post-treatment scans were classified to one of the following categories: (a) both angles open, (b) one of two angles open and (c) both angles closed. After LPI, success is defined as one or more angles changed from close to open. In this proposed method, the pre and post-LPI ASOCT scans were registered at the corresponding angles based on similarities between the respective local descriptor features and random sample consensus technique was used to identify the largest consensus set of correspondences between the pre and post-LPI ASOCT scans. Subsequently, features such as correlation co-efficient (CC) and structural similarity index (SSIM) were extracted and correlated with the success of LPI. We included 116 eyes and 91 (78.44%) eyes fulfilled the criteria for success after LPI. Using the CC and SSIM index scores from this training set of ASOCT images, our algorithm showed that the success of LPI in eyes with narrow angles can be predicted with 89.7% accuracy, specificity of 95.2% and sensitivity of 36.4% based on pre-LPI ASOCT scans only. Using pre-LPI ASOCT scans, our proposed algorithm showed good accuracy in predicting the success of LPI for PACS eyes. This fully-automated algorithm could aid decision making in offering LPI as a prophylactic treatment for PACS.
Structural design methodologies for ceramic-based material systems

NASA Technical Reports Server (NTRS)

Duffy, Stephen F.; Chulya, Abhisak; Gyekenyesi, John P.

1991-01-01

One of the primary pacing items for realizing the full potential of ceramic-based structural components is the development of new design methods and protocols. The focus here is on low temperature, fast-fracture analysis of monolithic, whisker-toughened, laminated, and woven ceramic composites. A number of design models and criteria are highlighted. Public domain computer algorithms, which aid engineers in predicting the fast-fracture reliability of structural components, are mentioned. Emphasis is not placed on evaluating the models, but instead is focused on the issues relevant to the current state of the art.
A Mixtures-of-Trees Framework for Multi-Label Classification

PubMed Central

Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos

2015-01-01

We propose a new probabilistic approach for multi-label classification that aims to represent the class posterior distribution P(Y|X). Our approach uses a mixture of tree-structured Bayesian networks, which can leverage the computational advantages of conditional tree-structured models and the abilities of mixtures to compensate for tree-structured restrictions. We develop algorithms for learning the model from data and for performing multi-label predictions using the learned model. Experiments on multiple datasets demonstrate that our approach outperforms several state-of-the-art multi-label classification methods. PMID:25927011
Analysis of whisker-toughened CMC structural components using an interactive reliability model

NASA Technical Reports Server (NTRS)

Duffy, Stephen F.; Palko, Joseph L.

1992-01-01

Realizing wider utilization of ceramic matrix composites (CMC) requires the development of advanced structural analysis technologies. This article focuses on the use of interactive reliability models to predict component probability of failure. The deterministic William-Warnke failure criterion serves as theoretical basis for the reliability model presented here. The model has been implemented into a test-bed software program. This computer program has been coupled to a general-purpose finite element program. A simple structural problem is presented to illustrate the reliability model and the computer algorithm.
A robust damage-detection technique with environmental variability combining time-series models with principal components

NASA Astrophysics Data System (ADS)

Lakshmi, K.; Rama Mohan Rao, A.

2014-10-01

In this paper, a novel output-only damage-detection technique based on time-series models for structural health monitoring in the presence of environmental variability and measurement noise is presented. The large amount of data obtained in the form of time-history response is transformed using principal component analysis, in order to reduce the data size and thereby improve the computational efficiency of the proposed algorithm. The time instant of damage is obtained by fitting the acceleration time-history data from the structure using autoregressive (AR) and AR with exogenous inputs time-series prediction models. The probability density functions (PDFs) of damage features obtained from the variances of prediction errors corresponding to references and healthy current data are found to be shifting from each other due to the presence of various uncertainties such as environmental variability and measurement noise. Control limits using novelty index are obtained using the distances of the peaks of the PDF curves in healthy condition and used later for determining the current condition of the structure. Numerical simulation studies have been carried out using a simply supported beam and also validated using an experimental benchmark data corresponding to a three-storey-framed bookshelf structure proposed by Los Alamos National Laboratory. Studies carried out in this paper clearly indicate the efficiency of the proposed algorithm for damage detection in the presence of measurement noise and environmental variability.
Comparison of two optimization algorithms for fuzzy finite element model updating for damage detection in a wind turbine blade

NASA Astrophysics Data System (ADS)

Turnbull, Heather; Omenzetter, Piotr

2018-03-01

vDifficulties associated with current health monitoring and inspection practices combined with harsh, often remote, operational environments of wind turbines highlight the requirement for a non-destructive evaluation system capable of remotely monitoring the current structural state of turbine blades. This research adopted a physics based structural health monitoring methodology through calibration of a finite element model using inverse techniques. A 2.36m blade from a 5kW turbine was used as an experimental specimen, with operational modal analysis techniques utilised to realize the modal properties of the system. Modelling the experimental responses as fuzzy numbers using the sub-level technique, uncertainty in the response parameters was propagated back through the model and into the updating parameters. Initially, experimental responses of the blade were obtained, with a numerical model of the blade created and updated. Deterministic updating was carried out through formulation and minimisation of a deterministic objective function using both firefly algorithm and virus optimisation algorithm. Uncertainty in experimental responses were modelled using triangular membership functions, allowing membership functions of updating parameters (Young's modulus and shear modulus) to be obtained. Firefly algorithm and virus optimisation algorithm were again utilised, however, this time in the solution of fuzzy objective functions. This enabled uncertainty associated with updating parameters to be quantified. Varying damage location and severity was simulated experimentally through addition of small masses to the structure intended to cause a structural alteration. A damaged model was created, modelling four variable magnitude nonstructural masses at predefined points and updated to provide a deterministic damage prediction and information in relation to the parameters uncertainty via fuzzy updating.
[Detection of the main quality indicators in red wine with infrared spectroscopy based on FastICA and neural network].

PubMed

Fang, Li-Min; Lin, Min

2009-08-01

For the rapid detection of the ethanol, pH and rest sugar in red wine, infrared (IR) spectra of 44 wine samples were analyzed. The algorithm of fast independent component analysis (FastICA) was used to decompose the data of IR spectra, and their independent components and the mixing matrix were obtained. Then, the ICA-NNR calibration model with three-level artificial neural network (ANN) structure was built by using back-propagation (BP) algorithm. The models were used to estimate the contents of ethanol, pH and rest sugar in red wine samples for both in calibration set and predicted set. Correlation coefficient (r) of prediction and root mean square error of prediction (RMSEP) were used as the evaluation indexes. The results indicate that the r and RMSEP for the prediction of ethanol content, pH and rest sugar content are 0.953, 0.983 and 0.994, and 0.161, 0.017 and 0.181, respectively. The maximum relative deviations between the ICA-NNR method predicted value and referenced value of the 22 samples in predicted set are less than 4%. The results of this paper provide a foundation for the application and further development of IR on-line red wine analyzer.
Particle swarm optimization-based automatic parameter selection for deep neural networks and its applications in large-scale and high-dimensional data

PubMed Central

2017-01-01

In this paper, we propose a new automatic hyperparameter selection approach for determining the optimal network configuration (network structure and hyperparameters) for deep neural networks using particle swarm optimization (PSO) in combination with a steepest gradient descent algorithm. In the proposed approach, network configurations were coded as a set of real-number m-dimensional vectors as the individuals of the PSO algorithm in the search procedure. During the search procedure, the PSO algorithm is employed to search for optimal network configurations via the particles moving in a finite search space, and the steepest gradient descent algorithm is used to train the DNN classifier with a few training epochs (to find a local optimal solution) during the population evaluation of PSO. After the optimization scheme, the steepest gradient descent algorithm is performed with more epochs and the final solutions (pbest and gbest) of the PSO algorithm to train a final ensemble model and individual DNN classifiers, respectively. The local search ability of the steepest gradient descent algorithm and the global search capabilities of the PSO algorithm are exploited to determine an optimal solution that is close to the global optimum. We constructed several experiments on hand-written characters and biological activity prediction datasets to show that the DNN classifiers trained by the network configurations expressed by the final solutions of the PSO algorithm, employed to construct an ensemble model and individual classifier, outperform the random approach in terms of the generalization performance. Therefore, the proposed approach can be regarded an alternative tool for automatic network structure and parameter selection for deep neural networks. PMID:29236718
A constitutive material model for nonlinear finite element structural analysis using an iterative matrix approach

NASA Technical Reports Server (NTRS)

Koenig, Herbert A.; Chan, Kwai S.; Cassenti, Brice N.; Weber, Richard

1988-01-01

A unified numerical method for the integration of stiff time dependent constitutive equations is presented. The solution process is directly applied to a constitutive model proposed by Bodner. The theory confronts time dependent inelastic behavior coupled with both isotropic hardening and directional hardening behaviors. Predicted stress-strain responses from this model are compared to experimental data from cyclic tests on uniaxial specimens. An algorithm is developed for the efficient integration of the Bodner flow equation. A comparison is made with the Euler integration method. An analysis of computational time is presented for the three algorithms.

A novel approach to multiple sequence alignment using hadoop data grids.

PubMed

Sudha Sadasivam, G; Baktavatchalam, G

2010-01-01

Multiple alignment of protein sequences helps to determine evolutionary linkage and to predict molecular structures. The factors to be considered while aligning multiple sequences are speed and accuracy of alignment. Although dynamic programming algorithms produce accurate alignments, they are computation intensive. In this paper we propose a time efficient approach to sequence alignment that also produces quality alignment. The dynamic nature of the algorithm coupled with data and computational parallelism of hadoop data grids improves the accuracy and speed of sequence alignment. The principle of block splitting in hadoop coupled with its scalability facilitates alignment of very large sequences.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Fang, Xiao; Blazek, Jonathan A.; McEwen, Joseph E.

Cosmological perturbation theory is a powerful tool to predict the statistics of large-scale structure in the weakly non-linear regime, but even at 1-loop order it results in computationally expensive mode-coupling integrals. Here we present a fast algorithm for computing 1-loop power spectra of quantities that depend on the observer's orientation, thereby generalizing the FAST-PT framework (McEwen et al., 2016) that was originally developed for scalars such as the matter density. This algorithm works for an arbitrary input power spectrum and substantially reduces the time required for numerical evaluation. We apply the algorithm to four examples: intrinsic alignments of galaxies inmore » the tidal torque model; the Ostriker-Vishniac effect; the secondary CMB polarization due to baryon flows; and the 1-loop matter power spectrum in redshift space. Code implementing this algorithm and these applications is publicly available at https://github.com/JoeMcEwen/FAST-PT.« less
Analysis of a simulation algorithm for direct brain drug delivery

PubMed Central

Rosenbluth, Kathryn Hammond; Eschermann, Jan Felix; Mittermeyer, Gabriele; Thomson, Rowena; Mittermeyer, Stephan; Bankiewicz, Krystof S.

2011-01-01

Convection enhanced delivery (CED) achieves targeted delivery of drugs with a pressure-driven infusion through a cannula placed stereotactically in the brain. This technique bypasses the blood brain barrier and gives precise distributions of drugs, minimizing off-target effects of compounds such as viral vectors for gene therapy or toxic chemotherapy agents. The exact distribution is affected by the cannula positioning, flow rate and underlying tissue structure. This study presents an analysis of a simulation algorithm for predicting the distribution using baseline MRI images acquired prior to inserting the cannula. The MRI images included diffusion tensor imaging (DTI) to estimate the tissue properties. The algorithm was adapted for the devices and protocols identified for upcoming trials and validated with direct MRI visualization of Gadolinium in 20 infusions in non-human primates. We found strong agreement between the size and location of the simulated and gadolinium volumes, demonstrating the clinical utility of this surgical planning algorithm. PMID:21945468
A statistical framework for evaluating neural networks to predict recurrent events in breast cancer

NASA Astrophysics Data System (ADS)

Gorunescu, Florin; Gorunescu, Marina; El-Darzi, Elia; Gorunescu, Smaranda

2010-07-01

Breast cancer is the second leading cause of cancer deaths in women today. Sometimes, breast cancer can return after primary treatment. A medical diagnosis of recurrent cancer is often a more challenging task than the initial one. In this paper, we investigate the potential contribution of neural networks (NNs) to support health professionals in diagnosing such events. The NN algorithms are tested and applied to two different datasets. An extensive statistical analysis has been performed to verify our experiments. The results show that a simple network structure for both the multi-layer perceptron and radial basis function can produce equally good results, not all attributes are needed to train these algorithms and, finally, the classification performances of all algorithms are statistically robust. Moreover, we have shown that the best performing algorithm will strongly depend on the features of the datasets, and hence, there is not necessarily a single best classifier.
Mechanobiological simulations of peri-acetabular bone ingrowth: a comparative analysis of cell-phenotype specific and phenomenological algorithms.

PubMed

Mukherjee, Kaushik; Gupta, Sanjay

2017-03-01

Several mechanobiology algorithms have been employed to simulate bone ingrowth around porous coated implants. However, there is a scarcity of quantitative comparison between the efficacies of commonly used mechanoregulatory algorithms. The objectives of this study are: (1) to predict peri-acetabular bone ingrowth using cell-phenotype specific algorithm and to compare these predictions with those obtained using phenomenological algorithm and (2) to investigate the influences of cellular parameters on bone ingrowth. The variation in host bone material property and interfacial micromotion of the implanted pelvis were mapped onto the microscale model of implant-bone interface. An overall variation of 17-88 % in peri-acetabular bone ingrowth was observed. Despite differences in predicted tissue differentiation patterns during the initial period, both the algorithms predicted similar spatial distribution of neo-tissue layer, after attainment of equilibrium. Results indicated that phenomenological algorithm, being computationally faster than the cell-phenotype specific algorithm, might be used to predict peri-prosthetic bone ingrowth. The cell-phenotype specific algorithm, however, was found to be useful in numerically investigating the influence of alterations in cellular activities on bone ingrowth, owing to biologically related factors. Amongst the host of cellular activities, matrix production rate of bone tissue was found to have predominant influence on peri-acetabular bone ingrowth.
Structured Light-Based 3D Reconstruction System for Plants

PubMed Central

Nguyen, Thuy Tuong; Slaughter, David C.; Max, Nelson; Maloof, Julin N.; Sinha, Neelima

2015-01-01

Camera-based 3D reconstruction of physical objects is one of the most popular computer vision trends in recent years. Many systems have been built to model different real-world subjects, but there is lack of a completely robust system for plants.This paper presents a full 3D reconstruction system that incorporates both hardware structures (including the proposed structured light system to enhance textures on object surfaces) and software algorithms (including the proposed 3D point cloud registration and plant feature measurement). This paper demonstrates the ability to produce 3D models of whole plants created from multiple pairs of stereo images taken at different viewing angles, without the need to destructively cut away any parts of a plant. The ability to accurately predict phenotyping features, such as the number of leaves, plant height, leaf size and internode distances, is also demonstrated. Experimental results show that, for plants having a range of leaf sizes and a distance between leaves appropriate for the hardware design, the algorithms successfully predict phenotyping features in the target crops, with a recall of 0.97 and a precision of 0.89 for leaf detection and less than a 13-mm error for plant size, leaf size and internode distance. PMID:26230701
Modeling Spatial Dependencies and Semantic Concepts in Data Mining

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vatsavai, Raju

Data mining is the process of discovering new patterns and relationships in large datasets. However, several studies have shown that general data mining techniques often fail to extract meaningful patterns and relationships from the spatial data owing to the violation of fundamental geospatial principles. In this tutorial, we introduce basic principles behind explicit modeling of spatial and semantic concepts in data mining. In particular, we focus on modeling these concepts in the widely used classification, clustering, and prediction algorithms. Classification is the process of learning a structure or model (from user given inputs) and applying the known model to themore » new data. Clustering is the process of discovering groups and structures in the data that are ``similar,'' without applying any known structures in the data. Prediction is the process of finding a function that models (explains) the data with least error. One common assumption among all these methods is that the data is independent and identically distributed. Such assumptions do not hold well in spatial data, where spatial dependency and spatial heterogeneity are a norm. In addition, spatial semantics are often ignored by the data mining algorithms. In this tutorial we cover recent advances in explicitly modeling of spatial dependencies and semantic concepts in data mining.« less
Performance of Optimized Actuator and Sensor Arrays in an Active Noise Control System

NASA Technical Reports Server (NTRS)

Palumbo, D. L.; Padula, S. L.; Lyle, K. H.; Cline, J. H.; Cabell, R. H.

1996-01-01

Experiments have been conducted in NASA Langley's Acoustics and Dynamics Laboratory to determine the effectiveness of optimized actuator/sensor architectures and controller algorithms for active control of harmonic interior noise. Tests were conducted in a large scale fuselage model - a composite cylinder which simulates a commuter class aircraft fuselage with three sections of trim panel and a floor. Using an optimization technique based on the component transfer functions, combinations of 4 out of 8 piezoceramic actuators and 8 out of 462 microphone locations were evaluated against predicted performance. A combinatorial optimization technique called tabu search was employed to select the optimum transducer arrays. Three test frequencies represent the cases of a strong acoustic and strong structural response, a weak acoustic and strong structural response and a strong acoustic and weak structural response. Noise reduction was obtained using a Time Averaged/Gradient Descent (TAGD) controller. Results indicate that the optimization technique successfully predicted best and worst case performance. An enhancement of the TAGD control algorithm was also evaluated. The principal components of the actuator/sensor transfer functions were used in the PC-TAGD controller. The principal components are shown to be independent of each other while providing control as effective as the standard TAGD.
Improving personalized link prediction by hybrid diffusion

NASA Astrophysics Data System (ADS)

Liu, Jin-Hu; Zhu, Yu-Xiao; Zhou, Tao

2016-04-01

Inspired by traditional link prediction and to solve the problem of recommending friends in social networks, we introduce the personalized link prediction in this paper, in which each individual will get equal number of diversiform predictions. While the performances of many classical algorithms are not satisfactory under this framework, thus new algorithms are in urgent need. Motivated by previous researches in other fields, we generalize heat conduction process to the framework of personalized link prediction and find that this method outperforms many classical similarity-based algorithms, especially in the performance of diversity. In addition, we demonstrate that adding one ground node that is supposed to connect all the nodes in the system will greatly benefit the performance of heat conduction. Finally, better hybrid algorithms composed of local random walk and heat conduction have been proposed. Numerical results show that the hybrid algorithms can outperform other algorithms simultaneously in all four adopted metrics: AUC, precision, recall and hamming distance. In a word, this work may shed some light on the in-depth understanding of the effect of physical processes in personalized link prediction.
Evolutionary Dynamic Multiobjective Optimization Via Kalman Filter Prediction.

PubMed

Muruganantham, Arrchana; Tan, Kay Chen; Vadakkepat, Prahlad

2016-12-01

Evolutionary algorithms are effective in solving static multiobjective optimization problems resulting in the emergence of a number of state-of-the-art multiobjective evolutionary algorithms (MOEAs). Nevertheless, the interest in applying them to solve dynamic multiobjective optimization problems has only been tepid. Benchmark problems, appropriate performance metrics, as well as efficient algorithms are required to further the research in this field. One or more objectives may change with time in dynamic optimization problems. The optimization algorithm must be able to track the moving optima efficiently. A prediction model can learn the patterns from past experience and predict future changes. In this paper, a new dynamic MOEA using Kalman filter (KF) predictions in decision space is proposed to solve the aforementioned problems. The predictions help to guide the search toward the changed optima, thereby accelerating convergence. A scoring scheme is devised to hybridize the KF prediction with a random reinitialization method. Experimental results and performance comparisons with other state-of-the-art algorithms demonstrate that the proposed algorithm is capable of significantly improving the dynamic optimization performance.
Computational Search for Strong Topological Insulators: An Exercise in Data Mining and Electronic Structure

DOE Office of Scientific and Technical Information (OSTI.GOV)

Klintenberg, M.; Haraldsen, Jason T.; Balatsky, Alexander V.

In this paper, we report a data-mining investigation for the search of topological insulators by examining individual electronic structures for over 60,000 materials. Using a data-mining algorithm, we survey changes in band inversion with and without spin-orbit coupling by screening the calculated electronic band structure for a small gap and a change concavity at high-symmetry points. Overall, we were able to identify a number of topological candidates with varying structures and composition. Lastly, our overall goal is expand the realm of predictive theory into the determination of new and exotic complex materials through the data mining of electronic structure.
Computational Search for Strong Topological Insulators: An Exercise in Data Mining and Electronic Structure

DOE PAGES

Klintenberg, M.; Haraldsen, Jason T.; Balatsky, Alexander V.

2014-06-19

In this paper, we report a data-mining investigation for the search of topological insulators by examining individual electronic structures for over 60,000 materials. Using a data-mining algorithm, we survey changes in band inversion with and without spin-orbit coupling by screening the calculated electronic band structure for a small gap and a change concavity at high-symmetry points. Overall, we were able to identify a number of topological candidates with varying structures and composition. Lastly, our overall goal is expand the realm of predictive theory into the determination of new and exotic complex materials through the data mining of electronic structure.
Predicting Vasovagal Syncope from Heart Rate and Blood Pressure: A Prospective Study in 140 Subjects.

PubMed

Virag, Nathalie; Erickson, Mark; Taraborrelli, Patricia; Vetter, Rolf; Lim, Phang Boon; Sutton, Richard

2018-04-28

We developed a vasovagal syncope (VVS) prediction algorithm for use during head-up tilt with simultaneous analysis of heart rate (HR) and systolic blood pressure (SBP). We previously tested this algorithm retrospectively in 1155 subjects, showing sensitivity 95%, specificity 93% and median prediction time of 59s. This study was prospective, single center, on 140 subjects to evaluate this VVS prediction algorithm and assess if retrospective results were reproduced and clinically relevant. Primary endpoint was VVS prediction: sensitivity and specificity >80%. In subjects, referred for 60° head-up tilt (Italian protocol), non-invasive HR and SBP were supplied to the VVS prediction algorithm: simultaneous analysis of RR intervals, SBP trends and their variability represented by low-frequency power generated cumulative risk which was compared with a predetermined VVS risk threshold. When cumulative risk exceeded threshold, an alert was generated. Prediction time was duration between first alert and syncope. Of 140 subjects enrolled, data was usable for 134. Of 83 tilt+ve (61.9%), 81 VVS events were correctly predicted and of 51 tilt-ve subjects (38.1%), 45 were correctly identified as negative by the algorithm. Resulting algorithm performance was sensitivity 97.6%, specificity 88.2%, meeting primary endpoint. Mean VVS prediction time was 2min 26s±3min16s with median 1min 25s. Using only HR and HR variability (without SBP) the mean prediction time reduced to 1min34s±1min45s with median 1min13s. The VVS prediction algorithm, is clinically-relevant tool and could offer applications including providing a patient alarm, shortening tilt-test time, or triggering pacing intervention in implantable devices. Copyright © 2018. Published by Elsevier Inc.
DIMA 3.0: Domain Interaction Map.

PubMed

Luo, Qibin; Pagel, Philipp; Vilne, Baiba; Frishman, Dmitrij

2011-01-01

Domain Interaction MAp (DIMA, available at http://webclu.bio.wzw.tum.de/dima) is a database of predicted and known interactions between protein domains. It integrates 5807 structurally known interactions imported from the iPfam and 3did databases and 46,900 domain interactions predicted by four computational methods: domain phylogenetic profiling, domain pair exclusion algorithm correlated mutations and domain interaction prediction in a discriminative way. Additionally predictions are filtered to exclude those domain pairs that are reported as non-interacting by the Negatome database. The DIMA Web site allows to calculate domain interaction networks either for a domain of interest or for entire organisms, and to explore them interactively using the Flash-based Cytoscape Web software.
Correlating Free-Volume Hole Distribution to the Glass Transition Temperature of Epoxy Polymers.

PubMed

Aramoon, Amin; Breitzman, Timothy D; Woodward, Christopher; El-Awady, Jaafar A

2017-09-07

A new algorithm is developed to quantify the free-volume hole distribution and its evolution in coarse-grained molecular dynamics simulations of polymeric networks. This is achieved by analyzing the geometry of the network rather than a voxelized image of the structure to accurately and efficiently find and quantify free-volume hole distributions within large scale simulations of polymer networks. The free-volume holes are quantified by fitting the largest ellipsoids and spheres in the free-volumes between polymer chains. The free-volume hole distributions calculated from this algorithm are shown to be in excellent agreement with those measured from positron annihilation lifetime spectroscopy (PALS) experiments at different temperature and pressures. Based on the results predicted using this algorithm, an evolution model is proposed for the thermal behavior of an individual free-volume hole. This model is calibrated such that the average radius of free-volumes holes mimics the one predicted from the simulations. The model is then employed to predict the glass-transition temperature of epoxy polymers with different degrees of cross-linking and lengths of prepolymers. Comparison between the predicted glass-transition temperatures and those measured from simulations or experiments implies that this model is capable of successfully predicting the glass-transition temperature of the material using only a PDF of the initial free-volume holes radii of each microstructure. This provides an effective approach for the optimized design of polymeric systems on the basis of the glass-transition temperature, degree of cross-linking, and average length of prepolymers.
Efficient algorithms for probing the RNA mutation landscape.

PubMed

Waldispühl, Jérôme; Devadas, Srinivas; Berger, Bonnie; Clote, Peter

2008-08-08

The diversity and importance of the role played by RNAs in the regulation and development of the cell are now well-known and well-documented. This broad range of functions is achieved through specific structures that have been (presumably) optimized through evolution. State-of-the-art methods, such as McCaskill's algorithm, use a statistical mechanics framework based on the computation of the partition function over the canonical ensemble of all possible secondary structures on a given sequence. Although secondary structure predictions from thermodynamics-based algorithms are not as accurate as methods employing comparative genomics, the former methods are the only available tools to investigate novel RNAs, such as the many RNAs of unknown function recently reported by the ENCODE consortium. In this paper, we generalize the McCaskill partition function algorithm to sum over the grand canonical ensemble of all secondary structures of all mutants of the given sequence. Specifically, our new program, RNAmutants, simultaneously computes for each integer k the minimum free energy structure MFE(k) and the partition function Z(k) over all secondary structures of all k-point mutants, even allowing the user to specify certain positions required not to mutate and certain positions required to base-pair or remain unpaired. This technically important extension allows us to study the resilience of an RNA molecule to pointwise mutations. By computing the mutation profile of a sequence, a novel graphical representation of the mutational tendency of nucleotide positions, we analyze the deleterious nature of mutating specific nucleotide positions or groups of positions. We have successfully applied RNAmutants to investigate deleterious mutations (mutations that radically modify the secondary structure) in the Hepatitis C virus cis-acting replication element and to evaluate the evolutionary pressure applied on different regions of the HIV trans-activation response element. In particular, we show qualitative agreement between published Hepatitis C and HIV experimental mutagenesis studies and our analysis of deleterious mutations using RNAmutants. Our work also predicts other deleterious mutations, which could be verified experimentally. Finally, we provide evidence that the 3' UTR of the GB RNA virus C has been optimized to preserve evolutionarily conserved stem regions from a deleterious effect of pointwise mutations. We hope that there will be long-term potential applications of RNAmutants in de novo RNA design and drug design against RNA viruses. This work also suggests potential applications for large-scale exploration of the RNA sequence-structure network. Binary distributions are available at http://RNAmutants.csail.mit.edu/.
Thermodynamics of RNA structures by Wang–Landau sampling

PubMed Central

Lou, Feng; Clote, Peter

2010-01-01

Motivation: Thermodynamics-based dynamic programming RNA secondary structure algorithms have been of immense importance in molecular biology, where applications range from the detection of novel selenoproteins using expressed sequence tag (EST) data, to the determination of microRNA genes and their targets. Dynamic programming algorithms have been developed to compute the minimum free energy secondary structure and partition function of a given RNA sequence, the minimum free-energy and partition function for the hybridization of two RNA molecules, etc. However, the applicability of dynamic programming methods depends on disallowing certain types of interactions (pseudoknots, zig-zags, etc.), as their inclusion renders structure prediction an nondeterministic polynomial time (NP)-complete problem. Nevertheless, such interactions have been observed in X-ray structures. Results: A non-Boltzmannian Monte Carlo algorithm was designed by Wang and Landau to estimate the density of states for complex systems, such as the Ising model, that exhibit a phase transition. In this article, we apply the Wang-Landau (WL) method to compute the density of states for secondary structures of a given RNA sequence, and for hybridizations of two RNA sequences. Our method is shown to be much faster than existent software, such as RNAsubopt. From density of states, we compute the partition function over all secondary structures and over all pseudoknot-free hybridizations. The advantage of the WL method is that by adding a function to evaluate the free energy of arbitary pseudoknotted structures and of arbitrary hybridizations, we can estimate thermodynamic parameters for situations known to be NP-complete. This extension to pseudoknots will be made in the sequel to this article; in contrast, the current article describes the WL algorithm applied to pseudoknot-free secondary structures and hybridizations. Availability: The WL RNA hybridization web server is under construction at http://bioinformatics.bc.edu/clotelab/. Contact: clote@bc.edu PMID:20529917
Evaluation of protein docking predictions using Hex 3.1 in CAPRI rounds 1 and 2.

PubMed

Ritchie, David W

2003-07-01

This article describes and reviews our efforts using Hex 3.1 to predict the docking modes of the seven target protein-protein complexes presented in the CAPRI (Critical Assessment of Predicted Interactions) blind docking trial. For each target, the structure of at least one of the docking partners was given in its unbound form, and several of the targets involved large multimeric structures (e.g., Lactobacillus HPr kinase, hemagglutinin, bovine rotavirus VP6). Here we describe several enhancements to our original spherical polar Fourier docking correlation algorithm. For example, a novel surface sphere smothering algorithm is introduced to generate multiple local coordinate systems around the surface of a large receptor molecule, which may be used to define a small number of initial ligand-docking orientations distributed over the receptor surface. High-resolution spherical polar docking correlations are performed over the resulting receptor surface patches, and candidate docking solutions are refined by using a novel soft molecular mechanics energy minimization procedure. Overall, this approach identified two good solutions at rank 5 or less for two of the seven CAPRI complexes. Subsequent analysis of our results shows that Hex 3.1 is able to place good solutions within a list of
DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu Xiaoying; Ho, Shirley; Trac, Hy

We investigate machine learning (ML) techniques for predicting the number of galaxies (N{sub gal}) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N{sub gal}. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: supportmore » vector machines (SVM) and k-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N{sub gal} by training our algorithms on the following six halo properties: number of particles, M{sub 200}, {sigma}{sub v}, v{sub max}, half-mass radius, and spin. For Millennium, our predicted N{sub gal} values have a mean-squared error (MSE) of {approx}0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to {approx}5%-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and N{sub gal}. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high M{sub star}, low M{sub star}). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.« less
Homophyly/kinship hypothesis: Natural communities, and predicting in networks

NASA Astrophysics Data System (ADS)

Li, Angsheng; Li, Jiankou; Pan, Yicheng

2015-02-01

It has been a longstanding challenge to understand natural communities in real world networks. We proposed a community finding algorithm based on fitness of networks, two algorithms for prediction, accurate prediction and confirmation of keywords for papers in the citation network Arxiv HEP-TH (high energy physics theory), and the measures of internal centrality, external de-centrality, internal and external slopes to characterize the structures of communities. We implemented our algorithms on 2 citation and 5 cooperation graphs. Our experiments explored and validated a homophyly/kinship principle of real world networks. The homophyly/kinship principle includes: (1) homophyly is the natural selection in real world networks, similar to Darwin's kinship selection in nature, (2) real world networks consist of natural communities generated by the natural selection of homophyly, (3) most individuals in a natural community share a short list of common attributes, (4) natural communities have an internal centrality (or internal heterogeneity) that a natural community has a few nodes dominating most of the individuals in the community, (5) natural communities have an external de-centrality (or external homogeneity) that external links of a natural community homogeneously distributed in different communities, and (6) natural communities of a given network have typical structures determined by the internal slopes, and have typical patterns of outgoing links determined by external slopes, etc. Our homophyly/kinship principle perfectly matches Darwin's observation that animals from ants to people form social groups in which most individuals work for the common good, and that kinship could encourage altruistic behavior. Our homophyly/kinship principle is the network version of Darwinian theory, and builds a bridge between Darwinian evolution and network science.

PySeqLab: an open source Python package for sequence labeling and segmentation.

PubMed

Allam, Ahmed; Krauthammer, Michael

2017-11-01

Text and genomic data are composed of sequential tokens, such as words and nucleotides that give rise to higher order syntactic constructs. In this work, we aim at providing a comprehensive Python library implementing conditional random fields (CRFs), a class of probabilistic graphical models, for robust prediction of these constructs from sequential data. Python Sequence Labeling (PySeqLab) is an open source package for performing supervised learning in structured prediction tasks. It implements CRFs models, that is discriminative models from (i) first-order to higher-order linear-chain CRFs, and from (ii) first-order to higher-order semi-Markov CRFs (semi-CRFs). Moreover, it provides multiple learning algorithms for estimating model parameters such as (i) stochastic gradient descent (SGD) and its multiple variations, (ii) structured perceptron with multiple averaging schemes supporting exact and inexact search using 'violation-fixing' framework, (iii) search-based probabilistic online learning algorithm (SAPO) and (iv) an interface for Broyden-Fletcher-Goldfarb-Shanno (BFGS) and the limited-memory BFGS algorithms. Viterbi and Viterbi A* are used for inference and decoding of sequences. Using PySeqLab, we built models (classifiers) and evaluated their performance in three different domains: (i) biomedical Natural language processing (NLP), (ii) predictive DNA sequence analysis and (iii) Human activity recognition (HAR). State-of-the-art performance comparable to machine-learning based systems was achieved in the three domains without feature engineering or the use of knowledge sources. PySeqLab is available through https://bitbucket.org/A_2/pyseqlab with tutorials and documentation. ahmed.allam@yale.edu or michael.krauthammer@yale.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Vector Sum Excited Linear Prediction (VSELP) speech coding at 4.8 kbps

NASA Technical Reports Server (NTRS)

Gerson, Ira A.; Jasiuk, Mark A.

1990-01-01

Code Excited Linear Prediction (CELP) speech coders exhibit good performance at data rates as low as 4800 bps. The major drawback to CELP type coders is their larger computational requirements. The Vector Sum Excited Linear Prediction (VSELP) speech coder utilizes a codebook with a structure which allows for a very efficient search procedure. Other advantages of the VSELP codebook structure is discussed and a detailed description of a 4.8 kbps VSELP coder is given. This coder is an improved version of the VSELP algorithm, which finished first in the NSA's evaluation of the 4.8 kbps speech coders. The coder uses a subsample resolution single tap long term predictor, a single VSELP excitation codebook, a novel gain quantizer which is robust to channel errors, and a new adaptive pre/postfilter arrangement.
Tertiary structure-based analysis of microRNA–target interactions

PubMed Central

Gan, Hin Hark; Gunsalus, Kristin C.

2013-01-01

Current computational analysis of microRNA interactions is based largely on primary and secondary structure analysis. Computationally efficient tertiary structure-based methods are needed to enable more realistic modeling of the molecular interactions underlying miRNA-mediated translational repression. We incorporate algorithms for predicting duplex RNA structures, ionic strength effects, duplex entropy and free energy, and docking of duplex–Argonaute protein complexes into a pipeline to model and predict miRNA–target duplex binding energies. To ensure modeling accuracy and computational efficiency, we use an all-atom description of RNA and a continuum description of ionic interactions using the Poisson–Boltzmann equation. Our method predicts the conformations of two constructs of Caenorhabditis elegans let-7 miRNA–target duplexes to an accuracy of ∼3.8 Å root mean square distance of their NMR structures. We also show that the computed duplex formation enthalpies, entropies, and free energies for eight miRNA–target duplexes agree with titration calorimetry data. Analysis of duplex–Argonaute docking shows that structural distortions arising from single-base-pair mismatches in the seed region influence the activity of the complex by destabilizing both duplex hybridization and its association with Argonaute. Collectively, these results demonstrate that tertiary structure-based modeling of miRNA interactions can reveal structural mechanisms not accessible with current secondary structure-based methods. PMID:23417009
Prediction of cardiovascular risk in rheumatoid arthritis: performance of original and adapted SCORE algorithms.

PubMed

Arts, E E A; Popa, C D; Den Broeder, A A; Donders, R; Sandoo, A; Toms, T; Rollefstad, S; Ikdahl, E; Semb, A G; Kitas, G D; Van Riel, P L C M; Fransen, J

2016-04-01

Predictive performance of cardiovascular disease (CVD) risk calculators appears suboptimal in rheumatoid arthritis (RA). A disease-specific CVD risk algorithm may improve CVD risk prediction in RA. The objectives of this study are to adapt the Systematic COronary Risk Evaluation (SCORE) algorithm with determinants of CVD risk in RA and to assess the accuracy of CVD risk prediction calculated with the adapted SCORE algorithm. Data from the Nijmegen early RA inception cohort were used. The primary outcome was first CVD events. The SCORE algorithm was recalibrated by reweighing included traditional CVD risk factors and adapted by adding other potential predictors of CVD. Predictive performance of the recalibrated and adapted SCORE algorithms was assessed and the adapted SCORE was externally validated. Of the 1016 included patients with RA, 103 patients experienced a CVD event. Discriminatory ability was comparable across the original, recalibrated and adapted SCORE algorithms. The Hosmer-Lemeshow test results indicated that all three algorithms provided poor model fit (p<0.05) for the Nijmegen and external validation cohort. The adapted SCORE algorithm mainly improves CVD risk estimation in non-event cases and does not show a clear advantage in reclassifying patients with RA who develop CVD (event cases) into more appropriate risk groups. This study demonstrates for the first time that adaptations of the SCORE algorithm do not provide sufficient improvement in risk prediction of future CVD in RA to serve as an appropriate alternative to the original SCORE. Risk assessment using the original SCORE algorithm may underestimate CVD risk in patients with RA. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
F-RAG: Generating Atomic Coordinates from RNA Graphs by Fragment Assembly.

PubMed

Jain, Swati; Schlick, Tamar

2017-11-24

Coarse-grained models represent attractive approaches to analyze and simulate ribonucleic acid (RNA) molecules, for example, for structure prediction and design, as they simplify the RNA structure to reduce the conformational search space. Our structure prediction protocol RAGTOP (RNA-As-Graphs Topology Prediction) represents RNA structures as tree graphs and samples graph topologies to produce candidate graphs. However, for a more detailed study and analysis, construction of atomic from coarse-grained models is required. Here we present our graph-based fragment assembly algorithm (F-RAG) to convert candidate three-dimensional (3D) tree graph models, produced by RAGTOP into atomic structures. We use our related RAG-3D utilities to partition graphs into subgraphs and search for structurally similar atomic fragments in a data set of RNA 3D structures. The fragments are edited and superimposed using common residues, full atomic models are scored using RAGTOP's knowledge-based potential, and geometries of top scoring models is optimized. To evaluate our models, we assess all-atom RMSDs and Interaction Network Fidelity (a measure of residue interactions) with respect to experimentally solved structures and compare our results to other fragment assembly programs. For a set of 50 RNA structures, we obtain atomic models with reasonable geometries and interactions, particularly good for RNAs containing junctions. Additional improvements to our protocol and databases are outlined. These results provide a good foundation for further work on RNA structure prediction and design applications. Copyright © 2017 Elsevier Ltd. All rights reserved.
Binding pose and affinity prediction in the 2016 D3R Grand Challenge 2 using the Wilma-SIE method

NASA Astrophysics Data System (ADS)

Hogues, Hervé; Sulea, Traian; Gaudreault, Francis; Corbeil, Christopher R.; Purisima, Enrico O.

2018-01-01

The Farnesoid X receptor (FXR) exhibits significant backbone movement in response to the binding of various ligands and can be a challenge for pose prediction algorithms. As part of the D3R Grand Challenge 2, we tested Wilma-SIE, a rigid-protein docking method, on a set of 36 FXR ligands for which the crystal structures had originally been blinded. These ligands covered several classes of compounds. To overcome the rigid protein limitations of the method, we used an ensemble of publicly available structures for FXR from the PDB. The use of the ensemble allowed Wilma-SIE to predict poses with average and median RMSDs of 2.3 and 1.4 Å, respectively. It was quite clear, however, that had we used a single structure for the receptor the success rate would have been much lower. The most successful predictions were obtained on chemical classes for which one or more crystal structures of the receptor bound to a molecule of the same class was available. In the absence of a crystal structure for the class, observing a consensus binding mode for the ligands of the class using one or more receptor structures of other classes seemed to be indicative of a reasonable pose prediction. Affinity prediction proved to be more challenging with generally poor correlation with experimental IC50s (Kendall tau 0.3). Even when the 36 crystal structures were used the accuracy of the predicted affinities was not appreciably improved. A possible cause of difficulty is the internal energy strain arising from conformational differences in the receptor across complexes, which may need to be properly estimated and incorporated into the SIE scoring function.
Influenza detection and prediction algorithms: comparative accuracy trial in Östergötland county, Sweden, 2008-2012.

PubMed

Spreco, A; Eriksson, O; Dahlström, Ö; Timpka, T

2017-07-01

Methods for the detection of influenza epidemics and prediction of their progress have seldom been comparatively evaluated using prospective designs. This study aimed to perform a prospective comparative trial of algorithms for the detection and prediction of increased local influenza activity. Data on clinical influenza diagnoses recorded by physicians and syndromic data from a telenursing service were used. Five detection and three prediction algorithms previously evaluated in public health settings were calibrated and then evaluated over 3 years. When applied on diagnostic data, only detection using the Serfling regression method and prediction using the non-adaptive log-linear regression method showed acceptable performances during winter influenza seasons. For the syndromic data, none of the detection algorithms displayed a satisfactory performance, while non-adaptive log-linear regression was the best performing prediction method. We conclude that evidence was found for that available algorithms for influenza detection and prediction display satisfactory performance when applied on local diagnostic data during winter influenza seasons. When applied on local syndromic data, the evaluated algorithms did not display consistent performance. Further evaluations and research on combination of methods of these types in public health information infrastructures for 'nowcasting' (integrated detection and prediction) of influenza activity are warranted.
Systematic chemical-genetic and chemical-chemical interaction datasets for prediction of compound synergism

PubMed Central

Wildenhain, Jan; Spitzer, Michaela; Dolma, Sonam; Jarvik, Nick; White, Rachel; Roy, Marcia; Griffiths, Emma; Bellows, David S.; Wright, Gerard D.; Tyers, Mike

2016-01-01

The network structure of biological systems suggests that effective therapeutic intervention may require combinations of agents that act synergistically. However, a dearth of systematic chemical combination datasets have limited the development of predictive algorithms for chemical synergism. Here, we report two large datasets of linked chemical-genetic and chemical-chemical interactions in the budding yeast Saccharomyces cerevisiae. We screened 5,518 unique compounds against 242 diverse yeast gene deletion strains to generate an extended chemical-genetic matrix (CGM) of 492,126 chemical-gene interaction measurements. This CGM dataset contained 1,434 genotype-specific inhibitors, termed cryptagens. We selected 128 structurally diverse cryptagens and tested all pairwise combinations to generate a benchmark dataset of 8,128 pairwise chemical-chemical interaction tests for synergy prediction, termed the cryptagen matrix (CM). An accompanying database resource called ChemGRID was developed to enable analysis, visualisation and downloads of all data. The CGM and CM datasets will facilitate the benchmarking of computational approaches for synergy prediction, as well as chemical structure-activity relationship models for anti-fungal drug discovery. PMID:27874849
Computational Discovery of New Materials Under Pressure

NASA Astrophysics Data System (ADS)

Zurek, Eva

The pressure variable opens the door towards the synthesis of materials with unique properties, ie. superconductivity, hydrogen storage media, high-energy density and superhard materials, to name a few. Indeed, recently superconductivity has been observed below 203 K and 103 K in samples of compressed sulfur dihydride and phosphine, respectively. Under pressure elements that would not normally combine may form stable compounds, or may mix in novel proportions. As a result using our chemical intuition developed at 1 atm to theoretically predict stable phases is bound to fail. In order to enable our search for superconducting hydrogen-rich systems under pressure, we have developed XtalOpt, an open-source evolutionary algorithm for crystal structure prediction. New advances in XtalOpt that enable the prediction of unit cells with greater complexity will be described. XtalOpt has been employed to find the most stable structures of hydrides with unique stoichiometries under pressure. The electronic structure and bonding of the predicted phases has been analyzed by detailed first-principles calculations based on density functional theory. The results of our computational experiments are helping us to build chemical and physical intuition for compressed solids.
A class-based link prediction using Distance Dependent Chinese Restaurant Process

NASA Astrophysics Data System (ADS)

Andalib, Azam; Babamir, Seyed Morteza

2016-08-01

One of the important tasks in relational data analysis is link prediction which has been successfully applied on many applications such as bioinformatics, information retrieval, etc. The link prediction is defined as predicting the existence or absence of edges between nodes of a network. In this paper, we propose a novel method for link prediction based on Distance Dependent Chinese Restaurant Process (DDCRP) model which enables us to utilize the information of the topological structure of the network such as shortest path and connectivity of the nodes. We also propose a new Gibbs sampling algorithm for computing the posterior distribution of the hidden variables based on the training data. Experimental results on three real-world datasets show the superiority of the proposed method over other probabilistic models for link prediction problem.
Low Frequency Predictive Skill Despite Structural Instability and Model Error

DTIC Science & Technology

2014-09-30

Majda, based on earlier theoretical work. 1. Dynamic Stochastic Superresolution of sparseley observed turbulent systems M. Branicki (Post doc...of numerical models. Here, we introduce and study a suite of general Dynamic Stochastic Superresolution (DSS) algorithms and show that, by...resolving subgridscale turbulence through Dynamic Stochastic Superresolution utilizing aliased grids is a potential breakthrough for practical online
Automated use of mutagenesis data in structure prediction.

PubMed

Nanda, Vikas; DeGrado, William F

2005-05-15

In the absence of experimental structural determination, numerous methods are available to indirectly predict or probe the structure of a target molecule. Genetic modification of a protein sequence is a powerful tool for identifying key residues involved in binding reactions or protein stability. Mutagenesis data is usually incorporated into the modeling process either through manual inspection of model compatibility with empirical data, or through the generation of geometric constraints linking sensitive residues to a binding interface. We present an approach derived from statistical studies of lattice models for introducing mutation information directly into the fitness score. The approach takes into account the phenotype of mutation (neutral or disruptive) and calculates the energy for a given structure over an ensemble of sequences. The structure prediction procedure searches for the optimal conformation where neutral sequences either have no impact or improve stability and disruptive sequences reduce stability relative to wild type. We examine three types of sequence ensembles: information from saturation mutagenesis, scanning mutagenesis, and homologous proteins. Incorporating multiple sequences into a statistical ensemble serves to energetically separate the native state and misfolded structures. As a result, the prediction of structure with a poor force field is sufficiently enhanced by mutational information to improve accuracy. Furthermore, by separating misfolded conformations from the target score, the ensemble energy serves to speed up conformational search algorithms such as Monte Carlo-based methods. Copyright 2005 Wiley-Liss, Inc.
NWRA AVOSS Wake Vortex Prediction Algorithm. 3.1.1

NASA Technical Reports Server (NTRS)

Robins, R. E.; Delisi, D. P.; Hinton, David (Technical Monitor)

2002-01-01

This report provides a detailed description of the wake vortex prediction algorithm used in the Demonstration Version of NASA's Aircraft Vortex Spacing System (AVOSS). The report includes all equations used in the algorithm, an explanation of how to run the algorithm, and a discussion of how the source code for the algorithm is organized. Several appendices contain important supplementary information, including suggestions for enhancing the algorithm and results from test cases.
Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

PubMed

Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin

2007-12-01

Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracy-toplasmatic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most of other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
RaptorX server: a resource for template-based protein structure modeling.

PubMed

Källberg, Morten; Margaryan, Gohar; Wang, Sheng; Ma, Jianzhu; Xu, Jinbo

2014-01-01

Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server ( http://raptorx.uchicago.edu ), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling.Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and probabilistic consistency algorithm. Using the protocol presented here it is thus possible to obtain high-quality structural models for many target protein sequences when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction of a 200 amino acid target sequence in approximately 30 min.
Predicting full-field dynamic strain on a three-bladed wind turbine using three dimensional point tracking and expansion techniques

NASA Astrophysics Data System (ADS)

Baqersad, Javad; Niezrecki, Christopher; Avitabile, Peter

2014-03-01

As part of a project to predict the full-field dynamic strain in rotating structures (e.g. wind turbines and helicopter blades), an experimental measurement was performed on a wind turbine attached to a 500-lb steel block and excited using a mechanical shaker. In this paper, the dynamic displacement of several optical targets mounted to a turbine placed in a semi-built-in configuration was measured by using three-dimensional point tracking. Using an expansion algorithm in conjunction with a finite element model of the blades, the measured displacements were expanded to all finite element degrees of freedom. The calculated displacements were applied to the finite element model to extract dynamic strain on the surface as well as within the interior points of the structure. To validate the technique for dynamic strain prediction, the physical strain at eight locations on the blades was measured during excitation using strain-gages. The expansion was performed by using both structural modes of an individual cantilevered blade and using modes of the entire structure (three-bladed wind turbine and the fixture) and the predicted strain was compared to the physical strain-gage measurements. The results demonstrate the ability of the technique to predict full-field dynamic strain from limited sets of measurements and can be used as a condition based monitoring tool to help provide damage prognosis of structures during operation.
Nonalcoholic fatty liver disease (NAFLD) in the Veterans Administration population: development and validation of an algorithm for NAFLD using automated data.

PubMed

Husain, N; Blais, P; Kramer, J; Kowalkowski, M; Richardson, P; El-Serag, H B; Kanwal, F

2014-10-01

In practice, nonalcoholic fatty liver disease (NAFLD) is diagnosed based on elevated liver enzymes and confirmatory liver biopsy or abdominal imaging. Neither method is feasible in identifying individuals with NAFLD in a large-scale healthcare system. To develop and validate an algorithm to identify patients with NAFLD using automated data. Using the Veterans Administration Corporate Data Warehouse, we identified patients who had persistent ALT elevation (≥2 values ≥40 IU/mL ≥6 months apart) and did not have evidence of hepatitis B, hepatitis C or excessive alcohol use. We conducted a structured chart review of 450 patients classified as NAFLD and 150 patients who were classified as non-NAFLD by the database algorithm, and subsequently refined the database algorithm. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) for the initial database definition of NAFLD were 78.4% (95% CI: 70.0-86.8%), 74.5% (95% CI: 68.1-80.9%), 64.1% (95% CI: 56.4-71.7%) and 85.6% (95% CI: 79.4-91.8%), respectively. Reclassifying patients as having NAFLD if they had two elevated ALTs that were at least 6 months apart but within 2 years of each other, increased the specificity and PPV of the algorithm to 92.4% (95% CI: 88.8-96.0%) and 80.8% (95% CI: 72.5-89.0%), respectively. However, the sensitivity and NPV decreased to 55.0% (95% CI: 46.1-63.9%) and 78.0% (95% CI: 72.1-83.8%), respectively. Predictive algorithms using automated data can be used to identify patients with NAFLD, determine prevalence of NAFLD at the system-wide level, and may help select a target population for future clinical studies in veterans with NAFLD. © 2014 John Wiley & Sons Ltd.
Structures of 38-atom gold-platinum nanoalloy clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ong, Yee Pin; Yoon, Tiem Leong; Lim, Thong Leng

2015-04-24

Bimetallic nanoclusters, such as gold-platinum nanoclusters, are nanomaterials promising wide range of applications. We perform a numerical study of 38-atom gold-platinum nanoalloy clusters, Au{sub n}Pt{sub 38−n} (0 ≤ n ≤ 38), to elucidate the geometrical structures of these clusters. The lowest-energy structures of these bimetallic nanoclusters at the semi-empirical level are obtained via a global-minimum search algorithm known as parallel tempering multi-canonical basin hopping plus genetic algorithm (PTMBHGA), in which empirical Gupta many-body potential is used to describe the inter-atomic interactions among the constituent atoms. The structures of gold-platinum nanoalloy clusters are predicted to be core-shell segregated nanoclusters. Gold atomsmore » are observed to preferentially occupy the surface of the clusters, while platinum atoms tend to occupy the core due to the slightly smaller atomic radius of platinum as compared to gold’s. The evolution of the geometrical structure of 38-atom Au-Pt clusters displays striking similarity with that of 38-atom Au-Cu nanoalloy clusters as reported in the literature.« less
Predicting Loss-of-Control Boundaries Toward a Piloting Aid

NASA Technical Reports Server (NTRS)

Barlow, Jonathan; Stepanyan, Vahram; Krishnakumar, Kalmanje

2012-01-01

This work presents an approach to predicting loss-of-control with the goal of providing the pilot a decision aid focused on maintaining the pilot's control action within predicted loss-of-control boundaries. The predictive architecture combines quantitative loss-of-control boundaries, a data-based predictive control boundary estimation algorithm and an adaptive prediction method to estimate Markov model parameters in real-time. The data-based loss-of-control boundary estimation algorithm estimates the boundary of a safe set of control inputs that will keep the aircraft within the loss-of-control boundaries for a specified time horizon. The adaptive prediction model generates estimates of the system Markov Parameters, which are used by the data-based loss-of-control boundary estimation algorithm. The combined algorithm is applied to a nonlinear generic transport aircraft to illustrate the features of the architecture.
Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm.

PubMed

Lee, Jae-Hong; Kim, Do-Hyung; Jeong, Seong-Nyum; Choi, Seong-Ho

2018-04-01

The aim of the current study was to develop a computer-assisted detection system based on a deep convolutional neural network (CNN) algorithm and to evaluate the potential usefulness and accuracy of this system for the diagnosis and prediction of periodontally compromised teeth (PCT). Combining pretrained deep CNN architecture and a self-trained network, periapical radiographic images were used to determine the optimal CNN algorithm and weights. The diagnostic and predictive accuracy, sensitivity, specificity, positive predictive value, negative predictive value, receiver operating characteristic (ROC) curve, area under the ROC curve, confusion matrix, and 95% confidence intervals (CIs) were calculated using our deep CNN algorithm, based on a Keras framework in Python. The periapical radiographic dataset was split into training (n=1,044), validation (n=348), and test (n=348) datasets. With the deep learning algorithm, the diagnostic accuracy for PCT was 81.0% for premolars and 76.7% for molars. Using 64 premolars and 64 molars that were clinically diagnosed as severe PCT, the accuracy of predicting extraction was 82.8% (95% CI, 70.1%-91.2%) for premolars and 73.4% (95% CI, 59.9%-84.0%) for molars. We demonstrated that the deep CNN algorithm was useful for assessing the diagnosis and predictability of PCT. Therefore, with further optimization of the PCT dataset and improvements in the algorithm, a computer-aided detection system can be expected to become an effective and efficient method of diagnosing and predicting PCT.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.