Parsimonious Charge Deconvolution for Native Mass Spectrometry
2018-01-01
Charge deconvolution infers the mass from mass over charge (m/z) measurements in electrospray ionization mass spectra. When applied over a wide input m/z or broad target mass range, charge-deconvolution algorithms can produce artifacts, such as false masses at one-half or one-third of the correct mass. Indeed, a maximum entropy term in the objective function of MaxEnt, the most commonly used charge deconvolution algorithm, favors a deconvolved spectrum with many peaks over one with fewer peaks. Here we describe a new “parsimonious” charge deconvolution algorithm that produces fewer artifacts. The algorithm is especially well-suited to high-resolution native mass spectrometry of intact glycoproteins and protein complexes. Deconvolution of native mass spectra poses special challenges due to salt and small molecule adducts, multimers, wide mass ranges, and fewer and lower charge states. We demonstrate the performance of the new deconvolution algorithm on a range of samples. On the heavily glycosylated plasma properdin glycoprotein, the new algorithm could deconvolve monomer and dimer simultaneously and, when focused on the m/z range of the monomer, gave accurate and interpretable masses for glycoforms that had previously been analyzed manually using m/z peaks rather than deconvolved masses. On therapeutic antibodies, the new algorithm facilitated the analysis of extensions, truncations, and Fab glycosylation. The algorithm facilitates the use of native mass spectrometry for the qualitative and quantitative analysis of protein and protein assemblies. PMID:29376659
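The half-mass artifact mentioned in the abstract above follows directly from the arithmetic relating m/z to mass. The sketch below is not the published algorithm; it only illustrates, under the assumption of proton charge carriers in positive-mode electrospray, why an m/z peak at an even charge z is equally consistent with half the mass at charge z/2. The function names and the 150 kDa example mass are hypothetical.

```python
PROTON_MASS = 1.007276  # Da; assumed charge carrier (protonation)

def mz_of(mass, z):
    """m/z of a neutral mass carrying z protons."""
    return (mass + z * PROTON_MASS) / z

def neutral_mass(mz, z):
    """Neutral mass implied by an m/z peak if its charge is assumed to be z."""
    return z * (mz - PROTON_MASS)

true_mass = 150_000.0  # hypothetical antibody-sized mass in Da
for z in (24, 26, 28):
    peak = mz_of(true_mass, z)
    # Reading the same peak at half the charge yields half the mass,
    # which is how a false mass at M/2 can enter a deconvolved spectrum.
    print(z, round(peak, 3), neutral_mass(peak, z // 2))
```

Only the even charge states are shared with the half-mass hypothesis; the odd charge states of the true mass have no counterpart, which is the kind of evidence a deconvolution method can use to prefer the more parsimonious answer.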
An Integrated Framework Advancing Membrane Protein Modeling and Design
Weitzner, Brian D.; Duran, Amanda M.; Tilley, Drew C.; Elazar, Assaf; Gray, Jeffrey J.
2015-01-01
Membrane proteins are critical functional molecules in the human body, constituting more than 30% of open reading frames in the human genome. Unfortunately, a myriad of difficulties in overexpression and reconstitution into membrane mimetics severely limit our ability to determine their structures. Computational tools are therefore instrumental to membrane protein structure prediction, consequently increasing our understanding of membrane protein function and their role in disease. Here, we describe a general framework facilitating membrane protein modeling and design that combines the scientific principles for membrane protein modeling with the flexible software architecture of Rosetta3. This new framework, called RosettaMP, provides a general membrane representation that interfaces with scoring, conformational sampling, and mutation routines that can be easily combined to create new protocols. To demonstrate the capabilities of this implementation, we developed four proof-of-concept applications for (1) prediction of free energy changes upon mutation; (2) high-resolution structural refinement; (3) protein-protein docking; and (4) assembly of symmetric protein complexes, all in the membrane environment. Preliminary data show that these algorithms can produce meaningful scores and structures. The data also suggest needed improvements to both sampling routines and score functions. Importantly, the applications collectively demonstrate the potential of combining the flexible nature of RosettaMP with the power of Rosetta algorithms to facilitate membrane protein modeling and design. PMID:26325167
GBA manager: an online tool for querying low-complexity regions in proteins.
Bandyopadhyay, Nirmalya; Kahveci, Tamer
2010-01-01
We developed GBA Manager, an online tool that facilitates use of the Graph-Based Algorithm (GBA) we proposed in our earlier work. GBA identifies the low-complexity regions (LCRs) of protein sequences. GBA exploits a similarity matrix, such as BLOSUM62, to compute the complexity of the subsequences of the input protein sequence. It uses a graph-based algorithm to accurately compute the regions that have low complexities. GBA Manager is a user-friendly web service that enables online querying of protein sequences using GBA. In addition to the querying capabilities of the existing GBA algorithm, GBA Manager computes the p-values of the LCRs identified. The p-value gives an estimate of the probability that the region appears by chance. GBA Manager presents the output in three different understandable formats. GBA Manager is freely accessible at http://bioinformatics.cise.ufl.edu/GBA/GBA.htm .
Multi-label literature classification based on the Gene Ontology graph.
Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua
2008-12-08
The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.
AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide
2015-11-19
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. 
This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
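The abstract above summarizes binding affinities with position weight matrices (PWMs). As a rough illustration of what such a matrix encodes, the sketch below builds a log-odds PWM from a handful of aligned binding sites and scores candidate sequences against it. The sites, the pseudocount, and the uniform background are all assumptions made for the example; none of them come from the database itself.

```python
from math import log2

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites (one dict per position)."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def score(pwm, seq):
    """Sum of per-position log-odds; higher means closer to the motif."""
    return sum(col[base] for col, base in zip(pwm, seq))

# Hypothetical aligned binding sites, for illustration only.
sites = ["TTGACA", "TTGACA", "TTGACT", "TTTACA"]
pwm = build_pwm(sites)
print(score(pwm, "TTGACA"), score(pwm, "CCCCCC"))
```

The pseudocount keeps unobserved bases from producing a log of zero, at the cost of slightly flattening the matrix when few sites are available.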
Andreeva, Antonina
2016-06-15
The Structural Classification of Proteins (SCOP) database has facilitated the development of many tools and algorithms and it has been successfully used in protein structure prediction and large-scale genome annotations. During the development of SCOP, numerous exceptions were found to topological rules, along with complex evolutionary scenarios and peculiarities in proteins including the ability to fold into alternative structures. This article reviews cases of structural variations observed for individual proteins and among groups of homologues, knowledge of which is essential for protein structure modelling. © 2016 The Author(s). Published by Portland Press Limited on behalf of the Biochemical Society.
Alternating evolutionary pressure in a genetic algorithm facilitates protein model selection
Offman, Marc N; Tournier, Alexander L; Bates, Paul A
2008-01-01
Background: Automatic protein modelling pipelines are becoming ever more accurate; this has come hand in hand with an increasingly complicated interplay between all components involved. Nevertheless, there are still potential improvements to be made in template selection, refinement and protein model selection. Results: In the context of an automatic modelling pipeline, we analysed each step separately, revealing several non-intuitive trends, and explored a new strategy for protein conformation sampling using Genetic Algorithms (GA). We apply the concept of alternating evolutionary pressure (AEP), i.e. intermediate rounds within the GA runs where unrestrained, linear growth of the model populations is allowed. Conclusion: This approach improves the overall performance of the GA by allowing models to overcome local energy barriers. AEP enabled the selection of the best models in 40% of all targets, compared to 25% for a normal GA. PMID:18673557
Rydzewski, J; Nowak, W
2016-04-12
In this work we propose an application of a nonlinear dimensionality reduction method to represent the high-dimensional configuration space of the ligand-protein dissociation process in a manner facilitating interpretation. Rugged ligand expulsion paths are mapped into 2-dimensional space. The mapping retains the main structural changes occurring during the dissociation. The topological similarity of the reduced paths may be easily studied using the Fréchet distances, and we show that this measure facilitates machine learning classification of the diffusion pathways. Further, low-dimensional configuration space allows for identification of residues active in transport during the ligand diffusion from a protein. The utility of this approach is illustrated by examination of the configuration space of cytochrome P450cam involved in expulsing camphor by means of enhanced all-atom molecular dynamics simulations. The expulsion trajectories are sampled and constructed on-the-fly during molecular dynamics simulations using the recently developed memetic algorithms [ Rydzewski, J.; Nowak, W. J. Chem. Phys. 2015 , 143 ( 12 ), 124101 ]. We show that the memetic algorithms are effective for enforcing the ligand diffusion and cavity exploration in the P450cam-camphor complex. Furthermore, we demonstrate that machine learning techniques are helpful in inspecting ligand diffusion landscapes and provide useful tools to examine structural changes accompanying rare events.
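The abstract above compares reduced 2-dimensional paths using Fréchet distances. A minimal sketch of one common variant, the discrete Fréchet distance computed by dynamic programming, is given below. Whether the authors used the discrete or the continuous form is not stated in the abstract, so the choice here is an assumption, and the example paths are made up.

```python
from math import dist

def discrete_frechet(p, q):
    """Discrete Fréchet distance between two polylines given as point lists."""
    n, m = len(p), len(q)
    # ca[i][j] = coupling distance for the prefixes p[:i+1] and q[:j+1]
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]

# Two hypothetical parallel paths in the reduced 2-D space, one unit apart.
path_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
path_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(path_a, path_b))
```

Unlike a pointwise average, this measure respects the ordering of points along each path, which is why it suits comparing trajectories; pairwise distances computed this way can then serve as input to a clustering or classification step.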
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system
DOE Office of Scientific and Technical Information (OSTI.GOV)
AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. 
Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
Spencer, Jean L; Bhatia, Vivek N; Whelan, Stephen A; Costello, Catherine E; McComb, Mark E
2013-12-01
The identification of protein post-translational modifications (PTMs) is an increasingly important component of proteomics and biomarker discovery, but very few tools exist for performing fast and easy characterization of global PTM changes and differential comparison of PTMs across groups of data obtained from liquid chromatography-tandem mass spectrometry experiments. STRAP PTM (Software Tool for Rapid Annotation of Proteins: Post-Translational Modification edition) is a program that was developed to facilitate the characterization of PTMs using spectral counting and a novel scoring algorithm to accelerate the identification of differential PTMs from complex data sets. The software facilitates multi-sample comparison by collating, scoring, and ranking PTMs and by summarizing data visually. The freely available software (beta release) installs on a PC and processes data in protXML format obtained from files parsed through the Trans-Proteomic Pipeline. The easy-to-use interface allows examination of results at protein, peptide, and PTM levels, and the overall design offers tremendous flexibility that provides proteomics insight beyond simple assignment and counting.
Synchronous versus asynchronous modeling of gene regulatory networks.
Garg, Abhishek; Di Cara, Alessandro; Xenarios, Ioannis; Mendoza, Luis; De Micheli, Giovanni
2008-09-01
In silico modeling of gene regulatory networks has gained some momentum recently due to increased interest in analyzing the dynamics of biological systems. This has been further facilitated by the increasing availability of experimental data on gene-gene, protein-protein and gene-protein interactions. The two dynamical properties that are often experimentally testable are perturbations and stable steady states. Although a lot of work has been done on the identification of steady states, not much work has been reported on in silico modeling of cellular differentiation processes. In this manuscript, we provide algorithms based on reduced ordered binary decision diagrams (ROBDDs) for Boolean modeling of gene regulatory networks. Algorithms for synchronous and asynchronous transition models have been proposed and their corresponding computational properties have been analyzed. These algorithms allow users to compute cyclic attractors of large networks that are currently not feasible using existing software. Hereby we provide a framework to analyze the effect of multiple gene perturbation protocols, and their effect on cell differentiation processes. These algorithms were validated on the T-helper model showing the correct steady state identification and Th1-Th2 cellular differentiation process. The software binaries for Windows and Linux platforms can be downloaded from http://si2.epfl.ch/~garg/genysis.html.
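To make the notion of steady states and cyclic attractors under a synchronous transition model concrete, the sketch below enumerates every state of a tiny Boolean network and follows each trajectory until it repeats. This is plain brute-force enumeration, not the ROBDD-based method of the paper, and it scales only to very small networks. The two-gene mutual-inhibition network is a hypothetical toy, not the T-helper model.

```python
from itertools import product

def sync_step(state, rules):
    """Synchronous update: every gene is recomputed from the same old state."""
    return tuple(rule(state) for rule in rules)

def attractors(rules):
    """All attractors (fixed points and cycles) found by exhaustive search."""
    found = set()
    for start in product((False, True), repeat=len(rules)):
        seen = {}  # state -> step at which it was first visited
        state = start
        while state not in seen:
            seen[state] = len(seen)
            state = sync_step(state, rules)
        # The first repeated state closes the cycle; keep states from there on.
        first = seen[state]
        found.add(frozenset(s for s, step in seen.items() if step >= first))
    return found

# Hypothetical toy network: gene 0 and gene 1 repress each other.
toggle = [lambda s: not s[1], lambda s: not s[0]]
for attractor in attractors(toggle):
    print(sorted(attractor))
```

In this toy case the synchronous model yields two fixed points plus a two-state cycle between "both off" and "both on". That cycle arises because both genes switch at exactly the same instant, which illustrates why the choice between synchronous and asynchronous updating matters.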
Zeng, Jianyang; Zhou, Pei; Donald, Bruce Randall
2011-01-01
One bottleneck in NMR structure determination lies in the laborious and time-consuming process of side-chain resonance and NOE assignments. Compared to the well-studied backbone resonance assignment problem, automated side-chain resonance and NOE assignments are relatively less explored. Most NOE assignment algorithms require nearly complete side-chain resonance assignments from a series of through-bond experiments such as HCCH-TOCSY or HCCCONH. Unfortunately, these TOCSY experiments perform poorly on large proteins. To overcome this deficiency, we present a novel algorithm, called NASCA (NOE Assignment and Side-Chain Assignment), to automate both side-chain resonance and NOE assignments and to perform high-resolution protein structure determination in the absence of any explicit through-bond experiment to facilitate side-chain resonance assignment, such as HCCH-TOCSY. After casting the assignment problem into a Markov Random Field (MRF), NASCA extends and applies combinatorial protein design algorithms to compute optimal assignments that best interpret the NMR data. The MRF captures the contact map information of the protein derived from NOESY spectra, exploits the backbone structural information determined by RDCs, and considers all possible side-chain rotamers. The complexity of the combinatorial search is reduced by using a dead-end elimination (DEE) algorithm, which prunes side-chain resonance assignments that are provably not part of the optimal solution. Then an A* search algorithm is employed to find a set of optimal side-chain resonance assignments that best fit the NMR data. These side-chain resonance assignments are then used to resolve the NOE assignment ambiguity and compute high-resolution protein structures. Tests on five proteins show that NASCA assigns resonances for more than 90% of side-chain protons, and achieves about 80% correct assignments. 
The final structures computed using the NOE distance restraints assigned by NASCA have backbone RMSD 0.8 – 1.5 Å from the reference structures determined by traditional NMR approaches. PMID:21706248
Courcelles, Mathieu; Coulombe-Huntington, Jasmin; Cossette, Émilie; Gingras, Anne-Claude; Thibault, Pierre; Tyers, Mike
2017-07-07
Protein cross-linking mass spectrometry (CL-MS) enables the sensitive detection of protein interactions and the inference of protein complex topology. The detection of chemical cross-links between protein residues can identify intra- and interprotein contact sites or provide physical constraints for molecular modeling of protein structure. Recent innovations in cross-linker design, sample preparation, mass spectrometry, and software tools have significantly improved CL-MS approaches. Although a number of algorithms now exist for the identification of cross-linked peptides from mass spectral data, a dearth of user-friendly analysis tools represents a practical bottleneck to the broad adoption of the approach. To facilitate the analysis of CL-MS data, we developed CLMSVault, a software suite designed to leverage existing CL-MS algorithms and provide intuitive and flexible tools for cross-platform data interpretation. CLMSVault stores and combines complementary information obtained from different cross-linkers and search algorithms. CLMSVault provides filtering, comparison, and visualization tools to support CL-MS analyses and includes a workflow for label-free quantification of cross-linked peptides. An embedded 3D viewer enables the visualization of quantitative data and the mapping of cross-linked sites onto PDB structural models. We demonstrate the application of CLMSVault for the analysis of a noncovalent Cdc34-ubiquitin protein complex cross-linked under different conditions. CLMSVault is open-source software (available at https://gitlab.com/courcelm/clmsvault.git ), and a live demo is available at http://democlmsvault.tyerslab.com/ .
ProtaBank: A repository for protein design and engineering data.
Wang, Connie Y; Chang, Paul M; Ary, Marie L; Allen, Benjamin D; Chica, Roberto A; Mayo, Stephen L; Olafson, Barry D
2018-03-25
We present ProtaBank, a repository for storing, querying, analyzing, and sharing protein design and engineering data in an actively maintained and updated database. ProtaBank provides a format to describe and compare all types of protein mutational data, spanning a wide range of properties and techniques. It features a user-friendly web interface and programming layer that streamlines data deposition and allows for batch input and queries. The database schema design incorporates a standard format for reporting protein sequences and experimental data that facilitates comparison of results across different data sets. A suite of analysis and visualization tools are provided to facilitate discovery, to guide future designs, and to benchmark and train new predictive tools and algorithms. ProtaBank will provide a valuable resource to the protein engineering community by storing and safeguarding newly generated data, allowing for fast searching and identification of relevant data from the existing literature, and exploring correlations between disparate data sets. ProtaBank invites researchers to contribute data to the database to make it accessible for search and analysis. ProtaBank is available at https://protabank.org. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas
2014-01-01
The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881
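The Smith–Waterman algorithm named in the abstract above is a dynamic-programming method for local alignment, and its cost is what makes pre-calculated similarities attractive. The sketch below computes only the best local alignment score, using a simple match/mismatch/linear-gap scheme. Those scoring values are assumptions chosen for readability; production pipelines typically use a substitution matrix such as BLOSUM62 with affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                  # start a fresh local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                cur[j - 1] + gap,   # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best

# Hypothetical sequences sharing the local motif "ACDE".
print(smith_waterman_score("XXACDEXX", "YYACDEYY"))
```

Each comparison takes time proportional to the product of the two sequence lengths, so an all-against-all matrix over millions of sequences is expensive to compute on demand.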
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peyret, Thomas; Poulin, Patrick; Krishnan, Kannan, E-mail: kannan.krishnan@umontreal.ca
The algorithms in the literature for predicting the tissue:blood partition coefficient (PC) (P_tb) for environmental chemicals and the tissue:plasma PC based on total (K_p) or unbound concentration (K_pu) for drugs differ in their consideration of binding to hemoglobin, plasma proteins and charged phospholipids. The objective of the present study was to develop a unified algorithm such that P_tb, K_p and K_pu for both drugs and environmental chemicals could be predicted. The development of the unified algorithm was accomplished by integrating all mechanistic algorithms previously published to compute the PCs. Furthermore, the algorithm was structured in such a way as to facilitate predictions of the distribution of organic compounds at the macro (i.e. whole tissue) and micro (i.e. cells and fluids) levels. The resulting unified algorithm was applied to compute the rat P_tb, K_p or K_pu of muscle (n = 174), liver (n = 139) and adipose tissue (n = 141) for acidic, neutral, zwitterionic and basic drugs as well as ketones, acetate esters, alcohols, aliphatic hydrocarbons, aromatic hydrocarbons and ethers. The unified algorithm adequately reproduced the values predicted previously by the published algorithms for a total of 142 drugs and chemicals. The sensitivity analysis demonstrated the relative importance of the various compound properties reflective of specific mechanistic determinants relevant to prediction of PC values of drugs and environmental chemicals. Overall, the present unified algorithm uniquely facilitates the computation of macro and micro level PCs for developing organ and cellular-level PBPK models for both chemicals and drugs.
A novel approach to multiple sequence alignment using hadoop data grids.
Sudha Sadasivam, G; Baktavatchalam, G
2010-01-01
Multiple alignment of protein sequences helps to determine evolutionary linkage and to predict molecular structures. The factors to be considered while aligning multiple sequences are speed and accuracy of alignment. Although dynamic programming algorithms produce accurate alignments, they are computation intensive. In this paper we propose a time-efficient approach to sequence alignment that also produces quality alignments. The dynamic nature of the algorithm coupled with the data and computational parallelism of Hadoop data grids improves the accuracy and speed of sequence alignment. The principle of block splitting in Hadoop coupled with its scalability facilitates alignment of very large sequences.
Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L
2011-06-02
Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. 
StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
Konc, Janez; Cesnik, Tomo; Konc, Joanna Trykowska; Penca, Matej; Janežič, Dušanka
2012-02-27
ProBiS-Database is a searchable repository of precalculated local structural alignments in proteins detected by the ProBiS algorithm in the Protein Data Bank. Identification of functionally important binding regions of the protein is facilitated by structural similarity scores mapped to the query protein structure. PDB structures that have been aligned with a query protein may be rapidly retrieved from the ProBiS-Database, which is thus able to generate hypotheses concerning the roles of uncharacterized proteins. Presented with an uncharacterized protein structure, ProBiS-Database can discern relationships between such a query protein and other better-known proteins in the PDB. Fast access and a user-friendly graphical interface promote easy exploration of this database of over 420 million local structural alignments. The ProBiS-Database is updated weekly and is freely available online at http://probis.cmm.ki.si/database.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Raymond, Amy; Lovell, Scott; Lorimer, Don
2009-12-01
With the goal of improving yield and success rates of heterologous protein production for structural studies we have developed the database and algorithm software package Gene Composer. This freely available electronic tool facilitates the information-rich design of protein constructs and their engineered synthetic gene sequences, as detailed in the accompanying manuscript. In this report, we compare heterologous protein expression levels from native sequences to that of codon engineered synthetic gene constructs designed by Gene Composer. A test set of proteins including a human kinase (P38α), viral polymerase (HCV NS5B), and bacterial structural protein (FtsZ) were expressed in both E. coli and a cell-free wheat germ translation system. We also compare the protein expression levels in E. coli for a set of 11 different proteins with greatly varied G:C content and codon bias. The results consistently demonstrate that protein yields from codon engineered Gene Composer designs are as good as or better than those achieved from the synonymous native genes. Moreover, structure guided N- and C-terminal deletion constructs designed with the aid of Gene Composer can lead to greater success in gene to structure work as exemplified by the X-ray crystallographic structure determination of FtsZ from Bacillus subtilis. These results validate the Gene Composer algorithms, and suggest that using a combination of synthetic gene and protein construct engineering tools can improve the economics of gene to structure research.
Kurkcuoglu, Zeynep; Doruker, Pemra
2016-01-01
Incorporating receptor flexibility in small ligand-protein docking still poses a challenge for proteins undergoing large conformational changes. In the absence of bound structures, sampling conformers that are accessible from the apo state may facilitate docking and drug design studies. To this end, we developed an unbiased conformational search algorithm by integrating global modes from an elastic network model, clustering, and energy minimization with implicit solvation. Our dataset consists of five diverse proteins with apo to complex RMSDs of 4.7–15 Å. Applying this iterative algorithm to apo structures, conformers close to the bound state (RMSD 1.4–3.8 Å), as well as intermediate states, were generated. Dockings to a sequence of conformers consisting of a closed structure and its “parents” up to the apo structure were performed to compare binding poses on different states of the receptor. For two periplasmic binding proteins and biotin carboxylase, which exhibit hinge-type closure of two dynamic domains, the best pose was obtained for the conformer closest to the bound structure (ligand RMSDs 1.5–2 Å). In contrast, the best pose for adenylate kinase corresponded to an intermediate state with a partially closed LID domain and an open NMP domain, in line with recent studies (ligand RMSD 2.9 Å). The docking of a helical peptide to calmodulin was the most challenging case due to the complexity of its 15 Å transition, for which a two-stage procedure was necessary. The technique was first applied to the extended calmodulin to generate intermediate conformers; then peptide docking and a second generation stage on the complex were performed, which in turn yielded a final peptide RMSD of 2.9 Å. Our algorithm is effective in producing conformational states based on the apo state. This study underlines the importance of such intermediate states for ligand docking to proteins undergoing large transitions. PMID:27348230
Peng, Tao; Bonamy, Ghislain M C; Glory-Afshar, Estelle; Rines, Daniel R; Chanda, Sumit K; Murphy, Robert F
2010-02-16
Many proteins or other biological macromolecules are localized to more than one subcellular structure. The fraction of a protein in different cellular compartments is often measured by colocalization with organelle-specific fluorescent markers, requiring availability of fluorescent probes for each compartment and acquisition of images for each in conjunction with the macromolecule of interest. Alternatively, tailored algorithms allow finding particular regions in images and quantifying the amount of fluorescence they contain. Unfortunately, this approach requires extensive hand-tuning of algorithms and is often cell type-dependent. Here we describe a machine-learning approach for estimating the amount of fluorescent signal in different subcellular compartments without hand tuning, requiring only the acquisition of separate training images of markers for each compartment. In testing on images of cells stained with mixtures of probes for different organelles, we achieved a 93% correlation between estimated and expected amounts of probes in each compartment. We also demonstrated that the method can be used to quantify drug-dependent protein translocations. The method enables automated and unbiased determination of the distributions of protein across cellular compartments, and will significantly improve imaging-based high-throughput assays and facilitate proteome-scale localization efforts.
Denisova, Galina F; Denisov, Dimitri A; Yeung, Jeffrey; Loeb, Mark B; Diamond, Michael S; Bramson, Jonathan L
2008-11-01
Understanding antibody function is often enhanced by knowledge of the specific binding epitope. Here, we describe a computer algorithm that permits epitope prediction based on a collection of random peptide epitopes (mimotopes) isolated by antibody affinity purification. We applied this methodology to the prediction of epitopes for five monoclonal antibodies against the West Nile virus (WNV) E protein, two of which exhibit therapeutic activity in vivo. This strategy was validated by comparison of our results with existing F(ab)-E protein crystal structures and mutational analysis by yeast surface display. We demonstrate that by combining the results of the mimotope method with our data from mutational analysis, epitopes could be predicted with greater certainty. The two methods displayed great complementarity as the mutational analysis facilitated epitope prediction when the results with the mimotope method were equivocal and the mimotope method revealed a broader number of residues within the epitope than the mutational analysis. Our results demonstrate that the combination of these two prediction strategies provides a robust platform for epitope characterization.
Web server to identify similarity of amino acid motifs to compounds (SAAMCO).
Casey, Fergal P; Davey, Norman E; Baran, Ivan; Varekova, Radka Svobodova; Shields, Denis C
2008-07-01
Protein-protein interactions are fundamental in mediating biological processes including metabolism, cell growth, and signaling. To be able to selectively inhibit or induce protein activity or complex formation is a key feature in controlling disease. For those situations in which protein-protein interactions derive substantial affinity from short linear peptide sequences, or motifs, we can develop search algorithms for peptidomimetic compounds that resemble the short peptide's structure but are not compromised by poor pharmacological properties. SAAMCO is a Web service (http://bioware.ucd.ie/~saamco) that facilitates the screening of motifs with known structures against bioactive compound databases. It is built on an algorithm that defines compound similarity based on the presence of appropriate amino acid side chain fragments and a favorable Root Mean Squared Deviation (RMSD) between compound and motif structure. The methodology is efficient as the available compound databases are preprocessed and fast regular expression searches filter potential matches before time-intensive 3D superposition is performed. The required input information is minimal, and the compound databases have been selected to maximize the availability of information on biological activity. "Hits" are accompanied with a visualization window and links to source database entries. Motif matching can be defined on partial or full similarity which will increase or reduce respectively the number of potential mimetic compounds. The Web server provides the functionality for rapid screening of known or putative interaction motifs against prepared compound libraries using a novel search algorithm. The tabulated results can be analyzed by linking to appropriate databases and by visualization.
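The RMSD criterion mentioned in this abstract can be written down in a few lines of Python. This is only a sketch of the standard RMSD formula: it assumes the two coordinate sets are already matched atom-for-atom and already superposed, and it does not perform the 3D superposition step that the abstract describes. The function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two matched coordinate lists.

    Each argument is a list of (x, y, z) tuples. The i-th point of coords_a
    is assumed to correspond to the i-th point of coords_b, and both sets
    are assumed to be superposed already.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    motif = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    compound = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(motif, compound))
```

A lower value means the matched fragments of the compound sit closer to the corresponding motif atoms.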
Serang, Oliver; MacCoss, Michael J.; Noble, William Stafford
2010-01-01
The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, “degenerate” peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein’s presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or by estimating the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors. PMID:20712337
Proposed structure of putative glucose channel in GLUT1 facilitative glucose transporter.
Zeng, H; Parthasarathy, R; Rampal, A L; Jung, C Y
1996-01-01
A family of structurally related intrinsic membrane proteins (facilitative glucose transporters) catalyzes the movement of glucose across the plasma membrane of animal cells. Evidence indicates that these proteins show a common structural motif where approximately 50% of the mass is embedded in the lipid bilayer (transmembrane domain) in 12 alpha-helices (transmembrane helices; TMHs) and accommodates a water-filled channel for substrate passage (glucose channel) whose tertiary structure is currently unknown. Using recent advances in protein structure prediction algorithms, we propose here two three-dimensional structural models for the transmembrane glucose channel of the GLUT1 glucose transporter. Our models emphasize the physical dimension and water accessibility of the channel, loop lengths between TMHs, the macrodipole orientation in the four-helix bundle motif, and helix packing energy. Our models predict that five TMHs, either TMHs 3, 4, 7, 8, 11 (Model 1) or TMHs 2, 5, 11, 8, 7 (Model 2), line the channel, and the remaining TMHs surround these channel-lining TMHs. We discuss how our models are compatible with the experimental data obtained with this protein, and how they can be used in designing new biochemical and molecular biological experiments to elucidate the structural basis of this important protein function. PMID:8770183
Astronomical algorithms for automated analysis of tissue protein expression in breast cancer
Ali, H R; Irwin, M; Morris, L; Dawson, S-J; Blows, F M; Provenzano, E; Mahler-Araujo, B; Pharoah, P D; Walton, N A; Brenton, J D; Caldas, C
2013-01-01
Background: High-throughput evaluation of tissue biomarkers in oncology has been greatly accelerated by the widespread use of tissue microarrays (TMAs) and immunohistochemistry. Although TMAs have the potential to facilitate protein expression profiling on a scale to rival experiments of tumour transcriptomes, the bottleneck and imprecision of manually scoring TMAs has impeded progress. Methods: We report image analysis algorithms adapted from astronomy for the precise automated analysis of IHC in all subcellular compartments. The power of this technique is demonstrated using over 2000 breast tumours and comparing quantitative automated scores against manual assessment by pathologists. Results: All continuous automated scores showed good correlation with their corresponding ordinal manual scores. For oestrogen receptor (ER), the correlation was 0.82, P<0.0001, for BCL2 0.72, P<0.0001 and for HER2 0.62, P<0.0001. Automated scores showed excellent concordance with manual scores for the unsupervised assignment of cases to ‘positive' or ‘negative' categories with agreement rates of up to 96%. Conclusion: The adaptation of astronomical algorithms coupled with their application to large annotated study cohorts, constitutes a powerful tool for the realisation of the enormous potential of digital pathology. PMID:23329232
Allmer, Jens; Kuhlgert, Sebastian; Hippler, Michael
2008-07-07
The amount of information stemming from proteomics experiments involving (multi dimensional) separation techniques, mass spectrometric analysis, and computational analysis is ever-increasing. Data from such an experimental workflow needs to be captured, related and analyzed. Biological experiments within this scope produce heterogenic data ranging from pictures of one or two-dimensional protein maps and spectra recorded by tandem mass spectrometry to text-based identifications made by algorithms which analyze these spectra. Additionally, peptide and corresponding protein information needs to be displayed. In order to handle the large amount of data from computational processing of mass spectrometric experiments, automatic import scripts are available and the necessity for manual input to the database has been minimized. Information is in a generic format which abstracts from specific software tools typically used in such an experimental workflow. The software is therefore capable of storing and cross analysing results from many algorithms. A novel feature and a focus of this database is to facilitate protein identification by using peptides identified from mass spectrometry and link this information directly to respective protein maps. Additionally, our application employs spectral counting for quantitative presentation of the data. All information can be linked to hot spots on images to place the results into an experimental context. A summary of identified proteins, containing all relevant information per hot spot, is automatically generated, usually upon either a change in the underlying protein models or due to newly imported identifications. The supporting information for this report can be accessed in multiple ways using the user interface provided by the application. We present a proteomics database which aims to greatly reduce evaluation time of results from mass spectrometric experiments and enhance result quality by allowing consistent data handling. 
Import functionality, automatic protein detection, and summary creation act together to facilitate data analysis. In addition, supporting information for these findings is readily accessible via the graphical user interface provided. The database schema and the implementation, which can easily be installed on virtually any server, can be downloaded in the form of a compressed file from our project webpage.
A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.
Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei
2017-10-01
The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.
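The objective that such a search optimizes can be sketched in Python. The example below scores a conformation in the HP model on a 2D square lattice, where each pair of hydrophobic (H) residues that are lattice neighbours but not chain neighbours contributes -1. The 2D lattice, the function name, and the coordinate-list input are assumptions made for illustration; the sketch does not implement the HE-L-PSO search itself.

```python
def hp_energy(sequence, coords):
    """Energy of an HP-model conformation on a 2D square lattice.

    sequence: string of 'H' and 'P' characters.
    coords:   list of (x, y) integer lattice positions, one per residue.
    Returns minus the number of non-bonded H-H lattice contacts.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    index_at = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so every neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


if __name__ == "__main__":
    # A four-residue chain folded into a unit square.
    print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))
```

A search algorithm such as PSO with local search would propose many conformations and keep those with the lowest energy under a function of this kind.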
Pooled protein immunization for identification of cell surface antigens in Streptococcus sanguinis.
Ge, Xiuchun; Kitten, Todd; Munro, Cindy L; Conrad, Daniel H; Xu, Ping
2010-07-26
Available bacterial genomes provide opportunities for screening vaccines by reverse vaccinology. Efficient identification of surface antigens is required to reduce time and animal cost in this technology. We developed an approach to identify surface antigens rapidly in Streptococcus sanguinis, a common infective endocarditis causative species. We applied bioinformatics for antigen prediction and pooled antigens for immunization. Forty-seven surface-exposed proteins including 28 lipoproteins and 19 cell wall-anchored proteins were chosen based on computer algorithms and comparative genomic analyses. Eight proteins among these candidates and 2 other proteins were pooled together to immunize rabbits. The antiserum reacted strongly with each protein and with S. sanguinis whole cells. Affinity chromatography was used to purify the antibodies to 9 of the antigen pool components. Competitive ELISA and FACS results indicated that these 9 proteins were exposed on S. sanguinis cell surfaces. The purified antibodies had demonstrable opsonic activity. The results indicate that immunization with pooled proteins, in combination with affinity purification, and comprehensive immunological assays may facilitate cell surface antigen identification to combat infectious diseases.
Pooled Protein Immunization for Identification of Cell Surface Antigens in Streptococcus sanguinis
Ge, Xiuchun; Kitten, Todd; Munro, Cindy L.; Conrad, Daniel H.; Xu, Ping
2010-01-01
Background Available bacterial genomes provide opportunities for screening vaccines by reverse vaccinology. Efficient identification of surface antigens is required to reduce time and animal cost in this technology. We developed an approach to identify surface antigens rapidly in Streptococcus sanguinis, a common infective endocarditis causative species. Methods and Findings We applied bioinformatics for antigen prediction and pooled antigens for immunization. Forty-seven surface-exposed proteins including 28 lipoproteins and 19 cell wall-anchored proteins were chosen based on computer algorithms and comparative genomic analyses. Eight proteins among these candidates and 2 other proteins were pooled together to immunize rabbits. The antiserum reacted strongly with each protein and with S. sanguinis whole cells. Affinity chromatography was used to purify the antibodies to 9 of the antigen pool components. Competitive ELISA and FACS results indicated that these 9 proteins were exposed on S. sanguinis cell surfaces. The purified antibodies had demonstrable opsonic activity. Conclusions The results indicate that immunization with pooled proteins, in combination with affinity purification, and comprehensive immunological assays may facilitate cell surface antigen identification to combat infectious diseases. PMID:20668678
Rapid automated superposition of shapes and macromolecular models using spherical harmonics.
Konarev, Petr V; Petoukhov, Maxim V; Svergun, Dmitri I
2016-06-01
A rapid algorithm to superimpose macromolecular models in Fourier space is proposed and implemented (SUPALM). The method uses a normalized integrated cross-term of the scattering amplitudes as a proximity measure between two three-dimensional objects. The reciprocal-space algorithm allows for direct matching of heterogeneous objects including high- and low-resolution models represented by atomic coordinates, beads or dummy residue chains as well as electron microscopy density maps and inhomogeneous multi-phase models (e.g. of protein-nucleic acid complexes). Using spherical harmonics for the computation of the amplitudes, the method is up to an order of magnitude faster than the real-space algorithm implemented in SUPCOMB by Kozin & Svergun [J. Appl. Cryst. (2001), 34, 33-41]. The utility of the new method is demonstrated in a number of test cases and compared with the results of SUPCOMB. The spherical harmonics algorithm is best suited for low-resolution shape models, e.g. those provided by solution scattering experiments, but also facilitates a rapid cross-validation against structural models obtained by other methods.
Coiled-Coil Proteins Facilitated the Functional Expansion of the Centrosome
Kuhn, Michael; Hyman, Anthony A.; Beyer, Andreas
2014-01-01
Repurposing existing proteins for new cellular functions is recognized as a main mechanism of evolutionary innovation, but its role in organelle evolution is unclear. Here, we explore the mechanisms that led to the evolution of the centrosome, an ancestral eukaryotic organelle that expanded its functional repertoire through the course of evolution. We developed a refined sequence alignment technique that is more sensitive to coiled coil proteins, which are abundant in the centrosome. For proteins with high coiled-coil content, our algorithm identified 17% more reciprocal best hits than BLAST. Analyzing 108 eukaryotic genomes, we traced the evolutionary history of centrosome proteins. In order to assess how these proteins formed the centrosome and adopted new functions, we computationally emulated evolution by iteratively removing the most recently evolved proteins from the centrosomal protein interaction network. Coiled-coil proteins that first appeared in the animal–fungi ancestor act as scaffolds and recruit ancestral eukaryotic proteins such as kinases and phosphatases to the centrosome. This process created a signaling hub that is crucial for multicellular development. Our results demonstrate how ancient proteins can be co-opted to different cellular localizations, thereby becoming involved in novel functions. PMID:24901223
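The reciprocal best hit criterion used for the comparison with BLAST can be sketched in Python. The example assumes that pairwise alignment scores between two proteomes are already available as nested dictionaries; the function names and the input format are hypothetical, and the scoring method itself (BLAST or the authors' coiled-coil-aware alignment) is outside the sketch.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict of query -> dict of target -> alignment score.
    Queries without any hit are skipped.
    """
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    a_vs_b = {"a1": {"b1": 90, "b2": 50}, "a2": {"b1": 80, "b2": 60}}
    b_vs_a = {"b1": {"a1": 88, "a2": 70}, "b2": {"a2": 65, "a1": 40}}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input only a1 and b1 select each other, so a2 is left without a reciprocal partner even though it has hits in the other proteome.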
Towards Inferring Protein Interactions: Challenges and Solutions
NASA Astrophysics Data System (ADS)
Zhang, Ya; Zha, Hongyuan; Chu, Chao-Hsien; Ji, Xiang
2006-12-01
Discovering interacting proteins has been an essential part of functional genomics. However, existing experimental techniques only uncover a small portion of any interactome. Furthermore, these data often have a very high false rate. By conceptualizing the interactions at the domain level, we provide a more abstract representation of the interactome, which also facilitates the discovery of unobserved protein-protein interactions. Although several domain-based approaches have been proposed to predict protein-protein interactions, they usually assume that domain interactions are independent of each other for the convenience of computational modeling. A new framework to predict protein interactions is proposed in this paper, where no assumption is made about domain interactions. Protein interactions may be the result of multiple domain interactions which are dependent on each other. A conjunctive normal form representation is used to capture the relationships between protein interactions and domain interactions. The problem of interaction inference is then modeled as a constraint satisfiability problem and solved via linear programming. Experimental results on a combined yeast data set have demonstrated the robustness and the accuracy of the proposed algorithm. Moreover, we also map some predicted interacting domains to three-dimensional structures of protein complexes to show the validity of our predictions.
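One simple way to picture a conjunctive form over domain interactions is sketched below in Python. It assumes that each observed protein interaction contributes one clause, the disjunction of all domain pairs the two proteins could use, and that the conjunction of these clauses must hold. This encoding, the function names, and the toy data are illustrative assumptions; the abstract does not give the paper's exact formulation, and the linear programming step is not shown.

```python
from itertools import product


def interaction_clause(domains_a, domains_b):
    """Clause for one protein interaction: the set of candidate domain pairs."""
    return frozenset(tuple(sorted(pair)) for pair in product(domains_a, domains_b))


def build_cnf(protein_domains, interactions):
    """One clause per observed protein interaction (assumed encoding)."""
    return [
        interaction_clause(protein_domains[a], protein_domains[b])
        for a, b in interactions
    ]


def satisfies(cnf, interacting_domain_pairs):
    """True if every clause contains at least one chosen domain pair."""
    chosen = {tuple(sorted(pair)) for pair in interacting_domain_pairs}
    return all(clause & chosen for clause in cnf)


if __name__ == "__main__":
    protein_domains = {"P1": ["d1", "d2"], "P2": ["d3"], "P3": ["d2"]}
    cnf = build_cnf(protein_domains, [("P1", "P2"), ("P2", "P3")])
    print(satisfies(cnf, [("d2", "d3")]))
    print(satisfies(cnf, [("d1", "d3")]))
```

In the toy data the single domain pair (d2, d3) explains both protein interactions, whereas (d1, d3) explains only the first, which is the kind of dependency between clauses that a constraint formulation can exploit.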
Uniform, optimal signal processing of mapped deep-sequencing data.
Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam
2013-07-01
Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.
Lindsey, Merry L; Mayr, Manuel; Gomes, Aldrin V; Delles, Christian; Arrell, D Kent; Murphy, Anne M; Lange, Richard A; Costello, Catherine E; Jin, Yu-Fang; Laskowitz, Daniel T; Sam, Flora; Terzic, Andre; Van Eyk, Jennifer; Srinivas, Pothur R
2015-09-01
The year 2014 marked the 20th anniversary of the coining of the term proteomics. The purpose of this scientific statement is to summarize advances over this period that have catalyzed our capacity to address the experimental, translational, and clinical implications of proteomics as applied to cardiovascular health and disease and to evaluate the current status of the field. Key successes that have energized the field are delineated; opportunities for proteomics to drive basic science research, facilitate clinical translation, and establish diagnostic and therapeutic healthcare algorithms are discussed; and challenges that remain to be solved before proteomic technologies can be readily translated from scientific discoveries to meaningful advances in cardiovascular care are addressed. Proteomics is the result of disruptive technologies, namely, mass spectrometry and database searching, which drove protein analysis from 1 protein at a time to protein mixture analyses that enable large-scale analysis of proteins and facilitate paradigm shifts in biological concepts that address important clinical questions. Over the past 20 years, the field of proteomics has matured, yet it is still developing rapidly. The scope of this statement will extend beyond the reaches of a typical review article and offer guidance on the use of next-generation proteomics for future scientific discovery in the basic research laboratory and clinical settings. © 2015 American Heart Association, Inc.
A traveling salesman approach for predicting protein functions.
Johnson, Olin; Liu, Jing
2006-10-12
Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Here we present a new approach utilizing the classic Traveling Salesman Problem to study the protein-protein interactions and to predict protein functions in budding yeast Saccharomyces cerevisiae. We apply the global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm [1] on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Our method is a promising one to be used as a general tool to predict functions of uncharacterized proteins and a successful sample of using computer science knowledge and algorithms to study biological problems.
A traveling salesman approach for predicting protein functions
Johnson, Olin; Liu, Jing
2006-01-01
Background Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Results Here we present a new approach utilizing the classic Traveling Salesman Problem to study the protein-protein interactions and to predict protein functions in budding yeast Saccharomyces cerevisiae. We apply the global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm [1] on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Conclusion Our method is a promising one to be used as a general tool to predict functions of uncharacterized proteins and a successful sample of using computer science knowledge and algorithms to study biological problems. PMID:17147783
ProDaMa: an open source Python library to generate protein structure datasets.
Armano, Giuliano; Manconi, Andrea
2009-10-02
The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not yet been devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements. To facilitate the task of generating specialized datasets we devised and implemented ProDaMa, an open source Python library that provides classes for retrieving, organizing, updating, analyzing, and filtering protein data. ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL http://iasc.diee.unica.it/prodama.
Residue-Specific Side-Chain Polymorphisms via Particle Belief Propagation.
Ghoraie, Laleh Soltan; Burkowski, Forbes; Li, Shuai Cheng; Zhu, Mu
2014-01-01
Protein side chains populate diverse conformational ensembles in crystals. Despite much evidence that there is widespread conformational polymorphism in protein side chains, most of the X-ray crystallography data are modeled by single conformations in the Protein Data Bank. The ability to extract or to predict these conformational polymorphisms is of crucial importance, as it facilitates deeper understanding of protein dynamics and functionality. In this paper, we describe a computational strategy capable of predicting side-chain polymorphisms. Our approach extends a particular class of algorithms for side-chain prediction by modeling the side-chain dihedral angles more appropriately as continuous rather than discrete variables. Employing a new inferential technique known as particle belief propagation, we predict residue-specific distributions that encode information about side-chain polymorphisms. Our predicted polymorphisms are in relatively close agreement with results from a state-of-the-art approach based on X-ray crystallography data, which characterizes the conformational polymorphisms of side chains using electron density information, and has successfully discovered previously unmodeled conformations.
pmx Webserver: A User Friendly Interface for Alchemistry.
Gapsys, Vytautas; de Groot, Bert L
2017-02-27
With the increase of available computational power and improvements in simulation algorithms, alchemical molecular dynamics based free energy calculations have developed into routine usage. To further facilitate the usability of alchemical methods for amino acid mutations, we have developed a web based infrastructure for obtaining hybrid protein structures and topologies. The presented webserver allows amino acid mutation selection in five contemporary molecular mechanics force fields. In addition, a complete mutation scan with a user defined amino acid is supported. The output generated by the webserver is directly compatible with the Gromacs molecular dynamics engine and can be used with any of the alchemical free energy calculation setups. Furthermore, we present a database of input files and precalculated free energy differences for tripeptides approximating a disordered state of a protein, of particular use for protein stability studies. Finally, the usage of the webserver and its output is exemplified by performing an alanine scan and investigating thermodynamic stability of the Trp cage mini protein. The webserver is accessible at http://pmx.mpibpc.mpg.de.
Gene Composer: database software for protein construct design, codon engineering, and gene synthesis
Lorimer, Don; Raymond, Amy; Walchli, John; Mixon, Mark; Barrow, Adrienne; Wallace, Ellen; Grice, Rena; Burgin, Alex; Stewart, Lance
2009-01-01
Background To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. Results An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. 
Conclusion We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease error correction in combination with PIPE cloning. In a sister manuscript we present data on how Gene Composer designed genes and protein constructs can result in improved protein production for structural studies. PMID:19383142
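The back-translation step described in this abstract can be illustrated with a minimal Python sketch. This is not Gene Composer's protein-to-gene algorithm: the codon usage fractions are invented for illustration, and the names `CODON_USAGE`, `back_translate`, `min_usage`, and `gc_content` are hypothetical. The sketch only shows the general idea of choosing, for each residue, the most-used synonymous codon that meets a minimal usage threshold, and then checking the G:C content of the result.

```python
# Illustrative codon usage table: fraction of use per synonymous codon.
# The numbers are made up for this example (assumption), not taken from
# any real organism or from Gene Composer.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.50, "CTC": 0.10, "TTA": 0.13,
          "TTG": 0.13, "CTT": 0.10, "CTA": 0.04},
}

def back_translate(protein, usage=CODON_USAGE, min_usage=0.10):
    """Pick the most-used codon at or above min_usage for each residue."""
    codons = []
    for aa in protein:
        candidates = {c: f for c, f in usage[aa].items() if f >= min_usage}
        if not candidates:
            raise ValueError(f"no codon for {aa!r} meets the usage threshold")
        codons.append(max(candidates, key=candidates.get))
    return "".join(codons)

def gc_content(dna):
    """Fraction of G and C bases in a DNA string."""
    return sum(base in "GC" for base in dna) / len(dna)

gene = back_translate("MKL")
print(gene, round(gc_content(gene), 3))
```

A real design tool would also have to balance the G:C target and sequence features such as restriction sites against codon choice, which this greedy sketch ignores.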
Lorimer, Don; Raymond, Amy; Walchli, John; Mixon, Mark; Barrow, Adrienne; Wallace, Ellen; Grice, Rena; Burgin, Alex; Stewart, Lance
2009-04-21
To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. 
We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease error correction in combination with PIPE cloning. In a sister manuscript we present data on how Gene Composer designed genes and protein constructs can result in improved protein production for structural studies.
Rose, Annkatrin; Manikantan, Sankaraganesh; Schraegle, Shannon J.; Maloy, Michael A.; Stahlberg, Eric A.; Meier, Iris
2004-01-01
Increasing evidence demonstrates the importance of long coiled-coil proteins for the spatial organization of cellular processes. Although several protein classes with long coiled-coil domains have been studied in animals and yeast, our knowledge about plant long coiled-coil proteins is very limited. The repeat nature of the coiled-coil sequence motif often prevents the simple identification of homologs of animal coiled-coil proteins by generic sequence similarity searches. As a consequence, counterparts of many animal proteins with long coiled-coil domains, like lamins, golgins, or microtubule organization center components, have not been identified yet in plants. Here, all Arabidopsis proteins predicted to contain long stretches of coiled-coil domains were identified by applying the algorithm MultiCoil to a genome-wide screen. A searchable protein database, ARABI-COIL (http://www.coiled-coil.org/arabidopsis), was established that integrates information on number, size, and position of predicted coiled-coil domains with subcellular localization signals, transmembrane domains, and available functional annotations. ARABI-COIL serves as a tool to sort and browse Arabidopsis long coiled-coil proteins to facilitate the identification and selection of candidate proteins of potential interest for specific research areas. Using the database, candidate proteins were identified for Arabidopsis membrane-bound, nuclear, and organellar long coiled-coil proteins. PMID:15020757
PLIP: fully automated protein-ligand interaction profiler.
Salentin, Sebastian; Schreiber, Sven; Haupt, V Joachim; Adasme, Melissa F; Schroeder, Michael
2015-07-01
The characterization of interactions in protein-ligand complexes is essential for research in structural bioinformatics, drug discovery and biology. However, comprehensive tools are not freely available to the research community. Here, we present the protein-ligand interaction profiler (PLIP), a novel web service for fully automated detection and visualization of relevant non-covalent protein-ligand contacts in 3D structures, freely available at projects.biotec.tu-dresden.de/plip-web. The input is either a Protein Data Bank structure, a protein or ligand name, or a custom protein-ligand complex (e.g. from docking). In contrast to other tools, the rule-based PLIP algorithm does not require any structure preparation. It returns a list of detected interactions on single atom level, covering seven interaction types (hydrogen bonds, hydrophobic contacts, pi-stacking, pi-cation interactions, salt bridges, water bridges and halogen bonds). PLIP stands out by offering publication-ready images, PyMOL session files to generate custom images and parsable result files to facilitate successive data processing. The full python source code is available for download on the website. PLIP's command-line mode allows for high-throughput interaction profiling. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
STAR Algorithm Integration Team - Facilitating operational algorithm development
NASA Astrophysics Data System (ADS)
Mikles, V. J.
2015-12-01
The NOAA/NESDIS Center for Satellite Applications and Research (STAR) provides technical support of the Joint Polar Satellite System (JPSS) algorithm development and integration tasks. Utilizing data from the S-NPP satellite, JPSS generates over thirty Environmental Data Records (EDRs) and Intermediate Products (IPs) spanning atmospheric, ocean, cryosphere, and land weather disciplines. The Algorithm Integration Team (AIT) brings technical expertise and support to product algorithms, specifically in testing and validating science algorithms in a pre-operational environment. The AIT verifies that new and updated algorithms function in the development environment, enforces established software development standards, and ensures that delivered packages are functional and complete. AIT facilitates the development of new JPSS-1 algorithms by implementing a review approach based on the Enterprise Product Lifecycle (EPL) process. Building on relationships established during the S-NPP algorithm development process and coordinating directly with science algorithm developers, the AIT has implemented structured reviews with self-contained document suites. The process has supported algorithm improvements for products such as ozone, active fire, vegetation index, and temperature and moisture profiles.
Towards human-computer synergetic analysis of large-scale biological data.
Singh, Rahul; Yang, Hui; Dalziel, Ben; Asarnow, Daniel; Murad, William; Foote, David; Gormley, Matthew; Stillman, Jonathan; Fisher, Susan
2013-01-01
Advances in technology have led to the generation of massive amounts of complex and multifarious biological data in areas ranging from genomics to structural biology. The volume and complexity of such data leads to significant challenges in terms of its analysis, especially when one seeks to generate hypotheses or explore the underlying biological processes. At the state-of-the-art, the application of automated algorithms followed by perusal and analysis of the results by an expert continues to be the predominant paradigm for analyzing biological data. This paradigm works well in many problem domains. However, it also is limiting, since domain experts are forced to apply their instincts and expertise such as contextual reasoning, hypothesis formulation, and exploratory analysis after the algorithm has produced its results. In many areas where the organization and interaction of the biological processes is poorly understood and exploratory analysis is crucial, what is needed is to integrate domain expertise during the data analysis process and use it to drive the analysis itself. In context of the aforementioned background, the results presented in this paper describe advancements along two methodological directions. First, given the context of biological data, we utilize and extend a design approach called experiential computing from multimedia information system design. This paradigm combines information visualization and human-computer interaction with algorithms for exploratory analysis of large-scale and complex data. 
In the proposed approach, emphasis is laid on: (1) allowing users to directly visualize, interact, experience, and explore the data through interoperable visualization-based and algorithmic components, (2) supporting unified query and presentation spaces to facilitate experimentation and exploration, (3) providing external contextual information by assimilating relevant supplementary data, and (4) encouraging user-directed information visualization, data exploration, and hypotheses formulation. Second, to illustrate the proposed design paradigm and measure its efficacy, we describe two prototype web applications. The first, called XMAS (Experiential Microarray Analysis System) is designed for analysis of time-series transcriptional data. The second system, called PSPACE (Protein Space Explorer) is designed for holistic analysis of structural and structure-function relationships using interactive low-dimensional maps of the protein structure space. Both these systems promote and facilitate human-computer synergy, where cognitive elements such as domain knowledge, contextual reasoning, and purpose-driven exploration, are integrated with a host of powerful algorithmic operations that support large-scale data analysis, multifaceted data visualization, and multi-source information integration. The proposed design philosophy, combines visualization, algorithmic components and cognitive expertise into a seamless processing-analysis-exploration framework that facilitates sense-making, exploration, and discovery. Using XMAS, we present case studies that analyze transcriptional data from two highly complex domains: gene expression in the placenta during human pregnancy and reaction of marine organisms to heat stress. With PSPACE, we demonstrate how complex structure-function relationships can be explored. These results demonstrate the novelty, advantages, and distinctions of the proposed paradigm. 
Furthermore, the results also highlight how domain insights can be combined with algorithms to discover meaningful knowledge and formulate evidence-based hypotheses during the data analysis process. Finally, user studies against comparable systems indicate that both XMAS and PSPACE deliver results with better interpretability while placing lower cognitive loads on the users. XMAS is available at: http://tintin.sfsu.edu:8080/xmas. PSPACE is available at: http://pspace.info/.
Towards human-computer synergetic analysis of large-scale biological data
2013-01-01
Background Advances in technology have led to the generation of massive amounts of complex and multifarious biological data in areas ranging from genomics to structural biology. The volume and complexity of such data leads to significant challenges in terms of its analysis, especially when one seeks to generate hypotheses or explore the underlying biological processes. At the state-of-the-art, the application of automated algorithms followed by perusal and analysis of the results by an expert continues to be the predominant paradigm for analyzing biological data. This paradigm works well in many problem domains. However, it also is limiting, since domain experts are forced to apply their instincts and expertise such as contextual reasoning, hypothesis formulation, and exploratory analysis after the algorithm has produced its results. In many areas where the organization and interaction of the biological processes is poorly understood and exploratory analysis is crucial, what is needed is to integrate domain expertise during the data analysis process and use it to drive the analysis itself. Results In context of the aforementioned background, the results presented in this paper describe advancements along two methodological directions. First, given the context of biological data, we utilize and extend a design approach called experiential computing from multimedia information system design. This paradigm combines information visualization and human-computer interaction with algorithms for exploratory analysis of large-scale and complex data. 
In the proposed approach, emphasis is laid on: (1) allowing users to directly visualize, interact, experience, and explore the data through interoperable visualization-based and algorithmic components, (2) supporting unified query and presentation spaces to facilitate experimentation and exploration, (3) providing external contextual information by assimilating relevant supplementary data, and (4) encouraging user-directed information visualization, data exploration, and hypotheses formulation. Second, to illustrate the proposed design paradigm and measure its efficacy, we describe two prototype web applications. The first, called XMAS (Experiential Microarray Analysis System) is designed for analysis of time-series transcriptional data. The second system, called PSPACE (Protein Space Explorer) is designed for holistic analysis of structural and structure-function relationships using interactive low-dimensional maps of the protein structure space. Both these systems promote and facilitate human-computer synergy, where cognitive elements such as domain knowledge, contextual reasoning, and purpose-driven exploration, are integrated with a host of powerful algorithmic operations that support large-scale data analysis, multifaceted data visualization, and multi-source information integration. Conclusions The proposed design philosophy, combines visualization, algorithmic components and cognitive expertise into a seamless processing-analysis-exploration framework that facilitates sense-making, exploration, and discovery. Using XMAS, we present case studies that analyze transcriptional data from two highly complex domains: gene expression in the placenta during human pregnancy and reaction of marine organisms to heat stress. With PSPACE, we demonstrate how complex structure-function relationships can be explored. These results demonstrate the novelty, advantages, and distinctions of the proposed paradigm. 
Furthermore, the results also highlight how domain insights can be combined with algorithms to discover meaningful knowledge and formulate evidence-based hypotheses during the data analysis process. Finally, user studies against comparable systems indicate that both XMAS and PSPACE deliver results with better interpretability while placing lower cognitive loads on the users. XMAS is available at: http://tintin.sfsu.edu:8080/xmas. PSPACE is available at: http://pspace.info/. PMID:24267485
Implementation of a parallel protein structure alignment service on cloud.
Hung, Che-Lun; Lin, Yaw-Ling
2013-01-01
Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
Implementation of a Parallel Protein Structure Alignment Service on Cloud
Hung, Che-Lun; Lin, Yaw-Ling
2013-01-01
Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842
Improved hybrid optimization algorithm for 3D protein structure prediction.
Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang
2014-07-01
A new improved hybrid optimization algorithm, the PGATS algorithm, which is based on a toy off-lattice model, is presented for dealing with three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. In addition, we adopt several improvement strategies. A stochastic disturbance factor is added to the particle swarm optimization to improve the search ability; the crossover and mutation operations of the genetic algorithm are changed to a kind of random linear method; finally, the tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, the protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is an NP-hard problem, but it can be treated as a global optimization problem with multiple extrema and multiple parameters. This is the theoretical principle of the hybrid optimization algorithm that is proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcomings of a single algorithm and gives full play to the advantages of each algorithm. The method is verified on the current universal standard sequences, namely Fibonacci sequences and real protein sequences. Experiments show that the proposed new method outperforms single algorithms on the accuracy of calculating the protein sequence energy value, which proves it to be an effective way to predict the structure of proteins.
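One ingredient named in this abstract, a stochastic disturbance added to particle swarm optimization, can be sketched in Python as follows. This is an illustrative sketch, not the PGATS algorithm: the disturbance is assumed to be Gaussian noise on the velocity update, the objective is a simple sphere function standing in for the off-lattice energy, and every name and parameter value (`pso_with_disturbance`, `sigma`, `w`, `c1`, `c2`) is hypothetical.

```python
import random

def pso_with_disturbance(objective, dim, n_particles=20, n_iter=100,
                         w=0.7, c1=1.5, c2=1.5, sigma=0.05, seed=0):
    """Minimize `objective` with PSO plus Gaussian noise on each velocity."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far
    history = [gbest_val]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d])
                             + rng.gauss(0.0, sigma))  # stochastic disturbance
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history

def sphere(x):
    """Stand-in objective (assumption); a real run would use a model energy."""
    return sum(v * v for v in x)

best, best_val, history = pso_with_disturbance(sphere, dim=3)
print(best_val)
```

Because the global best is only replaced on improvement, the noise can help particles escape stagnation without ever worsening the best value recorded in `history`.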
Rosetta:MSF: a modular framework for multi-state computational protein design.
Löffler, Patrick; Schmitz, Samuel; Hupfeld, Enrico; Sterner, Reinhard; Merkl, Rainer
2017-06-01
Computational protein design (CPD) is a powerful technique to engineer existing proteins or to design novel ones that display desired properties. Rosetta is a software suite including algorithms for computational modeling and analysis of protein structures and offers many elaborate protocols created to solve highly specific tasks of protein engineering. Most of Rosetta's protocols optimize sequences based on a single conformation (i. e. design state). However, challenging CPD objectives like multi-specificity design or the concurrent consideration of positive and negative design goals demand the simultaneous assessment of multiple states. This is why we have developed the multi-state framework MSF that facilitates the implementation of Rosetta's single-state protocols in a multi-state environment and made available two frequently used protocols. Utilizing MSF, we demonstrated for one of these protocols that multi-state design yields a 15% higher performance than single-state design on a ligand-binding benchmark consisting of structural conformations. With this protocol, we designed de novo nine retro-aldolases on a conformational ensemble deduced from a (βα)8-barrel protein. All variants displayed measurable catalytic activity, testifying to a high success rate for this concept of multi-state enzyme design.
Rosetta:MSF: a modular framework for multi-state computational protein design
Hupfeld, Enrico; Sterner, Reinhard
2017-01-01
Computational protein design (CPD) is a powerful technique to engineer existing proteins or to design novel ones that display desired properties. Rosetta is a software suite including algorithms for computational modeling and analysis of protein structures and offers many elaborate protocols created to solve highly specific tasks of protein engineering. Most of Rosetta’s protocols optimize sequences based on a single conformation (i. e. design state). However, challenging CPD objectives like multi-specificity design or the concurrent consideration of positive and negative design goals demand the simultaneous assessment of multiple states. This is why we have developed the multi-state framework MSF that facilitates the implementation of Rosetta’s single-state protocols in a multi-state environment and made available two frequently used protocols. Utilizing MSF, we demonstrated for one of these protocols that multi-state design yields a 15% higher performance than single-state design on a ligand-binding benchmark consisting of structural conformations. With this protocol, we designed de novo nine retro-aldolases on a conformational ensemble deduced from a (βα)8-barrel protein. All variants displayed measurable catalytic activity, testifying to a high success rate for this concept of multi-state enzyme design. PMID:28604768
An efficient algorithm for pairwise local alignment of protein interaction networks
Chen, Wenbin; Schmidt, Matthew; Tian, Wenhong; ...
2015-04-01
Recently, researchers seeking to understand, modify, and create beneficial traits in organisms have looked for evolutionarily conserved patterns of protein interactions. Their conservation likely means that the proteins of these conserved functional modules are important to the trait's expression. In this paper, we formulate the problem of identifying these conserved patterns as a graph optimization problem, and develop a fast heuristic algorithm for this problem. We compare the performance of our network alignment algorithm to that of the MaWISh algorithm [Koyuturk M, Kim Y, Topkara U, Subramaniam S, Szpankowski W, Grama A, Pairwise alignment of protein interaction networks, J Comput Biol 13(2): 182-199, 2006.], which bases its search algorithm on a related decision problem formulation. We find that our algorithm discovers conserved modules with a larger number of proteins in an order of magnitude less time. In conclusion, the protein sets found by our algorithm correspond to known conserved functional modules at comparable precision and recall rates as those produced by the MaWISh algorithm.
Protein Engineering Approaches in the Post-Genomic Era.
Singh, Raushan K; Lee, Jung-Kul; Selvaraj, Chandrabose; Singh, Ranjitha; Li, Jinglin; Kim, Sang-Yong; Kalia, Vipin C
2018-01-01
Proteins are one of the most multifaceted macromolecules in living systems. Proteins have evolved to function under physiological conditions and, therefore, are not usually tolerant of harsh experimental and environmental conditions. The growing use of proteins in industrial processes as a greener alternative to chemical catalysts often demands constant innovation to improve their performance. Protein engineering aims to design new proteins or modify the sequence of a protein to create proteins with new or desirable functions. With the emergence of structural and functional genomics, protein engineering has been invigorated in the post-genomic era. The three-dimensional structures of proteins with known functions facilitate protein engineering approaches to design variants with desired properties. There are three major approaches of protein engineering research, namely, directed evolution, rational design, and de novo design. Rational design is an effective method of protein engineering when the three-dimensional structure and mechanism of the protein are well known. In contrast, directed evolution does not require extensive information and a three-dimensional structure of the protein of interest. Instead, it involves random mutagenesis and selection to screen enzymes with desired properties. De novo design uses computational protein design algorithms to tailor synthetic proteins by using the three-dimensional structures of natural proteins and their folding rules. The present review highlights and summarizes recent protein engineering approaches, and their challenges and limitations in the post-genomic era. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
MimoSA: a system for minimotif annotation
2010-01-01
Background Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. Results We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, MimoSA provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. 
MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to dynamically rank papers with respect to context. PMID:20565705
NASA Astrophysics Data System (ADS)
Diaz, K. S.; Kim, E. H.; Jones, R. M.; de Leon, K. C.; Woodcroft, B. J.; Tyson, G. W.; Rich, V. I.
2014-12-01
The growing field of metaproteomics links microbial communities to their expressed functions by using mass spectrometry methods to characterize community proteins. Comparison of mass spectrometry protein search algorithms and their biases is crucial for maximizing the quality and amount of protein identifications in mass spectral data. Available algorithms employ different approaches when mapping mass spectra to peptides against a database. We compared mass spectra from four microbial proteomes derived from high-organic content soils searched with two search algorithms: 1) Sequest HT as packaged within Proteome Discoverer (v.1.4) and 2) X!Tandem as packaged in TransProteomicPipeline (v.4.7.1). Searches used matched metagenomes, and results were filtered to allow identification of high probability proteins. There was little overlap in proteins identified by both algorithms, on average just ~24% of the total. However, when adjusted for spectral abundance, the overlap improved to ~70%. Proteome Discoverer generally outperformed X!Tandem, identifying an average of 12.5% more proteins than X!Tandem, with X!Tandem identifying more proteins only in the first two proteomes. For spectrally-adjusted results, the algorithms were similar, with X!Tandem marginally outperforming Proteome Discoverer by an average of ~4%. We then assessed differences in heat shock protein (HSP) identification by the two algorithms by BLASTing identified proteins against the Heat Shock Protein Information Resource, because HSP hits typically account for the majority signal in proteomes, due to extraction protocols. Total HSP identifications for each of the 4 proteomes were approximately 15%, 11%, 17%, and 19%, with ~14% for total HSPs with redundancies removed. Of the ~15% average of proteins from the 4 proteomes identified as HSPs, ~10% of proteins and spectra were identified by both algorithms. On average, Proteome Discoverer identified ~9% more HSPs than X!Tandem.
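The two overlap figures quoted in this abstract (by protein count, and adjusted for spectral abundance) can be illustrated with a short Python sketch. The abstract does not state the exact formulas, so both definitions here are assumptions: overlap by count is taken as shared proteins over the union, and the abundance-adjusted overlap as the share of all spectral counts that belong to proteins found by both engines. The protein IDs and counts are invented placeholders, not data from the study.

```python
def overlap_fraction(ids_a, ids_b):
    """Shared identifications divided by the union of identifications."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def spectral_weighted_overlap(counts_a, counts_b):
    """Share of all spectral counts that fall on proteins seen by both."""
    shared = counts_a.keys() & counts_b.keys()
    total = sum(counts_a.values()) + sum(counts_b.values())
    shared_counts = sum(counts_a[p] + counts_b[p] for p in shared)
    return shared_counts / total if total else 0.0

# Hypothetical protein -> spectral count maps for two search engines.
engine_a = {"P1": 40, "P2": 5, "P3": 1}
engine_b = {"P1": 38, "P4": 2}

print(overlap_fraction(engine_a, engine_b))           # 1 shared of 4 total
print(spectral_weighted_overlap(engine_a, engine_b))  # 78 of 86 spectra
```

The toy numbers show how a low overlap by protein count can coexist with a high abundance-adjusted overlap when the proteins unique to each engine are supported by few spectra.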
Identifying protein complexes based on brainstorming strategy.
Shen, Xianjun; Zhou, Jin; Yi, Li; Hu, Xiaohua; He, Tingting; Yang, Jincai
2016-11-01
Protein complexes comprising interacting proteins in a protein-protein interaction network (PPI network) play a central role in driving biological processes within cells. Recently, a growing number of swarm-intelligence-based algorithms for detecting protein complexes have emerged and have become a research hotspot in proteomics. In this paper, we propose a novel algorithm for identifying protein complexes based on a brainstorming strategy (IPC-BSS), which integrates the main idea of swarm intelligence optimization with an improved K-means algorithm. The distance between nodes in the PPI network is defined by combining the network topology and gene ontology (GO) information. Inspired by the human brainstorming process, the IPC-BSS algorithm first selects the clustering center nodes, which are then separately consolidated with other nodes at short distance to form initial clusters. Finally, we put forward two ways of updating the initial clusters to search for optimal results. Experimental results show that our IPC-BSS algorithm outperforms other classic algorithms on yeast and human PPI networks, and it obtains many predicted protein complexes with biological significance. Copyright © 2016 Elsevier Inc. All rights reserved.
2012-01-01
Background The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches. Results Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures and starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional-specialization. At the same time this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related—a situation in which standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates a nearly linear time complexity so that, even for the extremely large Rossmann fold protein class, results were obtained in about a day. Conclusions This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time consuming aspects of conserved domain database curation. 
At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups. PMID:22726767
A General, Adaptive, Roadmap-Based Algorithm for Protein Motion Computation.
Molloy, Kevin; Shehu, Amarda
2016-03-01
Precious information on protein function can be extracted from a detailed characterization of protein equilibrium dynamics. This remains elusive in wet and dry laboratories, as function-modulating transitions of a protein between functionally-relevant, thermodynamically-stable and meta-stable structural states often span disparate time scales. In this paper we propose a novel, robotics-inspired algorithm that circumvents time-scale challenges by drawing analogies between protein motion and robot motion. The algorithm adapts the popular roadmap-based framework in robot motion computation to handle the more complex protein conformation space and its underlying rugged energy surface. Given known structures representing stable and meta-stable states of a protein, the algorithm yields a time- and energy-prioritized list of transition paths between the structures, with each path represented as a series of conformations. The algorithm balances computational resources between a global search aimed at obtaining a global view of the network of protein conformations and their connectivity and a detailed local search focused on realizing such connections with physically-realistic models. Promising results are presented on a variety of proteins that demonstrate the general utility of the algorithm and its capability to improve the state of the art without employing system-specific insight.
Projections for fast protein structure retrieval
Bhattacharya, Sourangshu; Bhattacharyya, Chiranjib; Chandra, Nagasuma R
2006-01-01
Background In recent times, there has been an exponential rise in the number of protein structures in databases e.g. PDB. So, design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated from spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results Experimental results show that, the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins which is not possible with most of the state of the art tools like Dali. PMID:17254310
SHARPEN-systematic hierarchical algorithms for rotamers and proteins on an extended network.
Loksha, Ilya V; Maiolo, James R; Hong, Cheng W; Ng, Albert; Snow, Christopher D
2009-04-30
Algorithms for discrete optimization of proteins play a central role in recent advances in protein structure prediction and design. We wish to improve the resources available for computational biologists to rapidly prototype such algorithms and to easily scale these algorithms to many processors. To that end, we describe the implementation and use of two new open source resources, citing potential benefits over existing software. We discuss CHOMP, a new object-oriented library for macromolecular optimization, and SHARPEN, a framework for scaling CHOMP scripts to many computers. These tools allow users to develop new algorithms for a variety of applications including protein repacking, protein-protein docking, loop rebuilding, or homology model remediation. Particular care was taken to allow modular energy function design; protein conformations may currently be scored using either the OPLSaa molecular mechanical energy function or an all-atom semiempirical energy function employed by Rosetta. (c) 2009 Wiley Periodicals, Inc.
Accelerated Profile HMM Searches
Eddy, Sean R.
2011-01-01
Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches. PMID:22039361
Detection of protein complex from protein-protein interaction network using Markov clustering
NASA Astrophysics Data System (ADS)
Ochieng, P. J.; Kusuma, W. A.; Haryanto, T.
2017-05-01
Detection of complexes, or groups of functionally related proteins, is an important challenge in analysing biological networks. However, existing algorithms to identify protein complexes are insufficient when applied to dense networks of experimentally derived interaction data. Therefore, we introduced a graph clustering method based on the Markov clustering (MCL) algorithm to identify protein complexes within highly interconnected protein-protein interaction networks. A protein-protein interaction network was first constructed to develop a geometrical network; the network was then partitioned using Markov clustering to detect protein complexes. The value of the proposed method was illustrated by its application to human proteins associated with type II diabetes mellitus. Flow simulation of the MCL algorithm was initially performed, and topological properties of the resultant network were analysed for detection of the protein complexes. The results indicated that the proposed method successfully detected a total of 34 complexes, with 11 complexes consisting of overlapping modules and 20 non-overlapping modules. The major complex consisted of 102 proteins and 521 interactions, with cluster modularity and density of 0.745 and 0.101, respectively. The comparison analysis revealed that MCL outperformed the AP, MCODE and SCPS algorithms, with high clustering coefficient (0.751), network density and modularity index (0.630). This demonstrated that MCL was the most reliable and efficient graph clustering algorithm for detection of protein complexes from PPI networks.
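The flow simulation behind Markov clustering alternates expansion (squaring a column-stochastic matrix) and inflation (raising entries to a power and renormalizing). The following is a minimal pure-Python sketch of that generic scheme, not the authors' pipeline: the inflation value, the iteration count, the toy graph, and the simplification of assigning each node to a single attractor (which rules out overlapping modules) are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Rescale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50):
    """Cluster an undirected graph given as a 0/1 adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then turn the matrix into column-wise transition probabilities.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: one matrix squaring spreads flow along longer walks.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: element-wise power strengthens strong flows, weakens weak ones.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Simplified read-out: each node joins the attractor row holding most of its flow.
    clusters = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, []).append(j)
    return sorted(clusters.values())


# Toy network: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(adjacency)
print(clusters)
```

Higher inflation values generally give finer clusters; on a real PPI network a sparse-matrix implementation would be needed, since dense squaring scales poorly with network size.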
Benchmarking protein classification algorithms via supervised cross-validation.
Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor
2008-04-24
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 
A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
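The core idea of supervised cross-validation, holding out an entire known subtype so that the test proteins are only distantly related to the training ones, can be sketched as follows. This is a simplified illustration and not the benchmark's actual construction: the record layout, the labels, and the choice to keep all remaining proteins in the training set are assumptions.

```python
from collections import defaultdict


def supervised_splits(records):
    """Yield (class, held_out_subtype, train_ids, test_ids) tuples.

    records: list of (protein_id, class_label, subtype_label).
    Each split tests on one whole subtype of a class and trains on the rest.
    """
    subtypes = defaultdict(set)
    for _, cls, sub in records:
        subtypes[cls].add(sub)
    for cls in sorted(subtypes):
        if len(subtypes[cls]) < 2:
            continue  # a class needs another subtype left over for training
        for held_out in sorted(subtypes[cls]):
            test = [pid for pid, c, s in records if c == cls and s == held_out]
            train = [pid for pid, c, s in records
                     if not (c == cls and s == held_out)]
            yield cls, held_out, train, test


# Hypothetical two-level hierarchy: class -> subtype (e.g. superfamily -> family).
records = [
    ("p1", "kinase", "famA"),
    ("p2", "kinase", "famA"),
    ("p3", "kinase", "famB"),
    ("p4", "protease", "famC"),
    ("p5", "protease", "famD"),
    ("p6", "globin", "famE"),
]
splits = list(supervised_splits(records))
for cls, held_out, train, test in splits:
    print(cls, held_out, "train:", train, "test:", test)
```

Unlike random k-fold splitting, no member of the held-out subtype is ever seen during training, which is why such schemes tend to give lower performance estimates.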
3D Protein structure prediction with genetic tabu search algorithm
2010-01-01
Background Protein structure prediction (PSP) has important applications in different fields, such as drug design, disease prediction, and so on. In protein structure prediction, there are two important issues. The first one is the design of the structure model and the second one is the design of the optimization technology. Because of the complexity of the realistic protein structure, the structure model adopted in this paper is a simplified model, which is called off-lattice AB model. After the structure model is assumed, optimization technology is needed for searching the best conformation of a protein sequence based on the assumed structure model. However, PSP is an NP-hard problem even if the simplest model is assumed. Thus, many algorithms have been developed to solve the global optimization problem. In this paper, a hybrid algorithm, which combines genetic algorithm (GA) and tabu search (TS) algorithm, is developed to complete this task. Results In order to develop an efficient optimization algorithm, several improved strategies are developed for the proposed genetic tabu search algorithm. The combined use of these strategies can improve the efficiency of the algorithm. In these strategies, tabu search introduced into the crossover and mutation operators can improve the local search capability, the adoption of variable population size strategy can maintain the diversity of the population, and the ranking selection strategy can improve the possibility of an individual with low energy value entering into next generation. Experiments are performed with Fibonacci sequences and real protein sequences. Experimental results show that the lowest energy obtained by the proposed GATS algorithm is lower than that obtained by previous methods. Conclusions The hybrid algorithm has the advantages from both genetic algorithm and tabu search algorithm. 
It makes use of the advantage of multiple search points in genetic algorithm, and can overcome poor hill-climbing capability in the conventional genetic algorithm by using the flexible memory functions of TS. Compared with some previous algorithms, GATS algorithm has better performance in global optimization and can predict 3D protein structure more effectively. PMID:20522256
Hume, Maxwell A; Barrera, Luis A; Gisselbrecht, Stephen S; Bulyk, Martha L
2015-01-01
The Universal PBM Resource for Oligonucleotide Binding Evaluation (UniPROBE) serves as a convenient source of information on published data generated using universal protein-binding microarray (PBM) technology, which provides in vitro data about the relative DNA-binding preferences of transcription factors for all possible sequence variants of a length k ('k-mers'). The database displays important information about the proteins and displays their DNA-binding specificity data in terms of k-mers, position weight matrices and graphical sequence logos. This update to the database documents the growth of UniPROBE since the last update 4 years ago, and introduces a variety of new features and tools, including a new streamlined pipeline that facilitates data deposition by universal PBM data generators in the research community, a tool that generates putative nonbinding (i.e. negative control) DNA sequences for one or more proteins and novel motifs obtained by analyzing the PBM data using the BEEML-PBM algorithm for motif inference. The UniPROBE database is available at http://uniprobe.org. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
2014-01-01
Protein biomarkers offer major benefits for diagnosis and monitoring of disease processes. Recent advances in protein mass spectrometry make it feasible to use this very sensitive technology to detect and quantify proteins in blood. To explore the potential of blood biomarkers, we conducted a thorough review to evaluate the reliability of data in the literature and to determine the spectrum of proteins reported to exist in blood with a goal of creating a Federated Database of Blood Proteins (FDBP). A unique feature of our approach is the use of a SQL database for all of the peptide data; the power of the SQL database combined with standard informatic algorithms such as BLAST and the statistical analysis system (SAS) allowed the rapid annotation and analysis of the database without the need to create special programs to manage the data. Our mathematical analysis and review shows that in addition to the usual secreted proteins found in blood, there are many reports of intracellular proteins and good agreement on transcription factors, DNA remodelling factors in addition to cellular receptors and their signal transduction enzymes. Overall, we have catalogued about 12,130 proteins identified by at least one unique peptide, and of these 3858 have 3 or more peptide correlations. The FDBP with annotations should facilitate testing blood for specific disease biomarkers. PMID:24476026
BayesMotif: de novo protein sorting motif discovery from impure datasets.
Hu, Jianjun; Zhang, Fan
2010-01-18
Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with less conserved motif regions. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also show that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. 
Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of PWM (position weight matrix) motif model.
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with an innovative obstacle distance measure for spatial clustering under obstacle constraints. First, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as the similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of the AICOE algorithm and the classical clustering algorithms. Our clustering model based on an artificial immune system is also applied to the public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and a better clustering effect.
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.
Ni, Qianwu; Chen, Lei
2017-01-01
Correct prediction of protein structural class is beneficial to the investigation of protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, given the variety of available features, it is still a great challenge to select a proper classification algorithm and to extract the essential features for classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physicochemical features were adopted to represent proteins, and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features in the list one by one. For each feature set, thirty-eight algorithms were executed on a dataset in which proteins were represented by the features in the set. The predicted classes yielded by these algorithms and the true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. From the algorithm list, algorithms were taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using a single algorithm and to other models that adopt only the feature selection procedure or only the algorithm selection procedure. Both the feature selection procedure and the algorithm selection procedure are helpful for building an ensemble prediction model that can yield better performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Facilitated diffusion in chromatin lattices: mechanistic diversity and regulatory potential.
Kampmann, Martin
2005-08-01
The interaction between a protein and a specific DNA site is the molecular basis for vital processes in all organisms. Location of the DNA target site by the protein commonly involves facilitated diffusion. Mechanisms of facilitated diffusion vary among proteins; they include one- and two-dimensional sliding along DNA, direct transfer between uncorrelated sites, as well as combinations of these mechanisms. Facilitated diffusion has almost exclusively been studied in vitro. This review discusses facilitated diffusion in the context of the living cell and proposes a theoretical model for facilitated diffusion in chromatin lattices. Chromatin structure differentially affects proteins in different modes of diffusion. The interplay of facilitated diffusion and chromatin structure can determine the rate of protein association with the target site, the frequency of association-dissociation events at the target site, and, under particular conditions, the occupancy of the target site. Facilitated diffusion is required in vivo for efficient DNA repair and bacteriophage restriction and has potential roles in fine-tuning gene regulatory networks and kinetically compartmentalizing the eukaryotic nucleus.
Le, Duc-Hau
2015-01-01
Protein complexes formed by non-covalent interaction among proteins play important roles in cellular functions. Computational and purification methods have been used to identify many protein complexes and their cellular functions. However, their roles in terms of causing disease have not been well discovered yet. There exist only a few studies for the identification of disease-associated protein complexes. However, they mostly utilize complicated heterogeneous networks which are constructed based on an out-of-date database of phenotype similarity network collected from literature. In addition, they only apply for diseases for which tissue-specific data exist. In this study, we propose a method to identify novel disease-protein complex associations. First, we introduce a framework to construct functional similarity protein complex networks where two protein complexes are functionally connected by either shared protein elements, shared annotating GO terms or based on protein interactions between elements in each protein complex. Second, we propose a simple but effective neighborhood-based algorithm, which yields a local similarity measure, to rank disease candidate protein complexes. Comparing the predictive performance of our proposed algorithm with that of two state-of-the-art network propagation algorithms including one we used in our previous study, we found that it performed statistically significantly better than that of these two algorithms for all the constructed functional similarity protein complex networks. In addition, it ran about 32 times faster than these two algorithms. Moreover, our proposed method always achieved high performance in terms of AUC values irrespective of the ways to construct the functional similarity protein complex networks and the used algorithms. The performance of our method was also higher than that reported in some existing methods which were based on complicated heterogeneous networks. 
Finally, we also tested our method with prostate cancer and selected the top 100 highly ranked candidate protein complexes. Interestingly, 69 of them were evidenced since at least one of their protein elements are known to be associated with prostate cancer. Our proposed method, including the framework to construct functional similarity protein complex networks and the neighborhood-based algorithm on these networks, could be used for identification of novel disease-protein complex associations.
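A neighborhood-based ranking of the kind described above can be sketched in a few lines. The abstract does not state the exact local similarity measure, so the score used here (the fraction of a candidate's total edge weight that connects it to complexes already known to be disease-associated) is an assumption, as are the network and the complex names.

```python
def rank_candidates(network, known_disease_complexes):
    """Rank candidate complexes by their weighted links to known disease complexes.

    network: dict mapping complex -> {neighbor complex: similarity weight}.
    Returns (complex, score) pairs, best first; ties are broken by name.
    """
    known = set(known_disease_complexes)
    scores = {}
    for candidate, neighbors in network.items():
        if candidate in known:
            continue  # only rank complexes without a known association
        total = sum(neighbors.values())
        hit = sum(w for nb, w in neighbors.items() if nb in known)
        scores[candidate] = hit / total if total else 0.0
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical functional similarity network between four protein complexes.
network = {
    "C1": {"C2": 1.0, "C3": 0.5},
    "C2": {"C1": 1.0, "C4": 1.0},
    "C3": {"C1": 0.5, "C4": 0.5},
    "C4": {"C2": 1.0, "C3": 0.5},
}
ranking = rank_candidates(network, ["C1"])
print(ranking)
```

Because each score depends only on a node's direct neighbors, a single pass over the edges suffices, in contrast to the iterative updates of network propagation methods.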
A general algorithm using finite element method for aerodynamic configurations at low speeds
NASA Technical Reports Server (NTRS)
Balasubramanian, R.
1975-01-01
A finite element algorithm for numerical simulation of two-dimensional, incompressible, viscous flows was developed. The Navier-Stokes equations are suitably modelled to facilitate direct solution for the essential flow parameters. A leap-frog time differencing and Galerkin minimization of these model equations yields the finite element algorithm. The finite elements are triangular with bicubic shape functions approximating the solution space. The finite element matrices are unsymmetrically banded to facilitate savings in storage. An unsymmetric L-U decomposition is performed on the finite element matrices to obtain the solution for the boundary value problem.
Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset
2017-01-06
In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow using four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependent on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended. Protein inference is one of the major challenges in MS-based proteomics nowadays. Currently, there are a vast number of protein inference algorithms and implementations available for the proteomics community. Protein assembly impacts the final results of the research, the quantitation values and the final claims in the research manuscript. 
Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. Previously, the Journal of Proteomics has published multiple benchmark studies of bioinformatics algorithms in proteomics (PMID: 26585461; PMID: 22728601), making clear the importance of such studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Five different algorithms (ProteinProphet, MSBayesPro, ProteinLP, Fido and PIA) were evaluated using the highly customizable workflow on four public datasets with varying complexities. Three popular database search engines (Mascot, X!Tandem and MS-GF+) and combinations thereof were evaluated for every protein inference tool. In total, >186 protein lists were analyzed and carefully compared using three metrics for quality assessment of the protein inference results: 1) the number of reported proteins, 2) peptides per protein, and 3) the number of uniquely reported proteins per inference method. We also examined how many proteins were reported by choosing each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that 1) using PIA or Fido seems to be a good choice when studying the results of the analyzed workflow, regarding not only the reported proteins and the high-quality identifications, but also the required runtime. 2) Merging the identifications of multiple search engines almost always gives more confident results and increases the number of peptides per protein group. 3) The usage of databases containing not only the canonical, but also known isoforms of proteins has a small impact on the number of reported proteins. 
Depending on the question behind the study, the detection of specific isoforms could compensate for the slightly shorter parsimonious reports. 4) The current workflow can be easily extended to support new algorithms and search engine combinations. Copyright © 2016. Published by Elsevier B.V.
Huang, Chien-Hung; Peng, Huai-Shun; Ng, Ka-Lok
2015-01-01
Many proteins are known to be associated with cancer diseases. It is quite often that their precise functional role in disease pathogenesis remains unclear. A strategy to gain a better understanding of the function of these proteins is to make use of a combination of different aspects of proteomics data types. In this study, we extended Aragues's method by employing the protein-protein interaction (PPI) data, domain-domain interaction (DDI) data, weighted domain frequency score (DFS), and cancer linker degree (CLD) data to predict cancer proteins. Performances were benchmarked based on three kinds of experiments as follows: (I) using individual algorithm, (II) combining algorithms, and (III) combining the same classification types of algorithms. When compared with Aragues's method, our proposed methods, that is, machine learning algorithm and voting with the majority, are significantly superior in all seven performance measures. We demonstrated the accuracy of the proposed method on two independent datasets. The best algorithm can achieve a hit ratio of 89.4% and 72.8% for lung cancer dataset and lung cancer microarray study, respectively. It is anticipated that the current research could help understand disease mechanisms and diagnosis. PMID:25866773
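The "voting with the majority" combination named in this abstract can be illustrated with a minimal sketch. The three classifiers, their outputs, and the true labels below are made-up toy values, and both function names are hypothetical. The hit ratio is assumed here to mean the fraction of known cancer proteins that the combined predictor labels as cancer proteins; the abstract itself does not define it.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most classifiers for one protein.

    With an odd number of binary classifiers no tie can occur.
    """
    return Counter(predictions).most_common(1)[0][0]


def hit_ratio(true_labels, predicted_labels):
    """Fraction of true positives (label 1) that were predicted as 1.

    This definition is an assumption made for the example.
    """
    hits = [p for t, p in zip(true_labels, predicted_labels) if t == 1]
    if not hits:
        raise ValueError("no positive examples in true_labels")
    return sum(hits) / len(hits)


# Rows are proteins, columns are three hypothetical classifiers
# (1 = predicted cancer protein, 0 = predicted non-cancer protein).
votes = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
]
truth = [1, 1, 1, 0, 1]  # toy ground truth

combined = [majority_vote(row) for row in votes]
print(combined)                    # combined label per protein
print(hit_ratio(truth, combined))  # hit ratio on the toy data
```

An odd number of voters is used so that the tie-breaking behaviour never matters for binary labels.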
An ensemble framework for clustering protein-protein interaction networks.
Asur, Sitaram; Ucar, Duygu; Parthasarathy, Srinivasan
2007-07-01
Protein-Protein Interaction (PPI) networks are believed to be important sources of information related to biological processes and complex metabolic functions of the cell. The presence of biologically relevant functional modules in these networks has been theorized by many researchers. However, the application of traditional clustering algorithms for extracting these modules has not been successful, largely due to the presence of noisy false positive interactions as well as specific topological challenges in the network. In this article, we propose an ensemble clustering framework to address this problem. For base clustering, we introduce two topology-based distance metrics to counteract the effects of noise. We develop a PCA-based consensus clustering technique, designed to reduce the dimensionality of the consensus problem and yield informative clusters. We also develop a soft consensus clustering variant to assign multifaceted proteins to multiple functional groups. We conduct an empirical evaluation of different consensus techniques using topology-based, information theoretic and domain-specific validation metrics and show that our approaches can provide significant benefits over other state-of-the-art approaches. Our analysis of the consensus clusters obtained demonstrates that ensemble clustering can (a) produce improved biologically significant functional groupings; and (b) facilitate soft clustering by discovering multiple functional associations for proteins. Supplementary data are available at Bioinformatics online.
Chaudhary, Nitika; Sandhu, Padmani; Ahmed, Mushtaq; Akhter, Yusuf
2017-02-01
Trichothecenes are the sesquiterpenes secreted by Trichoderma spp. residing in the rhizosphere. These compounds have been reported to act as plant growth promoters and bio-control agents. The structural knowledge for the transporter proteins of their efflux remained limited. In this study, three-dimensional structure of Thmfs1 protein, a trichothecene transporter from Trichoderma harzianum, was homology modelled and further Molecular Dynamics (MD) simulations were used to decipher its mechanism. Fourteen transmembrane helices of Thmfs1 protein are observed contributing to an inward-open conformation. The transport channel and ligand binding sites in Thmfs1 are identified based on heuristic, iterative algorithm and structural alignment with homologous proteins. MD simulations were performed to reveal the differential structural behaviour occurring in the ligand free and ligand bound forms. We found that two discrete trichothecene binding sites are located on either side of the central transport tunnel running from the cytoplasmic side to the extracellular side across the Thmfs1 protein. Detailed analysis of the MD trajectories showed an alternative access mechanism between N and C-terminal domains contributing to its function. These results also demonstrate that the transport of trichodermin occurs via hopping mechanism in which the substrate molecule jumps from one binding site to another lining the transport tunnel. Copyright © 2016 Elsevier B.V. All rights reserved.
Mian, Shahid; Ball, Graham; Hornbuckle, Jo; Holding, Finn; Carmichael, James; Ellis, Ian; Ali, Selman; Li, Geng; McArdle, Stephanie; Creaser, Colin; Rees, Robert
2003-09-01
An ability to predict the likelihood of cellular response towards particular chemotherapeutic agents based upon protein expression patterns could facilitate the identification of biological molecules with previously undefined roles in the process of chemoresistance/chemosensitivity, and if robust enough these patterns might also be exploited towards the development of novel predictive assays. To ascertain whether proteomic-based molecular profiling in conjunction with artificial neural network (ANN) algorithms could be applied towards the specific recognition of phenotypic patterns between either control or drug treated and chemosensitive or chemoresistant cellular populations, a combined approach involving matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) mass spectrometry, Ciphergen protein chip technology and ANN algorithms has been applied to specifically identify proteomic 'fingerprints' indicative of treatment regimen for chemosensitive (MCF-7, T47D) and chemoresistant (MCF-7/ADR) breast cancer cell lines following exposure to Doxorubicin or Paclitaxel. The results indicate that proteomic patterns can be identified by ANN algorithms to correctly assign 'class' for treatment regimen (e.g. control/drug treated or chemosensitive/chemoresistant) with a high degree of accuracy using boot-strap statistical validation techniques and that biomarker ion patterns indicative of response/non-response phenotypes are associated with MCF-7 and MCF-7/ADR cells exposed to Doxorubicin. We have also examined the predictive capability of this approach towards MCF-7 and T47D cells to ascertain whether prediction could be made based upon treatment regimen irrespective of cell lineage. Models were identified that could correctly assign class (control or Paclitaxel treatment) for 35/38 samples of an independent dataset.
A similar level of predictive capability was also found (> 92%; n = 28) when proteomic patterns derived from the drug resistant cell line MCF-7/ADR were compared against those derived from MCF-7 and T47D as a model system of drug resistant and drug sensitive phenotypes. This approach might offer a potential methodology for predicting the biological behaviour of cancer cells towards particular chemotherapeutics and through protein isolation and sequence identification could result in the identification of biological molecules associated with chemosensitive/chemoresistant tumour phenotypes.
A search for H/ACA snoRNAs in yeast using MFE secondary structure prediction.
Edvardsson, Sverker; Gardner, Paul P; Poole, Anthony M; Hendy, Michael D; Penny, David; Moulton, Vincent
2003-05-01
Noncoding RNA genes produce functional RNA molecules rather than coding for proteins. One such family is the H/ACA snoRNAs. Unlike the related C/D snoRNAs these have resisted automated detection to date. We develop an algorithm to screen the yeast genome for novel H/ACA snoRNAs. To achieve this, we introduce some new methods for facilitating the search for noncoding RNAs in genomic sequences which are based on properties of predicted minimum free-energy (MFE) secondary structures. The algorithm has been implemented and can be generalized to enable screening of other eukaryote genomes. We find that use of primary sequence alone is insufficient for identifying novel H/ACA snoRNAs. Only the use of secondary structure filters reduces the number of candidates to a manageable size. From genomic context, we identify three strong H/ACA snoRNA candidates. These together with a further 47 candidates obtained by our analysis are being experimentally screened.
Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains.
Jha, Ashwani; Flurchick, K M; Bikdash, Marwan; Kc, Dukka B
2016-01-01
Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10–15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors. PMID:27747230
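The reported figure (a speedup of 63 on 100 processors for a 509-residue protein) can be put in context with a short worked calculation. The sketch below computes the parallel efficiency and, under the assumption that Amdahl's law describes the run, the serial fraction implied by that speedup. The function names are hypothetical, and the Amdahl model is an assumption of this example rather than something stated in the abstract.

```python
def parallel_efficiency(speedup, processors):
    """Efficiency = speedup / number of processors."""
    return speedup / processors


def implied_serial_fraction(speedup, processors):
    """Serial fraction f implied by Amdahl's law S = 1 / (f + (1 - f) / p).

    Solving for f gives f = (p / S - 1) / (p - 1).
    """
    if processors < 2:
        raise ValueError("need at least two processors")
    return (processors / speedup - 1.0) / (processors - 1.0)


speedup, processors = 63, 100  # values reported in the abstract
print(parallel_efficiency(speedup, processors))
print(implied_serial_fraction(speedup, processors))
```

Under these assumptions the efficiency is 0.63 and the implied serial fraction is a little under 0.6%.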
Protein docking prediction using predicted protein-protein interface.
Li, Bin; Kihara, Daisuke
2012-01-10
Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pairwise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.
Congdon, Heather Brennan; Eldridge, Barbara Hoffman; Truong, Hoai-An
2013-11-01
Development and implementation of an interprofessional navigator-facilitated care coordination algorithm (NAVCOM) for low-income, uninsured patients with uncontrolled diabetes at a safety-net clinic resulted in improvement of disease control as evidenced by improvement in hemoglobin A1C. This report describes the process and lessons learned from the development and implementation of NAVCOM and patient success stories.
Yang, Chunxiao; Li, Hui; Pan, Huipeng; Ma, Yabin; Zhang, Deyong; Liu, Yong; Zhang, Zhanhong; Zheng, Changying; Chu, Dong
2015-01-01
Reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) is a reliable technique for measuring and evaluating gene expression during variable biological processes. To facilitate gene expression studies, normalization of genes of interest relative to stable reference genes is crucial. The western flower thrips Frankliniella occidentalis (Pergande) (Thysanoptera: Thripidae), the main vector of tomato spotted wilt virus (TSWV), is a destructive invasive species. In this study, the expression profiles of 11 candidate reference genes from nonviruliferous and viruliferous F. occidentalis were investigated. Five distinct algorithms, geNorm, NormFinder, BestKeeper, the ΔCt method, and RefFinder, were used to determine the performance of these genes. geNorm, NormFinder, BestKeeper, and RefFinder identified heat shock protein 70 (HSP70), heat shock protein 60 (HSP60), elongation factor 1 α, and ribosomal protein L32 (RPL32) as the most stable reference genes, and the ΔCt method identified HSP60, HSP70, RPL32, and heat shock protein 90 as the most stable reference genes. Additionally, two reference genes were sufficient for reliable normalization in nonviruliferous and viruliferous F. occidentalis. This work provides a foundation for investigating the molecular mechanisms of TSWV and F. occidentalis interactions.
Present and future of membrane protein structure determination by electron crystallography.
Ubarretxena-Belandia, Iban; Stokes, David L
2010-01-01
Membrane proteins are critical to cell physiology, playing roles in signaling, trafficking, transport, adhesion, and recognition. Despite their relative abundance in the proteome and their prevalence as targets of therapeutic drugs, structural information about membrane proteins is in short supply. This review describes the use of electron crystallography as a tool for determining membrane protein structures. Electron crystallography offers distinct advantages relative to the alternatives of X-ray crystallography and NMR spectroscopy. Namely, membrane proteins are placed in their native membranous environment, which is likely to favor a native conformation and allow changes in conformation in response to physiological ligands. Nevertheless, there are significant logistical challenges in finding appropriate conditions for inducing membrane proteins to form two-dimensional arrays within the membrane and in using electron cryo-microscopy to collect the data required for structure determination. A number of developments are described for high-throughput screening of crystallization trials and for automated imaging of crystals with the electron microscope. These tools are critical for exploring the necessary range of factors governing the crystallization process. There have also been recent software developments to facilitate the process of structure determination. However, further innovations in the algorithms used for processing images and electron diffraction are necessary to improve throughput and to make electron crystallography truly viable as a method for determining atomic structures of membrane proteins. PMID:21115172
Biclustering Protein Complex Interactions with a Biclique Finding Algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ding, Chris; Zhang, Anne Ya; Holbrook, Stephen
2006-12-01
Biclustering has many applications in text mining, web clickstream mining, and bioinformatics. When data entries are binary, the tightest biclusters become bicliques. We propose a flexible and highly efficient algorithm to compute bicliques. We first generalize the Motzkin-Straus formalism for computing the maximal clique from L1 constraint to Lp constraint, which enables us to provide a generalized Motzkin-Straus formalism for computing maximal-edge bicliques. By adjusting parameters, the algorithm can favor biclusters with more rows and fewer columns, or vice versa, thus increasing the flexibility of the targeted biclusters. We then propose an algorithm to solve the generalized Motzkin-Straus optimization problem. The algorithm is provably convergent and has a computational complexity of O(|E|) where |E| is the number of edges. It relies on a matrix vector multiplication and runs efficiently on most current computer architectures. Using this algorithm, we bicluster the yeast protein complex interaction network. We find that biclustering protein complexes at the protein level does not clearly reflect the functional linkage among protein complexes in many cases, while biclustering at protein domain level can reveal many underlying linkages. We show several new biologically significant results.
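For readers unfamiliar with the Motzkin-Straus formalism that this abstract generalizes, the following minimal sketch shows the classic (ungeneralized) version for ordinary cliques: maximizing x^T A x over the probability simplex, where A is the adjacency matrix, yields 1 - 1/k for a maximum clique of size k. The sketch uses replicator dynamics, a standard local optimizer for this problem that relies only on matrix-vector products. It is not the generalized biclique algorithm of the paper, it only finds a local maximum, and the toy graph and function names are assumptions made for illustration.

```python
def quadratic_form(adj, x):
    """Compute x^T A x for a dense adjacency matrix given as nested lists."""
    n = len(x)
    return sum(adj[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def replicator_clique(adj, iterations=200):
    """Locally maximize x^T A x on the simplex with replicator dynamics.

    Update rule: x_i <- x_i * (A x)_i / (x^T A x).
    Starts from the uniform distribution; returns the final vector.
    """
    n = len(adj)
    x = [1.0 / n] * n
    for _ in range(iterations):
        ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        value = sum(x[i] * ax[i] for i in range(n))
        if value == 0.0:  # graph without edges: nothing to optimize
            break
        x = [x[i] * ax[i] / value for i in range(n)]
    return x


# Toy graph: nodes 0, 1, 2 form a triangle; node 3 hangs off node 2.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

x = replicator_clique(adjacency)
value = quadratic_form(adjacency, x)
support = [i for i, weight in enumerate(x) if weight > 1e-6]
print(support)              # nodes carrying weight at convergence
print(1.0 / (1.0 - value))  # clique size implied by the objective value
```

On this toy graph the weight concentrates on the triangle, and the objective value corresponds to a clique of size three.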
Smelter, Andrey; Rouchka, Eric C; Moseley, Hunter N B
2017-08-01
Peak lists derived from nuclear magnetic resonance (NMR) spectra are commonly used as input data for a variety of computer assisted and automated analyses. These include automated protein resonance assignment and protein structure calculation software tools. Prior to these analyses, peak lists must be aligned to each other and sets of related peaks must be grouped based on common chemical shift dimensions. Even when programs can perform peak grouping, they require the user to provide uniform match tolerances or use default values. However, peak grouping is further complicated by multiple sources of variance in peak position limiting the effectiveness of grouping methods that utilize uniform match tolerances. In addition, no method currently exists for deriving peak positional variances from single peak lists for grouping peaks into spin systems, i.e. spin system grouping within a single peak list. Therefore, we developed a complementary pair of peak list registration analysis and spin system grouping algorithms designed to overcome these limitations. We have implemented these algorithms into an approach that can identify multiple dimension-specific positional variances that exist in a single peak list and group peaks from a single peak list into spin systems. The resulting software tools generate a variety of useful statistics on both a single peak list and pairwise peak list alignment, especially for quality assessment of peak list datasets. We used a range of low and high quality experimental solution NMR and solid-state NMR peak lists to assess performance of our registration analysis and grouping algorithms. Analyses show that an algorithm using a single iteration and uniform match tolerances approach is only able to recover from 50 to 80% of the spin systems due to the presence of multiple sources of variance. Our algorithm recovers additional spin systems by reevaluating match tolerances in multiple iterations. 
To facilitate evaluation of the algorithms, we developed a peak list simulator within our nmrstarlib package that generates user-defined assigned peak lists from a given BMRB entry or database of entries. In addition, over 100,000 simulated peak lists with one or two sources of variance were generated to evaluate the performance and robustness of these new registration analysis and peak grouping algorithms.
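The baseline that this abstract contrasts against, a single pass with uniform match tolerances, can be sketched as follows. Peaks are grouped into spin systems when their shared dimensions agree within a fixed tolerance per dimension. This is not the authors' iterative algorithm or the nmrstarlib API; the dimension names, tolerance values, peak values, and function name are all hypothetical.

```python
def group_peaks(peaks, tolerances):
    """Greedy single-pass grouping of peaks into spin systems.

    peaks:      list of dicts mapping dimension name -> chemical shift (ppm)
    tolerances: dict mapping each shared dimension -> match tolerance (ppm)

    A peak joins the first group whose first (reference) peak matches it
    within tolerance in every shared dimension; otherwise it starts a group.
    """
    groups = []
    for peak in peaks:
        for group in groups:
            reference = group[0]
            if all(abs(peak[dim] - reference[dim]) <= tol
                   for dim, tol in tolerances.items()):
                group.append(peak)
                break
        else:
            groups.append([peak])
    return groups


# Toy 3D-style peaks: H and N are the shared (root) dimensions.
toy_peaks = [
    {"H": 8.21, "N": 120.30, "C": 55.1},
    {"H": 8.22, "N": 120.35, "C": 30.2},
    {"H": 7.95, "N": 118.10, "C": 58.4},
    {"H": 7.96, "N": 118.25, "C": 33.0},
    {"H": 8.21, "N": 118.12, "C": 62.0},
]
uniform_tolerances = {"H": 0.02, "N": 0.2}  # illustrative values only

spin_systems = group_peaks(toy_peaks, uniform_tolerances)
print([len(group) for group in spin_systems])
```

With fixed tolerances like these, any peak whose positional variance exceeds the chosen value is split off into its own group, which is the limitation the abstract attributes to uniform-tolerance methods.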
Chen, Zhen; Zhao, Pei; Li, Fuyi; Leier, André; Marquez-Lago, Tatiana T; Wang, Yanan; Webb, Geoffrey I; Smith, A Ian; Daly, Roger J; Chou, Kuo-Chen; Song, Jiangning
2018-03-08
Structural and physiochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating training, analysis, and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. Availability: http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. Contact: jiangning.song@monash.edu; kcchou@gordonlifescience.org; roger.daly@monash.edu. Supplementary data are available at Bioinformatics online.
A discrete search algorithm for finding the structure of protein backbones and side chains.
Sallaume, Silas; Martins, Simone de Lima; Ochi, Luiz Satoru; Da Silva, Warley Gramacho; Lavor, Carlile; Liberti, Leo
2013-01-01
Some information about protein structure can be obtained by using Nuclear Magnetic Resonance (NMR) techniques, but they provide only a sparse set of distances between atoms in a protein. The Molecular Distance Geometry Problem (MDGP) consists in determining the three-dimensional structure of a molecule using a set of known distances between some atoms. Recently, a Branch and Prune (BP) algorithm was proposed to calculate the backbone of a protein, based on a discrete formulation for the MDGP. We present an extension of the BP algorithm that can calculate not only the protein backbone, but the whole three-dimensional structure of proteins.
A Computational Algorithm for Functional Clustering of Proteome Dynamics During Development
Wang, Yaqun; Wang, Ningtao; Hao, Han; Guo, Yunqian; Zhen, Yan; Shi, Jisen; Wu, Rongling
2014-01-01
Phenotypic traits, such as seed development, are a consequence of complex biochemical interactions among genes, proteins and metabolites, but the underlying mechanisms that operate in a coordinated and sequential manner remain elusive. Here, we address this issue by developing a computational algorithm to monitor proteome changes during the course of trait development. The algorithm is built within the mixture-model framework in which each mixture component is modeled by a specific group of proteins that display a similar temporal pattern of expression in trait development. A nonparametric approach based on Legendre orthogonal polynomials was used to fit dynamic changes of protein expression, increasing the power and flexibility of protein clustering. By analyzing a dataset of proteomic dynamics during early embryogenesis of the Chinese fir, the algorithm has successfully identified several distinct types of proteins that coordinate with each other to determine seed development in this forest tree commercially and environmentally important to China. The algorithm will find its immediate applications for the characterization of mechanistic underpinnings for any other biological processes in which protein abundance plays a key role. PMID:24955031
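The use of Legendre orthogonal polynomials to describe a temporal expression pattern can be illustrated with a small sketch. It evaluates the Legendre basis with Bonnet's recurrence and computes a mean expression curve from a set of coefficients after rescaling time to [-1, 1]. The coefficients, time range, and function names are made up for illustration, and the mixture-model estimation described in the abstract is not shown.

```python
def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence:

    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
    """
    values = [1.0]
    if order >= 1:
        values.append(x)
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


def rescale_time(t, t_min, t_max):
    """Map a time point from [t_min, t_max] onto [-1, 1]."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0


def expression_curve(t, coefficients, t_min, t_max):
    """Mean expression at time t as a Legendre series in rescaled time."""
    x = rescale_time(t, t_min, t_max)
    basis = legendre_basis(x, len(coefficients) - 1)
    return sum(c * p for c, p in zip(coefficients, basis))


# Hypothetical coefficients for one cluster of proteins, times in days 0..10.
cluster_coefficients = [2.0, 1.0, 0.5]
for day in (0, 5, 10):
    print(day, expression_curve(day, cluster_coefficients, 0.0, 10.0))
```

A low-order series like this describes a whole time course with only a few parameters per cluster, which is one plausible reading of the flexibility the abstract mentions.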
P-Finder: Reconstruction of Signaling Networks from Protein-Protein Interactions and GO Annotations.
Cho, Young-Rae; Xin, Yanan; Speegle, Greg
2015-01-01
Because most complex genetic diseases are caused by defects of cell signaling, illuminating a signaling cascade is essential for understanding their mechanisms. We present three novel computational algorithms to reconstruct signaling networks between a starting protein and an ending protein using genome-wide protein-protein interaction (PPI) networks and gene ontology (GO) annotation data. A signaling network is represented as a directed acyclic graph in a merged form of multiple linear pathways. An advanced semantic similarity metric is applied for weighting PPIs as the preprocessing of all three methods. The first algorithm repeatedly extends the list of nodes based on path frequency towards an ending protein. The second algorithm repeatedly appends edges based on the occurrence of network motifs which indicate the link patterns more frequently appearing in a PPI network than in a random graph. The last algorithm uses the information propagation technique which iteratively updates edge orientations based on the path strength and merges the selected directed edges. Our experimental results demonstrate that the proposed algorithms achieve higher accuracy than previous methods when they are tested on well-studied pathways of S. cerevisiae. Furthermore, we introduce an interactive web application tool, called P-Finder, to visualize reconstructed signaling networks.
Rong, Y; Padron, A V; Hagerty, K J; Nelson, N; Chi, S; Keyhani, N O; Katz, J; Datta, S P A; Gomes, C; McLamore, E S
2018-04-30
Impedimetric biosensors for measuring small molecules based on weak/transient interactions between bioreceptors and target analytes are a challenge for detection electronics, particularly in field studies or in the analysis of complex matrices. Protein-ligand binding sensors have enormous potential for biosensing, but achieving accuracy in complex solutions is a major challenge. There is a need for simple post hoc analytical tools that are not computationally expensive, yet provide near real time feedback on data derived from impedance spectra. Here, we show the use of a simple, open source support vector machine learning algorithm for analyzing impedimetric data in lieu of using equivalent circuit analysis. We demonstrate two different protein-based biosensors to show that the tool can be used for various applications. We conclude with a mobile phone-based demonstration focused on the measurement of acetone, an important biomarker related to the onset of diabetic ketoacidosis. In all conditions tested, the open source classifier was capable of performing as well as, or better than, the equivalent circuit analysis for characterizing weak/transient interactions between a model ligand (acetone) and a small chemosensory protein derived from the tsetse fly. In addition, the tool has a low computational requirement, facilitating use for mobile acquisition systems such as mobile phones. The protocol is deployed through Jupyter notebook (an open source computing environment available for mobile phone, tablet or computer use) and the code was written in Python. For each of the applications, we provide step-by-step instructions in English, Spanish, Mandarin and Portuguese to facilitate widespread use. All code was based on scikit-learn, an open source software machine learning library in the Python language, and was processed in Jupyter notebook, an open-source web application for Python.
The tool can easily be integrated with the mobile biosensor equipment for rapid detection, facilitating use by a broad range of impedimetric biosensor users. This post hoc analysis tool can serve as a launchpad for the convergence of nanobiosensors in planetary health monitoring applications based on mobile phone hardware.
Integrated Bio-Entity Network: A System for Biological Knowledge Discovery
Bell, Lindsey; Chowdhary, Rajesh; Liu, Jun S.; Niu, Xufeng; Zhang, Jinfeng
2011-01-01
A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein–protein interactions, protein/gene regulations, protein–small molecule interactions, protein–GO relationships, protein–pathway relationships, and pathway–disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses—the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs. PMID:21738677
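One way to picture a "most probable path" search on such a network is as a shortest-path problem. The sketch below assumes that each edge carries an independent probability and that a path's probability is the product of its edge probabilities, so maximizing it is equivalent to minimizing the sum of negative log probabilities with Dijkstra's algorithm. The abstract does not describe how MPP is actually computed, so this is an illustrative assumption; the toy entities, probabilities, and function name are hypothetical.

```python
import heapq
import math


def most_probable_path(graph, source, target):
    """Return (path, probability) for the most probable source-target path.

    graph: dict mapping node -> dict of neighbor -> edge probability in (0, 1].
    Returns (None, 0.0) when the target cannot be reached.
    """
    best_cost = {source: 0.0}      # cost = sum of -log(probability)
    previous = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, probability in graph.get(node, {}).items():
            new_cost = cost - math.log(probability)
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best_cost:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, math.exp(-best_cost[target])


# Toy bio-entity network with made-up relationship probabilities.
toy_network = {
    "moleculeX": {"proteinA": 0.9, "proteinB": 0.5},
    "proteinA": {"pathwayP": 0.8, "diseaseD": 0.3},
    "proteinB": {"pathwayP": 0.9},
    "pathwayP": {"diseaseD": 0.7},
}

path, probability = most_probable_path(toy_network, "moleculeX", "diseaseD")
print(path, probability)
```

An indirect relationship found this way (here, a small molecule linked to a disease through a protein and a pathway) is the kind of hypothesis the abstract describes generating from the network.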
2011-01-01
Background Since its inception, proteomics has essentially operated in a discovery mode with the goal of identifying and quantifying the maximal number of proteins in a sample. Increasingly, proteomic measurements are also supporting hypothesis-driven studies, in which a predetermined set of proteins is consistently detected and quantified in multiple samples. Selected reaction monitoring (SRM) is a targeted mass spectrometric technique that supports the detection and quantification of specific proteins in complex samples at high sensitivity and reproducibility. Here, we describe ATAQS, an integrated software platform that supports all stages of targeted, SRM-based proteomics experiments including target selection, transition optimization and post-acquisition data analysis. This software will significantly facilitate the use of targeted proteomic techniques and contribute to the generation of highly sensitive, reproducible and complete datasets that are particularly critical for the discovery and validation of targets in hypothesis-driven studies in systems biology. Result We introduce a new open source software pipeline, ATAQS (Automated and Targeted Analysis with Quantitative SRM), which consists of a number of modules that collectively support the SRM assay development workflow for targeted proteomic experiments (project management and generation of protein, peptide and transitions and the validation of peptide detection by SRM). ATAQS provides a flexible pipeline for end-users by allowing the workflow to start or end at any point of the pipeline, and for computational biologists, by enabling the easy extension of Java algorithm classes for their own algorithm plug-in or connection via an external web site. This integrated system supports all steps in an SRM-based experiment and provides a user-friendly GUI that can be run on any operating system that allows the installation of the Mozilla Firefox web browser.
Conclusions Targeted proteomics via SRM is a powerful new technique that enables the reproducible and accurate identification and quantification of sets of proteins of interest. ATAQS is the first open-source software that supports all steps of the targeted proteomics workflow. ATAQS also provides software API (Application Program Interface) documentation that enables the addition of new algorithms to each of the workflow steps. The software, installation guide and sample dataset can be found at http://tools.proteomecenter.org/ATAQS/ATAQS.html PMID:21414234
An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Yan, Gui-Ying; Hu, Ji-Pu
2016-10-01
Predicting protein-protein interactions (PPIs) is a challenging task that is essential for constructing protein interaction networks, which are important for our understanding of the mechanisms of biological systems. Although a number of high-throughput technologies have been proposed to predict PPIs, they have unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM-BiGP that combines the relevance vector machine (RVM) model and Bi-gram Probabilities (BiGP) for PPI detection from protein sequences. The major improvements are: (1) protein sequences are represented using the Bi-gram probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), which contains the protein evolutionary information; (2) to reduce the influence of noise, the Principal Component Analysis (PCA) method is used to reduce the dimension of the BiGP vector; (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five-fold cross-validation experiments on yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57% and 90.57%, respectively. Experimental results are significantly better than those of previous methods. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-BiGP method is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on an imbalanced yeast dataset, which is higher than the accuracy on the balanced yeast dataset.
The promising experimental results show the efficiency and robustness of the proposed method, which can be an automatic decision support tool for future proteomics research. To facilitate extensive studies in future proteomics research, we developed a freely available web server called RVM-BiGP-PPIs in Hypertext Preprocessor (PHP) for predicting PPIs. The web server, including source code and the datasets, is available at http://219.219.62.123:8888/BiGP/. © 2016 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
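The bi-gram feature step can be sketched as follows. The abstract gives no formula, so this example assumes the commonly used definition B[m][n] = sum over positions i of P[i][m] * P[i+1][n], where P is the PSSM with one row per residue; the function name `bigram_features` and the tiny two-column matrix are hypothetical. With a real 20-column PSSM the result would have 400 entries, which PCA would then reduce.

```python
def bigram_features(pssm):
    """Flatten the bi-gram matrix B[m][n] = sum_i pssm[i][m] * pssm[i+1][n].

    pssm: list of rows, one row per sequence position, one column per amino acid.
    Assumed formulation; the abstract does not spell out the exact definition.
    """
    n_cols = len(pssm[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Pair each position with its successor along the sequence.
    for row, next_row in zip(pssm, pssm[1:]):
        for m in range(n_cols):
            for n in range(n_cols):
                bigram[m][n] += row[m] * next_row[n]
    # Row-major flattening gives a fixed-length vector regardless of sequence length.
    return [value for line in bigram for value in line]

# Toy "PSSM" with a two-letter alphabet and three positions (invented values).
toy_pssm = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
features = bigram_features(toy_pssm)
```

The point of the representation is that sequences of different lengths map to vectors of the same size, which is what a classifier such as an RVM or SVM needs.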
Structure-guided Protein Transition Modeling with a Probabilistic Roadmap Algorithm.
Maximova, Tatiana; Plaku, Erion; Shehu, Amarda
2016-07-07
Proteins are macromolecules in perpetual motion, switching between structural states to modulate their function. A detailed characterization of the precise yet complex relationship between protein structure, dynamics, and function requires elucidating transitions between functionally-relevant states. Doing so challenges both wet and dry laboratories, as protein dynamics involves disparate temporal scales. In this paper we present a novel, sampling-based algorithm to compute transition paths. The algorithm exploits two main ideas. First, it leverages known structures to initialize its search and define a reduced conformation space for rapid sampling. This is key to addressing the insufficient sampling issue suffered by sampling-based algorithms. Second, the algorithm embeds samples in a nearest-neighbor graph where transition paths can be efficiently computed via queries. The algorithm adapts the probabilistic roadmap framework that is popular in robot motion planning. In addition to efficiently computing lowest-cost paths between any given structures, the algorithm allows investigating hypotheses regarding the order of experimentally-known structures in a transition event. This novel contribution is likely to open up new avenues of research. Detailed analysis is presented on multiple-basin proteins of relevance to human disease. Multiscaling and the AMBER ff14SB force field are used to obtain energetically-credible paths at atomistic detail.
PROPER: global protein interaction network alignment through percolation matching.
Kazemi, Ehsan; Hassani, Hamed; Grossglauser, Matthias; Pezeshgi Modarres, Hassan
2016-12-12
The alignment of protein-protein interaction (PPI) networks enables us to uncover the relationships between different species, which leads to a deeper understanding of biological systems. Network alignment can be used to transfer biological knowledge between species. Although different PPI-network alignment algorithms were introduced during the last decade, developing an accurate and scalable algorithm that can find alignments with high biological and structural similarities among PPI networks is still challenging. In this paper, we introduce a new global network alignment algorithm for PPI networks called PROPER. Compared to other global network alignment methods, our algorithm shows higher accuracy and speed over real PPI datasets and synthetic networks. We show that the PROPER algorithm can detect large portions of conserved biological pathways between species. Also, using a simple parsimonious evolutionary model, we explain why PROPER performs well based on several different comparison criteria. We highlight that PROPER has high potential in further applications such as detecting biological pathways, finding protein complexes and PPI prediction. The PROPER algorithm is available at http://proper.epfl.ch .
[Algorithms for treatment of complex hand injuries].
Pillukat, T; Prommersberger, K-J
2011-07-01
The primary treatment strongly influences the course and prognosis of hand injuries. Complex injuries which compromise functional recovery are especially challenging. Despite an apparently unlimited number of injury patterns it is possible to develop strategies which facilitate a standardized approach to operative treatment. In this situation algorithms can be important guidelines for a rational approach. The following algorithms have been proven in the treatment of complex injuries of the hand by our own experience. They were modified according to the current literature and refer to prehospital care, emergency room management, basic strategy in general and reconstruction of bone and joints, vessels, nerves, tendons and soft tissue coverage in detail. Algorithms facilitate the treatment of severe hand injuries. Applying simple yes/no decisions complex injury patterns are split into distinct partial problems which can be managed step by step.
NASA Astrophysics Data System (ADS)
Lindner, Robert; Lou, Xinghua; Reinstein, Jochen; Shoeman, Robert L.; Hamprecht, Fred A.; Winkler, Andreas
2014-06-01
Hydrogen-deuterium exchange (HDX) experiments analyzed by mass spectrometry (MS) provide information about the dynamics and the solvent accessibility of protein backbone amide hydrogen atoms. Continuous improvement of MS instrumentation has contributed to the increasing popularity of this method; however, comprehensive automated data analysis is only beginning to mature. We present Hexicon 2, an automated pipeline for data analysis and visualization based on the previously published program Hexicon (Lou et al. 2010). Hexicon 2 employs the sensitive NITPICK peak detection algorithm of its predecessor in a divide-and-conquer strategy and adds new features, such as chromatogram alignment and improved peptide sequence assignment. The unique feature of deuteration distribution estimation was retained in Hexicon 2 and improved using an iterative deconvolution algorithm that is robust even to noisy data. In addition, Hexicon 2 provides a data browser that facilitates quality control and provides convenient access to common data visualization tasks. Analysis of a benchmark dataset demonstrates superior performance of Hexicon 2 compared with its predecessor in terms of deuteration centroid recovery and deuteration distribution estimation. Hexicon 2 greatly reduces data analysis time compared with manual analysis, whereas the increased number of peptides provides redundant coverage of the entire protein sequence. Hexicon 2 is a standalone application available free of charge under http://hx2.mpimf-heidelberg.mpg.de.
Kister, Alexander
2015-01-01
We present an alternative approach to protein 3D folding prediction based on determination of rules that specify distribution of “favorable” residues, that are mainly responsible for a given fold formation, and “unfavorable” residues, that are incompatible with that fold, in polypeptide sequences. The process of determining favorable and unfavorable residues is iterative. The starting assumptions are based on the general principles of protein structure formation as well as structural features peculiar to a protein fold under investigation. The initial assumptions are tested one-by-one for a set of all known proteins with a given structure. The assumption is accepted as a “rule of amino acid distribution” for the protein fold if it holds true for all, or near all, structures. If the assumption is not accepted as a rule, it can be modified to better fit the data and then tested again in the next step of the iterative search algorithm, or rejected. We determined the set of amino acid distribution rules for a large group of beta sandwich-like proteins characterized by a specific arrangement of strands in two beta sheets. It was shown that this set of rules is highly sensitive (~90%) and very specific (~99%) for identifying sequences of proteins with specified beta sandwich fold structure. The advantage of the proposed approach is that it does not require that query proteins have a high degree of homology to proteins with known structure. So long as the query protein satisfies residue distribution rules, it can be confidently assigned to its respective protein fold. Another advantage of our approach is that it allows for a better understanding of which residues play an essential role in protein fold formation. It may, therefore, facilitate rational protein engineering design. PMID:25625198
OSPREY: protein design with ensembles, flexibility, and provable algorithms.
Gainza, Pablo; Roberts, Kyle E; Georgiev, Ivelin; Lilien, Ryan H; Keedy, Daniel A; Chen, Cheng-Yu; Reza, Faisal; Anderson, Amy C; Richardson, David C; Richardson, Jane S; Donald, Bruce R
2013-01-01
We have developed a suite of protein redesign algorithms that improves realistic in silico modeling of proteins. These algorithms are based on three characteristics that make them unique: (1) improved flexibility of the protein backbone, protein side-chains, and ligand to accurately capture the conformational changes that are induced by mutations to the protein sequence; (2) modeling of proteins and ligands as ensembles of low-energy structures to better approximate binding affinity; and (3) a globally optimal protein design search, guaranteeing that the computational predictions are optimal with respect to the input model. Here, we illustrate the importance of these three characteristics. We then describe OSPREY, a protein redesign suite that implements our protein design algorithms. OSPREY has been used prospectively, with experimental validation, in several biomedically relevant settings. We show in detail how OSPREY has been used to predict resistance mutations and explain why improved flexibility, ensembles, and provability are essential for this application. OSPREY is free and open source under a Lesser GPL license. The latest version is OSPREY 2.0. The program, user manual, and source code are available at www.cs.duke.edu/donaldlab/software.php. osprey@cs.duke.edu. Copyright © 2013 Elsevier Inc. All rights reserved.
Elaziz, Mohamed Abd; Hemdan, Ahmed Monem; Hassanien, AboulElla; Oliva, Diego; Xiong, Shengwu
2017-09-07
The current economics of the fish protein industry demand rapid, accurate and expressive prediction algorithms at every step of protein production, especially given the challenge of global climate change. Such algorithms help to predict and analyze functional and nutritional quality and, consequently, to control food allergies in hyperallergic patients. Measuring these concentrations by laboratory experiments is expensive and time-consuming, especially in large-scale projects. Therefore, this paper introduces a new intelligent algorithm that uses an adaptive neuro-fuzzy inference system based on the whale optimization algorithm. The algorithm is used to predict the concentration levels of bioactive amino acids in fish protein hydrolysates at different times during the year. The whale optimization algorithm is used to determine the optimal parameters of the adaptive neuro-fuzzy inference system. The results of the proposed algorithm are compared with those of other methods and indicate the higher performance of the proposed algorithm.
TAMEE: data management and analysis for tissue microarrays.
Thallinger, Gerhard G; Baumgartner, Kerstin; Pirklbauer, Martin; Uray, Martina; Pauritsch, Elke; Mehes, Gabor; Buck, Charles R; Zatloukal, Kurt; Trajanoski, Zlatko
2007-03-07
With the introduction of tissue microarrays (TMAs) researchers can investigate gene and protein expression in tissues on a high-throughput scale. TMAs generate a wealth of data calling for extended, high level data management. Enhanced data analysis and systematic data management are required for traceability and reproducibility of experiments and provision of results in a timely and reliable fashion. Robust and scalable applications have to be utilized, which allow secure data access, manipulation and evaluation for researchers from different laboratories. TAMEE (Tissue Array Management and Evaluation Environment) is a web-based database application for the management and analysis of data resulting from the production and application of TMAs. It facilitates storage of production and experimental parameters, of images generated throughout the TMA workflow, and of results from core evaluation. Database content consistency is achieved using structured classifications of parameters. This allows the extraction of high quality results for subsequent biologically-relevant data analyses. Tissue cores in the images of stained tissue sections are automatically located and extracted and can be evaluated using a set of predefined analysis algorithms. Additional evaluation algorithms can be easily integrated into the application via a plug-in interface. Downstream analysis of results is facilitated via a flexible query generator. We have developed an integrated system tailored to the specific needs of research projects using high density TMAs. It covers the complete workflow of TMA production, experimental use and subsequent analysis. The system is freely available for academic and non-profit institutions from http://genome.tugraz.at/Software/TAMEE.
Maintaining and Enhancing Diversity of Sampled Protein Conformations in Robotics-Inspired Methods.
Abella, Jayvee R; Moll, Mark; Kavraki, Lydia E
2018-01-01
The ability to efficiently sample structurally diverse protein conformations allows one to gain a high-level view of a protein's energy landscape. Algorithms from robot motion planning have been used for conformational sampling, and several of these algorithms promote diversity by keeping track of "coverage" in conformational space based on the local sampling density. However, large proteins present special challenges. In particular, larger systems require running many concurrent instances of these algorithms, but these algorithms can quickly become memory intensive because they typically keep previously sampled conformations in memory to maintain coverage estimates. In addition, robotics-inspired algorithms depend on defining useful perturbation strategies for exploring the conformational space, which is a difficult task for large proteins because such systems are typically more constrained and exhibit complex motions. In this article, we introduce two methodologies for maintaining and enhancing diversity in robotics-inspired conformational sampling. The first method addresses algorithms based on coverage estimates and leverages the use of a low-dimensional projection to define a global coverage grid that maintains coverage across concurrent runs of sampling. The second method is an automatic definition of a perturbation strategy through readily available flexibility information derived from B-factors, secondary structure, and rigidity analysis. Our results show a significant increase in the diversity of the conformations sampled for proteins consisting of up to 500 residues when applied to a specific robotics-inspired algorithm for conformational sampling. The methodologies presented in this article may be vital components for the scalability of robotics-inspired approaches.
Identify High-Quality Protein Structural Models by Enhanced K-Means.
Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang
2017-01-01
Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means.
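The squared-distance seeding that K-means++ uses can be sketched as below. This is the generic K-means++ initialization on plain coordinate tuples, not the authors' implementation; for protein decoys the distance would more likely be a structural measure such as RMSD, and the names `squared_distance` and `kmeans_pp_init` are hypothetical. The sketch assumes the input holds at least k distinct points.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """Pick k initial centroids; assumes at least k distinct points."""
    # First centroid: uniform random choice.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight each point by its squared distance to the nearest chosen centroid,
        # so far-away points are more likely to seed a new cluster.
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

# Two well-separated toy groups (invented coordinates).
points = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
seeds = kmeans_pp_init(points, 2, random.Random(0))
```

A point that is already a centroid has weight zero and is never drawn again, which is why the seeds tend to spread across the decoy population instead of piling into one dense region.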
Identify High-Quality Protein Structural Models by Enhanced K-Means
Li, Haiou; Chen, Cheng; Lv, Qiang; Wu, Chuang
2017-01-01
Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means. PMID:28421198
NASA Astrophysics Data System (ADS)
Qiao, Qin; Zhang, Hou-Dao; Huang, Xuhui
2016-04-01
Simulated tempering (ST) is a widely used enhancing sampling method for Molecular Dynamics simulations. As one expanded ensemble method, ST is a combination of canonical ensembles at different temperatures and the acceptance probability of cross-temperature transitions is determined by both the temperature difference and the weights of each temperature. One popular way to obtain the weights is to adopt the free energy of each canonical ensemble, which achieves uniform sampling among temperature space. However, this uniform distribution in temperature space may not be optimal since high temperatures do not always speed up the conformational transitions of interest, as anti-Arrhenius kinetics are prevalent in protein and RNA folding. Here, we propose a new method: Enhancing Pairwise State-transition Weights (EPSW), to obtain the optimal weights by minimizing the round-trip time for transitions among different metastable states at the temperature of interest in ST. The novelty of the EPSW algorithm lies in explicitly considering the kinetics of conformation transitions when optimizing the weights of different temperatures. We further demonstrate the power of EPSW in three different systems: a simple two-temperature model, a two-dimensional model for protein folding with anti-Arrhenius kinetics, and the alanine dipeptide. The results from these three systems showed that the new algorithm can substantially accelerate the transitions between conformational states of interest in the ST expanded ensemble and further facilitate the convergence of thermodynamics compared to the widely used free energy weights. We anticipate that this algorithm is particularly useful for studying functional conformational changes of biological systems where the initial and final states are often known from structural biology experiments.
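For readers unfamiliar with how the weights enter ST, the standard Metropolis acceptance rule for a temperature move can be written as a short function. This is the textbook rule for an expanded ensemble with statistical weight exp(-beta_i * E + w_i); it does not implement the EPSW weight optimization itself, and the function name `st_acceptance` and the numerical values are illustrative assumptions.

```python
import math

def st_acceptance(energy, beta_i, beta_j, w_i, w_j):
    """Probability of accepting a move from temperature i to temperature j.

    Assumes the expanded-ensemble weight exp(-beta * energy + w) for each
    temperature, so the Metropolis ratio depends on both the inverse-temperature
    difference and the weight difference.
    """
    log_ratio = -(beta_j - beta_i) * energy + (w_j - w_i)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

# Moving to a higher temperature (smaller beta) from a low-energy state.
p_flat = st_acceptance(energy=-10.0, beta_i=1.0, beta_j=0.9, w_i=0.0, w_j=0.0)
p_weighted = st_acceptance(energy=-10.0, beta_i=1.0, beta_j=0.9, w_i=0.0, w_j=1.0)
```

With equal weights the move is accepted with probability of about exp(-1); raising the weight of the target temperature by 1 makes it certain, which shows how the choice of weights shapes the time spent at each temperature.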
DOE Office of Scientific and Technical Information (OSTI.GOV)
Qiao, Qin, E-mail: qqiao@ust.hk; Zhang, Hou-Dao; Huang, Xuhui, E-mail: xuhuihuang@ust.hk
2016-04-21
Simulated tempering (ST) is a widely used enhancing sampling method for Molecular Dynamics simulations. As one expanded ensemble method, ST is a combination of canonical ensembles at different temperatures and the acceptance probability of cross-temperature transitions is determined by both the temperature difference and the weights of each temperature. One popular way to obtain the weights is to adopt the free energy of each canonical ensemble, which achieves uniform sampling among temperature space. However, this uniform distribution in temperature space may not be optimal since high temperatures do not always speed up the conformational transitions of interest, as anti-Arrhenius kinetics are prevalent in protein and RNA folding. Here, we propose a new method: Enhancing Pairwise State-transition Weights (EPSW), to obtain the optimal weights by minimizing the round-trip time for transitions among different metastable states at the temperature of interest in ST. The novelty of the EPSW algorithm lies in explicitly considering the kinetics of conformation transitions when optimizing the weights of different temperatures. We further demonstrate the power of EPSW in three different systems: a simple two-temperature model, a two-dimensional model for protein folding with anti-Arrhenius kinetics, and the alanine dipeptide. The results from these three systems showed that the new algorithm can substantially accelerate the transitions between conformational states of interest in the ST expanded ensemble and further facilitate the convergence of thermodynamics compared to the widely used free energy weights. We anticipate that this algorithm is particularly useful for studying functional conformational changes of biological systems where the initial and final states are often known from structural biology experiments.
Eosinophilic pustular folliculitis: A proposal of diagnostic and therapeutic algorithms.
Nomura, Takashi; Katoh, Mayumi; Yamamoto, Yosuke; Miyachi, Yoshiki; Kabashima, Kenji
2016-11-01
Eosinophilic pustular folliculitis (EPF) is a sterile inflammatory dermatosis of unknown etiology. In addition to classic EPF, which affects otherwise healthy individuals, an immunocompromised state can cause immunosuppression-associated EPF (IS-EPF), which may be referred to dermatologists in inpatient services for assessments. Infancy-associated EPF (I-EPF) is the least characterized subtype, being observed mainly in non-Japanese infants. Diagnosis of EPF is challenging because its lesions mimic those of other common diseases, such as acne and dermatomycosis. Furthermore, there is no consensus regarding the treatment for each subtype of EPF. Here, we created procedure algorithms that facilitate the diagnosis and selection of therapeutic options on the basis of published work available in the public domain. Our diagnostic algorithm comprised a simple flowchart to direct physicians toward proper diagnosis. Recommended regimens were summarized in an easy-to-comprehend therapeutic algorithm for each subtype of EPF. These algorithms would facilitate the diagnostic and therapeutic procedure of EPF. © 2016 Japanese Dermatological Association.
CombiMotif: A new algorithm for network motifs discovery in protein-protein interaction networks
NASA Astrophysics Data System (ADS)
Luo, Jiawei; Li, Guanghui; Song, Dan; Liang, Cheng
2014-12-01
Discovering motifs in protein-protein interaction networks is becoming a major challenge in computational biology, since the distribution of the number of network motifs can reveal significant systemic differences among species. However, this task can be computationally expensive because it involves graph isomorphism detection. In this paper, we present a new algorithm (CombiMotif) that incorporates combinatorial techniques to count non-induced occurrences of subgraph topologies in the form of trees. The efficiency of our algorithm is demonstrated by comparing the obtained results with the current state-of-the-art subgraph counting algorithms. We also show major differences between unicellular and multicellular organisms. The datasets and source code of CombiMotif are freely available upon request.
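To make "counting non-induced tree occurrences combinatorially" concrete, the simplest case is the star: a node of degree d is the center of C(d, k) non-induced stars with k leaves, so the count needs only the degree sequence and no isomorphism test. The sketch below covers just this special case and is not the CombiMotif algorithm; the name `count_stars` and the toy edge list are hypothetical, and the graph is assumed to be simple and undirected.

```python
from math import comb

def count_stars(edges, leaves):
    """Count non-induced stars with `leaves` leaves (leaves >= 2).

    edges: iterable of (u, v) pairs of a simple undirected graph
    (no self-loops, no duplicate edges).
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Each center of degree d contributes one star per choice of `leaves` neighbors.
    return sum(comb(d, leaves) for d in degree.values())

# Triangle a-b-c with a pendant node d attached to c (invented network).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
two_paths = count_stars(toy_edges, 2)   # 3-node paths
claws = count_stars(toy_edges, 3)       # 4-node stars
```

Non-induced counting ignores extra edges among the chosen nodes, so the path a-c-b is counted even though a and b are also connected.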
An, Ji‐Yong; Meng, Fan‐Rong; Chen, Xing; Yan, Gui‐Ying; Hu, Ji‐Pu
2016-01-01
Predicting protein–protein interactions (PPIs) is a challenging task and is essential for constructing protein interaction networks, which are important for understanding the mechanisms of biological systems. Although a number of high‐throughput technologies have been proposed to predict PPIs, they have unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM‐BiGP that combines the relevance vector machine (RVM) model and Bi‐gram Probabilities (BiGP) for PPI detection from protein sequences. The major improvements are: (1) protein sequences are represented using the Bi‐gram probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), which contains the protein evolutionary information; (2) to reduce the influence of noise, the Principal Component Analysis (PCA) method is used to reduce the dimension of the BiGP vector; (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five‐fold cross‐validation experiments on yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57% and 90.57%, respectively. The experimental results are significantly better than those of previous methods. To further evaluate the proposed method, we compare it with the state‐of‐the‐art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM‐BiGP method is significantly better than the SVM‐based method. In addition, we achieved 97.15% accuracy on the imbalanced yeast dataset, which is higher than that on the balanced yeast dataset.
The promising experimental results show the efficiency and robustness of the proposed method, which can be an automatic decision support tool for future proteomics research. To facilitate extensive studies in future proteomics research, we developed a freely available web server called RVM‐BiGP‐PPIs in Hypertext Preprocessor (PHP) for predicting PPIs. The web server, including source code and the datasets, is available at http://219.219.62.123:8888/BiGP/. PMID:27452983
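The bi-gram feature step can be sketched as follows, assuming the commonly used definition in which entry (m, n) sums, over consecutive sequence positions, the product of the PSSM score for residue type m at one position and residue type n at the next. The abstract does not spell out the formula, so this definition, the function name, and the tiny two-column matrix are assumptions for illustration; a real PSSM has 20 columns and yields a 400-dimensional vector.

```python
def bigram_features(pssm):
    """Flattened bi-gram feature vector from a position-specific scoring matrix.

    pssm: list of L rows (one per sequence position), each with A column scores.
    Returns a list of A*A values, row-major over (m, n).
    """
    if len(pssm) < 2:
        raise ValueError("need at least two sequence positions")
    size = len(pssm[0])
    feats = [[0.0] * size for _ in range(size)]
    # Walk over consecutive position pairs (i, i+1).
    for row, nxt in zip(pssm, pssm[1:]):
        for m in range(size):
            for n in range(size):
                feats[m][n] += row[m] * nxt[n]
    return [value for line in feats for value in line]
```

The resulting vector has a fixed length regardless of sequence length, which is what allows a dimension-reduction step such as PCA and a fixed-input classifier to follow.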
Li, Guo-Zhong; Vissers, Johannes P C; Silva, Jeffrey C; Golick, Dan; Gorenstein, Marc V; Geromanos, Scott J
2009-03-01
A novel database search algorithm is presented for the qualitative identification of proteins over a wide dynamic range, both in simple and complex biological samples. The algorithm has been designed for the analysis of data originating from data independent acquisitions, whereby multiple precursor ions are fragmented simultaneously. Measurements used by the algorithm include retention time, ion intensities, charge state, and accurate masses on both precursor and product ions from LC-MS data. The search algorithm uses an iterative process whereby each iteration incrementally increases the selectivity, specificity, and sensitivity of the overall strategy. Increased specificity is obtained by utilizing a subset database search approach, whereby for each subsequent stage of the search, only those peptides from securely identified proteins are queried. Tentative peptide and protein identifications are ranked and scored by their relative correlation to a number of models of known and empirically derived physicochemical attributes of proteins and peptides. In addition, the algorithm utilizes decoy database techniques for automatically determining the false positive identification rates. The search algorithm has been tested by comparing the search results from a four-protein mixture, the same four-protein mixture spiked into a complex biological background, and a variety of other "system" type protein digest mixtures. The method was validated independently by data dependent methods, while concurrently relying on replication and selectivity. Comparisons were also performed with other commercially and publicly available peptide fragmentation search algorithms. The presented results demonstrate the ability to correctly identify peptides and proteins from data independent acquisition strategies with high sensitivity and specificity. 
They also illustrate a more comprehensive analysis of the samples studied, providing approximately 20% more protein identifications than a more conventional data directed approach using the same identification criteria, with a concurrent increase in both sequence coverage and the number of modified peptides.
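The decoy-database idea mentioned above can be illustrated with a generic target-decoy estimate: among matches scoring at or above a threshold, the number of decoy hits approximates the number of false target hits. This is a textbook-style sketch, not the scoring used by the algorithm described in the abstract; the function name and the (score, is_decoy) input layout are assumptions.

```python
def estimate_false_positive_rate(matches, threshold):
    """Estimate the false positive rate at a score threshold.

    matches: iterable of (score, is_decoy) pairs from a search against
             target and decoy sequences.
    Returns decoy hits divided by target hits among matches with
    score >= threshold, or 0.0 when no target hit passes.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in matches:
        if score < threshold:
            continue
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    return decoys / targets if targets else 0.0
```

Raising the threshold trades sensitivity for a lower estimated error rate, so in practice the threshold is chosen to meet a target rate.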
Parallel Computational Protein Design.
Zhou, Yichao; Donald, Bruce R; Zeng, Jianyang
2017-01-01
Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that is guaranteed to find the global minimum energy conformation (GMEC) is to combine both dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computation bottleneck of the large-scale computational protein design process. To address this issue, we extend and add a new module to the OSPREY program that was previously developed in the Donald lab (Gainza et al., Methods Enzymol 523:87, 2013) to implement a GPU-based massively parallel A* algorithm for improving the protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide up to four orders of magnitude speedups in large protein design cases with a small memory overhead compared with the traditional A* search algorithm implementation, while still guaranteeing optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle problems in which the conformation space is too large and the global optimal solution could not previously be computed. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with the state-of-the-art rotamer pruning algorithms such as iMinDEE (Gainza et al., PLoS Comput Biol 8:e1002335, 2012) and DEEPer (Hallen et al., Proteins 81:18-39, 2013) to also consider continuous backbone and side-chain flexibility.
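To make the A* step concrete, here is a small serial sketch of A* tree search over rotamer assignments with self and pairwise energies. It is not gOSPREY or OSPREY code and has no GPU parallelism; the data layout (`self_e[i][r]`, `pair_e[(i, j)][r][s]` for i < j) and all function names are assumptions. The heuristic takes, for each unassigned position, its best rotamer given the assigned positions and an optimistic minimum over later positions, which never overestimates the remaining energy.

```python
import heapq

def partial_energy(assign, self_e, pair_e):
    """Energy of positions 0..k-1 assigned to the rotamers in `assign`."""
    energy = sum(self_e[i][r] for i, r in enumerate(assign))
    for i in range(len(assign)):
        for j in range(i + 1, len(assign)):
            energy += pair_e[(i, j)][assign[i]][assign[j]]
    return energy

def lower_bound(assign, self_e, pair_e):
    """Admissible estimate of the energy still to be added by unassigned positions."""
    n, k = len(self_e), len(assign)
    bound = 0.0
    for j in range(k, n):
        best = float("inf")
        for s in range(len(self_e[j])):
            e = self_e[j][s]
            # Exact interaction with already assigned positions.
            e += sum(pair_e[(i, j)][assign[i]][s] for i in range(k))
            # Optimistic interaction with later unassigned positions.
            e += sum(min(pair_e[(j, m)][s]) for m in range(j + 1, n))
            best = min(best, e)
        bound += best
    return bound

def astar_gmec(self_e, pair_e):
    """Return (assignment, energy) of the minimum-energy full assignment."""
    n = len(self_e)
    heap = [(lower_bound((), self_e, pair_e), ())]
    while heap:
        _, assign = heapq.heappop(heap)
        if len(assign) == n:
            return assign, partial_energy(assign, self_e, pair_e)
        for r in range(len(self_e[len(assign)])):
            child = assign + (r,)
            g = partial_energy(child, self_e, pair_e)
            heapq.heappush(heap, (g + lower_bound(child, self_e, pair_e), child))
    return None
```

Because the search tree can grow exponentially with the number of positions, the open list and the repeated heuristic evaluations dominate the cost, which is the part the abstract describes moving to the GPU.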
Deep Learning and Its Applications in Biomedicine.
Cao, Chensi; Liu, Feng; Tan, Hai; Song, Deshou; Shu, Wenjie; Li, Weizhong; Zhou, Yiming; Bo, Xiaochen; Xie, Zhi
2018-02-01
Advances in biological and medical technologies have been providing us explosive volumes of biological and physiological data, such as medical images, electroencephalography, genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural network and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples are demonstrated for deep learning applications, including medical image classification, genomic sequence analysis, as well as protein structure classification and prediction. Finally, we offer our perspectives for the future directions in the field of deep learning. Copyright © 2018. Production and hosting by Elsevier B.V.
Protein Sequence Classification with Improved Extreme Learning Machine Algorithms
2014-01-01
Precisely classifying a protein sequence from a large biological protein sequence database plays an important role in developing competitive pharmacological products. Conventional methods, which compare the unseen sequence with all the identified protein sequences and return the category index of the protein with the highest similarity score, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single-hidden-layer feedforward networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimal pruned ELM (OP-ELM) is first employed for protein sequence classification in this paper. To further enhance the performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensembles. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely, the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. PMID:24795876
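The majority voting step used to combine ensemble members can be sketched in a few lines. The function name and the tie-breaking rule (the label encountered first wins) are assumptions; the abstract does not say how ties are resolved.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the category predicted by the most ensemble members.

    predictions: non-empty list of category labels, one per network.
    Ties go to the label that appears first in the list.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]
```

For example, `majority_vote(["kinase", "globin", "kinase"])` selects `"kinase"` because two of the three members agree.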
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1973-01-01
An algorithm and computer program are presented for generating all the distinct 2^(p-q) fractional factorial designs. Some applications of this algorithm to the construction of tables of designs and of designs for nonstandard situations and its use in Bayesian design are discussed. An appendix includes a discussion of an actual experiment whose design was facilitated by the algorithm.
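For readers unfamiliar with the notation, a single 2^(p-q) design can be built by running a full two-level factorial in p-q basic factors and defining each of the remaining q factors as a product of basic factors. The sketch below constructs one such design from user-supplied generators; it does not enumerate all distinct designs as the report's algorithm does, and the function name and argument layout are assumptions.

```python
from itertools import product

def fractional_factorial(n_basic, generators):
    """Build one two-level fractional factorial design.

    n_basic:    number of basic factors (p - q).
    generators: list of tuples of basic-factor indices; each added factor
                is the product of the listed basic factors.
    Returns a list of runs, each a tuple of -1/+1 levels of length
    n_basic + len(generators).
    """
    runs = []
    for levels in product((-1, 1), repeat=n_basic):
        added = []
        for gen in generators:
            value = 1
            for idx in gen:
                value *= levels[idx]
            added.append(value)
        runs.append(levels + tuple(added))
    return runs
```

Calling `fractional_factorial(2, [(0, 1)])` gives the four-run 2^(3-1) design in which the third factor is aliased with the interaction of the first two.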
Computational intelligence techniques for biological data mining: An overview
NASA Astrophysics Data System (ADS)
Faye, Ibrahima; Iqbal, Muhammad Javed; Said, Abas Md; Samir, Brahim Belhaouari
2014-10-01
Computational techniques have been successfully utilized for highly accurate analysis and modeling of multifaceted and raw biological data gathered from various genome sequencing projects. These techniques are proving much more effective at overcoming the limitations of traditional in-vitro experiments on the constantly increasing sequence data. The most critical problems that have caught the attention of researchers include, but are not limited to: accurate structure and function prediction of unknown proteins, protein subcellular localization prediction, finding protein-protein interactions, protein fold recognition, and analysis of microarray gene expression data. To solve these problems, various classification and clustering techniques using machine learning have been extensively used in the published literature. These techniques include neural network algorithms, genetic algorithms, fuzzy ARTMAP, K-Means, K-NN, SVM, Rough set classifiers, decision tree and HMM based algorithms. Major difficulties in applying the above algorithms include the limitations found in the previous feature encoding and selection methods while extracting the best features, increasing classification accuracy and decreasing the running time overheads of the learning algorithms. The application of this research would be potentially useful in drug design and in the diagnosis of some diseases. This paper presents a concise overview of the well-known protein classification techniques.
Modeling and simulating networks of interdependent protein interactions.
Stöcker, Bianca K; Köster, Johannes; Zamir, Eli; Rahmann, Sven
2018-05-21
Protein interactions are fundamental building blocks of biochemical reaction systems underlying cellular functions. The complexity and functionality of these systems emerge not only from the protein interactions themselves but also from the dependencies between these interactions, as generated by allosteric effects or mutual exclusion due to steric hindrance. Therefore, formal models for integrating and utilizing information about interaction dependencies are of high interest. Here, we describe an approach for endowing protein networks with interaction dependencies using propositional logic, thereby obtaining constrained protein interaction networks ("constrained networks"). The construction of these networks is based on public interaction databases as well as text-mined information about interaction dependencies. We present an efficient data structure and algorithm to simulate protein complex formation in constrained networks. The efficiency of the model allows fast simulation and facilitates the analysis of many proteins in large networks. In addition, this approach enables the simulation of perturbation effects, such as knockout of single or multiple proteins and changes of protein concentrations. We illustrate how our model can be used to analyze a constrained human adhesome protein network, which is responsible for the formation of diverse and dynamic cell-matrix adhesion sites. By comparing protein complex formation under known interaction dependencies versus without dependencies, we investigate how these dependencies shape the resulting repertoire of protein complexes. Furthermore, our model enables investigating how the interplay of network topology with interaction dependencies influences the propagation of perturbation effects across a large biochemical system. 
Our simulation software CPINSim (for Constrained Protein Interaction Network Simulator) is available under the MIT license at http://github.com/BiancaStoecker/cpinsim and as a Bioconda package (https://bioconda.github.io).
PDB_TM: selection and membrane localization of transmembrane proteins in the protein data bank.
Tusnády, Gábor E; Dosztányi, Zsuzsanna; Simon, István
2005-01-01
PDB_TM is a database for transmembrane proteins with known structures. It aims to collect all transmembrane proteins that are deposited in the protein structure database (PDB) and to determine their membrane-spanning regions. These assignments are based on the TMDET algorithm, which uses only structural information to locate the most likely position of the lipid bilayer and to distinguish between transmembrane and globular proteins. This algorithm was applied to all PDB entries and the results were collected in the PDB_TM database. By using TMDET algorithm, the PDB_TM database can be automatically updated every week, keeping it synchronized with the latest PDB updates. The PDB_TM database is available at http://www.enzim.hu/PDB_TM.
Pilla, Kala Bharath; Otting, Gottfried; Huber, Thomas
2017-03-07
Computational and nuclear magnetic resonance hybrid approaches provide efficient tools for 3D structure determination of small proteins, but currently available algorithms struggle to perform with larger proteins. Here we demonstrate a new computational algorithm that assembles the 3D structure of a protein from its constituent super-secondary structural motifs (Smotifs) with the help of pseudocontact shift (PCS) restraints for backbone amide protons, where the PCSs are produced from different metal centers. The algorithm, DINGO-PCS (3D assembly of Individual Smotifs to Near-native Geometry as Orchestrated by PCSs), employs the PCSs to recognize, orient, and assemble the constituent Smotifs of the target protein without any other experimental data or computational force fields. Using a universal Smotif database, the DINGO-PCS algorithm exhaustively enumerates any given Smotif. We benchmarked the program against ten different protein targets ranging from 100 to 220 residues with different topologies. For nine of these targets, the method was able to identify near-native Smotifs. Copyright © 2017 Elsevier Ltd. All rights reserved.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.
Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin
2017-08-31
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Enghauser, Michael
2016-02-01
The goal of the Domestic Nuclear Detection Office (DNDO) Algorithm Improvement Program (AIP) is to facilitate gamma-radiation detector nuclide identification algorithm development, improvement, and validation. Accordingly, scoring criteria have been developed to objectively assess the performance of nuclide identification algorithms. In addition, a Microsoft Excel spreadsheet application for automated nuclide identification scoring has been developed. This report provides an overview of the equations, nuclide weighting factors, nuclide equivalencies, and configuration weighting factors used by the application for scoring nuclide identification algorithm performance. Furthermore, this report presents a general overview of the nuclide identification algorithm scoring application including illustrative examples.
He, Jianjun; Gu, Hong; Liu, Wenqi
2012-01-01
It is well known that an important step toward understanding the functions of a protein is to determine its subcellular location. Although numerous prediction algorithms have been developed, most of them have focused on proteins with only one location. In recent years, researchers have begun to pay attention to the subcellular localization prediction of proteins with multiple sites. However, almost all the existing approaches fail to take into account the correlations among the locations caused by proteins with multiple sites, which may be important information for improving the prediction accuracy for such proteins. In this paper, a new algorithm that can effectively exploit the correlations among the locations is proposed using a Gaussian process model. In addition, the algorithm can realize an optimal linear combination of various feature extraction technologies and can be robust to imbalanced data sets. Experimental results on a human protein data set show that the proposed algorithm is valid and can achieve better performance than the existing approaches.
Evaluation of the novel algorithm of flexible ligand docking with moveable target-protein atoms.
Sulimov, Alexey V; Zheltkov, Dmitry A; Oferkin, Igor V; Kutov, Danil C; Katkova, Ekaterina V; Tyrtyshnikov, Eugene E; Sulimov, Vladimir B
2017-01-01
We present the novel docking algorithm based on the Tensor Train decomposition and the TT-Cross global optimization. The algorithm is applied to the docking problem with flexible ligand and moveable protein atoms. The energy of the protein-ligand complex is calculated in the frame of the MMFF94 force field in vacuum. The grid of precalculated energy potentials of probe ligand atoms in the field of the target protein atoms is not used. The energy of the protein-ligand complex for any given configuration is computed directly with the MMFF94 force field without any fitting parameters. The conformation space of the system coordinates is formed by translations and rotations of the ligand as a whole, by the ligand torsions and also by Cartesian coordinates of the selected target protein atoms. Mobility of protein and ligand atoms is taken into account in the docking process simultaneously and equally. The algorithm is realized in the novel parallel docking SOL-P program and results of its performance for a set of 30 protein-ligand complexes are presented. Dependence of the docking positioning accuracy is investigated as a function of parameters of the docking algorithm and the number of protein moveable atoms. It is shown that mobility of the protein atoms improves docking positioning accuracy. The SOL-P program is able to perform docking of a flexible ligand into the active site of the target protein with several dozens of protein moveable atoms: the native crystallized ligand pose is correctly found as the global energy minimum in the search space with 157 dimensions using 4700 CPU ∗ h at the Lomonosov supercomputer.
Optimal processing for gel electrophoresis images: Applying Monte Carlo Tree Search in GelApp.
Nguyen, Phi-Vu; Ghezal, Ali; Hsueh, Ya-Chih; Boudier, Thomas; Gan, Samuel Ken-En; Lee, Hwee Kuan
2016-08-01
In biomedical research, gel band size estimation in electrophoresis analysis is a routine process. To facilitate and automate this process, numerous software tools have been released, notably the GelApp mobile app. However, the band detection accuracy is limited due to a band detection algorithm that cannot adapt to the variations in input images. To address this, we used the Monte Carlo Tree Search with Upper Confidence Bound (MCTS-UCB) method to efficiently search for optimal image processing pipelines for the band detection task, thereby improving the segmentation algorithm. Incorporating this into GelApp, we report a significant enhancement of gel band detection accuracy by 55.9 ± 2.0% for protein polyacrylamide gels, and 35.9 ± 2.5% for DNA SYBR green agarose gels. This implementation is a proof-of-concept in demonstrating MCTS-UCB as a strategy to optimize general image segmentation. The improved version of GelApp, GelApp 2.0, is freely available on both Google Play Store (for Android platform) and Apple App Store (for iOS platform). © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
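The selection rule at the heart of MCTS with an upper confidence bound can be sketched with the standard UCB1 formula: the mean reward of a child plus an exploration bonus that shrinks as the child is visited more often. Whether GelApp uses exactly this variant or this exploration constant is not stated in the abstract, so the formula choice, the constant, and the function names are assumptions.

```python
import math

def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score of a child node; unvisited children are tried first."""
    if visits == 0:
        return float("inf")
    mean = total_reward / visits
    return mean + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children):
    """Pick the index of the child with the highest UCB1 score.

    children: list of (total_reward, visits) pairs, for example one per
              candidate image-processing step at this point in a pipeline.
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1(reward, visits, parent_visits) for reward, visits in children]
    return scores.index(max(scores))
```

In a pipeline search, each child would stand for one candidate processing operation and the reward for how well the resulting pipeline detects bands on reference images.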
Azimipour, Mehdi; Sheikhzadeh, Mahya; Baumgartner, Ryan; Cullen, Patrick K; Helmstetter, Fred J; Chang, Woo-Jin; Pashaie, Ramin
2017-01-01
We present our effort in implementing a fluorescence laminar optical tomography scanner which is specifically designed for noninvasive three-dimensional imaging of fluorescence proteins in the brains of small rodents. A laser beam, after passing through a cylindrical lens, scans the brain tissue from the surface while the emission signal is captured by the epi-fluorescence optics and is recorded using an electron multiplication CCD sensor. Image reconstruction algorithms are developed based on Monte Carlo simulation to model light–tissue interaction and generate the sensitivity matrices. To solve the inverse problem, we used the iterative simultaneous algebraic reconstruction technique. The performance of the developed system was evaluated by imaging microfabricated silicon microchannels embedded inside a substrate with optical properties close to the brain as a tissue phantom and ultimately by scanning brain tissue in vivo. Details of the hardware design and reconstruction algorithms are discussed and several experimental results are presented. The developed system can specifically facilitate neuroscience experiments where fluorescence imaging and molecular genetic methods are used to study the dynamics of the brain circuitries.
Algorithms and semantic infrastructure for mutation impact extraction and grounding.
Laurila, Jonas B; Naderi, Nona; Witte, René; Riazanov, Alexandre; Kouznetsov, Alexandre; Baker, Christopher J O
2010-12-02
Mutation impact extraction is a hitherto unaccomplished task in state of the art mutation extraction systems. Protein mutations and their impacts on protein properties are hidden in scientific literature, making them poorly accessible for protein engineers and inaccessible for phenotype-prediction systems that currently depend on manually curated genomic variation databases. We present the first rule-based approach for the extraction of mutation impacts on protein properties, categorizing their directionality as positive, negative or neutral. Furthermore protein and mutation mentions are grounded to their respective UniProtKB IDs and selected protein properties, namely protein functions to concepts found in the Gene Ontology. The extracted entities are populated to an OWL-DL Mutation Impact ontology facilitating complex querying for mutation impacts using SPARQL. We illustrate retrieval of proteins and mutant sequences for a given direction of impact on specific protein properties. Moreover we provide programmatic access to the data through semantic web services using the SADI (Semantic Automated Discovery and Integration) framework. We address the problem of access to legacy mutation data in unstructured form through the creation of novel mutation impact extraction methods which are evaluated on a corpus of full-text articles on haloalkane dehalogenases, tagged by domain experts. Our approaches show state of the art levels of precision and recall for Mutation Grounding and respectable level of precision but lower recall for the task of Mutant-Impact relation extraction. The system is deployed using text mining and semantic web technologies with the goal of publishing to a broad spectrum of consumers.
Normalized Cut Algorithm for Automated Assignment of Protein Domains
NASA Technical Reports Server (NTRS)
Samanta, M. P.; Liang, S.; Zha, H.; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We present a novel computational method for automatic assignment of protein domains from structural data. At the core of our algorithm lies a recently proposed clustering technique that has been very successful for image-partitioning applications. This graph-theory based clustering method uses the notion of a normalized cut to partition an undirected graph into its strongly-connected components. Computer implementation of our method tested on the standard comparison set of proteins from the literature shows a high success rate (84%), better than most existing alternatives. In addition, several other features of our algorithm, such as reliance on few adjustable parameters, linear run-time with respect to the size of the protein and reduced complexity compared to other graph-theory based algorithms, would make it an attractive tool for structural biologists.
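The normalized cut criterion itself is easy to state: for a partition of the vertices into A and B, it is cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), where cut sums the weights of edges crossing the partition and assoc sums the weights of all edges touching a side. The sketch below evaluates this value for a given partition; it does not perform the spectral minimization, and how residues and contact weights are turned into a graph is not described in the abstract, so the input layout and function name are assumptions.

```python
def normalized_cut(weights, part_a):
    """Normalized cut value of the partition (part_a, rest).

    weights: dict mapping (u, v) pairs to the weight of the undirected
             edge between u and v, each edge listed once.
    part_a:  set of vertices on one side of the partition.
    """
    cut = 0.0
    assoc_a = 0.0
    assoc_b = 0.0
    for (u, v), w in weights.items():
        u_in_a, v_in_a = u in part_a, v in part_a
        if u_in_a != v_in_a:
            cut += w
        # Each endpoint adds the edge weight to the association of its own side.
        assoc_a += w * (u_in_a + v_in_a)
        assoc_b += w * ((not u_in_a) + (not v_in_a))
    if assoc_a == 0 or assoc_b == 0:
        raise ValueError("both sides of the partition must touch at least one edge")
    return cut / assoc_a + cut / assoc_b
```

A partition that separates two densely connected groups joined by a weak link gets a value near zero, while cutting off a single well-connected vertex is penalized, which is why the criterion favors balanced, compact domains.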
Symbolic discrete event system specification
NASA Technical Reports Server (NTRS)
Zeigler, Bernard P.; Chi, Sungdo
1992-01-01
Extending discrete event modeling formalisms to facilitate greater symbol manipulation capabilities is important to further their use in intelligent control and design of high autonomy systems. An extension to the DEVS formalism that facilitates symbolic expression of event times by extending the time base from the real numbers to the field of linear polynomials over the reals is defined. A simulation algorithm is developed to generate the branching trajectories resulting from the underlying nondeterminism. To efficiently manage symbolic constraints, a consistency checking algorithm for linear polynomial constraints based on feasibility checking algorithms borrowed from linear programming has been developed. The extended formalism offers a convenient means to conduct multiple, simultaneous explorations of model behaviors. Examples of application are given with concentration on fault model analysis.
Kadumuri, Rajashekar Varma; Vadrevu, Ramakrishna
2017-10-01
Due to their crucial role in function, folding, and stability, protein loops are being targeted for grafting/designing to create novel or alter existing functionality and improve stability and foldability. With a view to facilitating thorough analysis and effective search options for extracting and comparing loops for sequence and structural compatibility, we developed LoopX, a comprehensively compiled library of sequence and conformational features of ∼700,000 loops from protein structures. The database, equipped with a graphical user interface, provides diverse query tools and search algorithms, with various rendering options to visualize the sequence- and structural-level information along with hydrogen bonding patterns and backbone φ, ψ dihedral angles of both the target and candidate loops. Two new features, (i) conservation of the polar/nonpolar environment and (ii) conservation of sequence and conformation of specific residues within the loops, have also been incorporated in the search and retrieval of compatible loops for a chosen target loop. Thus, the LoopX server not only serves as a database and visualization tool for sequence and structural analysis of protein loops but also aids in extracting and comparing candidate loops for a given target loop based on user-defined search options.
Pan, Huipeng; Ma, Yabin; Zhang, Deyong; Liu, Yong; Zhang, Zhanhong; Zheng, Changying; Chu, Dong
2015-01-01
Reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) is a reliable technique for measuring and evaluating gene expression during variable biological processes. To facilitate gene expression studies, normalization of genes of interest relative to stable reference genes is crucial. The western flower thrips Frankliniella occidentalis (Pergande) (Thysanoptera: Thripidae), the main vector of tomato spotted wilt virus (TSWV), is a destructive invasive species. In this study, the expression profiles of 11 candidate reference genes from nonviruliferous and viruliferous F. occidentalis were investigated. Five distinct algorithms, geNorm, NormFinder, BestKeeper, the ΔCt method, and RefFinder, were used to determine the performance of these genes. geNorm, NormFinder, BestKeeper, and RefFinder identified heat shock protein 70 (HSP70), heat shock protein 60 (HSP60), elongation factor 1 α, and ribosomal protein l32 (RPL32) as the most stable reference genes, and the ΔCt method identified HSP60, HSP70, RPL32, and heat shock protein 90 as the most stable reference genes. Additionally, two reference genes were sufficient for reliable normalization in nonviruliferous and viruliferous F. occidentalis. This work provides a foundation for investigating the molecular mechanisms of TSWV and F. occidentalis interactions. PMID:26244556
Zhou, Ruhong
2004-05-01
A highly parallel replica exchange method (REM) that couples with a newly developed molecular dynamics algorithm particle-particle particle-mesh Ewald (P3ME)/RESPA has been proposed for efficient sampling of protein folding free energy landscape. The algorithm is then applied to two separate protein systems, beta-hairpin and a designed protein Trp-cage. The all-atom OPLSAA force field with an explicit solvent model is used for both protein folding simulations. Up to 64 replicas of solvated protein systems are simulated in parallel over a wide range of temperatures. The combined trajectories in temperature and configurational space allow a replica to overcome free energy barriers present at low temperatures. These large scale simulations reveal detailed results on folding mechanisms, intermediate state structures, thermodynamic properties and the temperature dependences for both protein systems.
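The exchange step of a replica exchange simulation can be sketched with the standard Metropolis criterion: two replicas at temperatures T_i and T_j with potential energies E_i and E_j swap configurations with probability min(1, exp[(β_i - β_j)(E_i - E_j)]), where β = 1/(k_B T). This is the generic criterion, not code from the P3ME/RESPA implementation; the function name and the choice of kcal/mol units are assumptions.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def swap_probability(e_i, t_i, e_j, t_j, k_b=K_B):
    """Metropolis acceptance probability for exchanging two replicas.

    e_i, e_j: potential energies of the replicas (kcal/mol).
    t_i, t_j: their temperatures (K).
    """
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    # Favorable or neutral swaps are always accepted.
    return 1.0 if delta >= 0 else math.exp(delta)
```

A swap that moves the lower-energy configuration to the lower temperature is always accepted, while the reverse is accepted only occasionally; this is how configurations that escaped a barrier at high temperature can migrate down to low temperature.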
A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm
Chen, Jui-Le; Yang, Chu-Sing
2013-01-01
The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although computer science technologies can be used to reduce the costs of pharmaceutical research, the computation time of structure-based protein-ligand docking prediction remains unsatisfactory. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate docking prediction. The proposed algorithm works by leveraging two high-performance operators: (1) a novel migration (information exchange) operator designed specially for cloud-based environments to reduce the computation time; (2) an efficient operator aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864
Algorithms for database-dependent search of MS/MS data.
Matthiesen, Rune
2013-01-01
The frequently used bottom-up strategy for identifying proteins and their associated modifications now typically generates thousands of MS/MS spectra, which are normally matched automatically against a protein sequence database. Search engines that take MS/MS spectra and a protein sequence database as input are referred to as database-dependent search engines. Many programs, both commercial and freely available, exist for database-dependent search of MS/MS spectra, and most of them have excellent user documentation. The aim here is therefore to outline the algorithmic strategy behind different search engines rather than to provide software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have been put into comparing results from different software rather than discussing the underlying algorithms. Such practical comparisons can be cluttered by suboptimal implementation, and the observed differences are frequently caused by software parameter settings that have not been set properly to allow an even comparison. In other words, an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim of this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference is much less developed for most search engines and is in many cases performed by external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program, SIR, for protein inference that can import a Mascot search result.
A Novel Algorithm for Detecting Protein Complexes with the Breadth First Search
Tang, Xiwei; Wang, Jianxin; Li, Min; He, Yiming; Pan, Yi
2014-01-01
Most biological processes are carried out by protein complexes. A substantial number of false positives of the protein-protein interaction (PPI) data can compromise the utility of the datasets for complexes reconstruction. In order to reduce the impact of such discrepancies, a number of data integration and affinity scoring schemes have been devised. The methods encode the reliabilities (confidence) of physical interactions between pairs of proteins. The challenge now is to identify novel and meaningful protein complexes from the weighted PPI network. To address this problem, a novel protein complex mining algorithm ClusterBFS (Cluster with Breadth-First Search) is proposed. Based on the weighted density, ClusterBFS detects protein complexes of the weighted network by the breadth first search algorithm, which originates from a given seed protein used as starting-point. The experimental results show that ClusterBFS performs significantly better than the other computational approaches in terms of the identification of protein complexes. PMID:24818139
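The idea of growing a complex outward from a seed protein by breadth-first search while monitoring weighted density can be sketched as below. This is an illustrative simplification under stated assumptions, not the published ClusterBFS procedure: the density definition (sum of internal edge weights divided by the number of possible pairs), the acceptance rule, the `min_density` threshold, and the toy network are all hypothetical.

```python
from collections import deque

def weighted_density(nodes, weights):
    """Sum of internal edge weights divided by the number of possible node pairs."""
    n = len(nodes)
    if n < 2:
        return 0.0
    members = sorted(nodes)
    total = 0.0
    for idx, u in enumerate(members):
        for v in members[idx + 1:]:
            total += weights.get(frozenset((u, v)), 0.0)
    return total / (n * (n - 1) / 2)

def grow_cluster(seed, weights, min_density=0.5):
    """Breadth-first expansion from a seed; a neighbour joins the cluster
    only if the cluster's weighted density stays at or above min_density."""
    neighbours = {}
    for edge in weights:  # each key is a frozenset of two distinct proteins
        u, v = tuple(edge)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    cluster = {seed}
    visited = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for candidate in sorted(neighbours.get(current, ())):
            if candidate in visited:
                continue
            visited.add(candidate)
            if weighted_density(cluster | {candidate}, weights) >= min_density:
                cluster.add(candidate)
                queue.append(candidate)
    return cluster

# Hypothetical weighted PPI network: A, B, C form a tight triangle; D is weakly attached.
toy_weights = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.7,
    frozenset(("C", "D")): 0.1,
}
complex_from_a = grow_cluster("A", toy_weights)
```

In this toy network the weakly connected protein D is rejected because adding it would pull the density below the threshold.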
How Structure Defines Affinity in Protein-Protein Interactions
Erijman, Ariel; Rosenthal, Eran; Shifman, Julia M.
2014-01-01
Protein-protein interactions (PPI) in nature are conveyed by a multitude of binding modes involving various surfaces, secondary structure elements and intermolecular interactions. This diversity results in PPI binding affinities that span more than nine orders of magnitude. Several early studies attempted to correlate PPI binding affinities to various structure-derived features with limited success. The growing number of high-resolution structures, the appearance of more precise methods for measuring binding affinities and the development of new computational algorithms enable more thorough investigations in this direction. Here, we use a large dataset of PPI structures with the documented binding affinities to calculate a number of structure-based features that could potentially define binding energetics. We explore how well each calculated biophysical feature alone correlates with binding affinity and determine the features that could be used to distinguish between high-, medium- and low- affinity PPIs. Furthermore, we test how various combinations of features could be applied to predict binding affinity and observe a slow improvement in correlation as more features are incorporated into the equation. In addition, we observe a considerable improvement in predictions if we exclude from our analysis low-resolution and NMR structures, revealing the importance of capturing exact intermolecular interactions in our calculations. Our analysis should facilitate prediction of new interactions on the genome scale, better characterization of signaling networks and design of novel binding partners for various target proteins. PMID:25329579
Classification and Lineage Tracing of SH2 Domains Throughout Eukaryotes.
Liu, Bernard A
2017-01-01
Today there exists a rapidly expanding number of sequenced genomes. Cataloging protein interaction domains such as the Src Homology 2 (SH2) domain across these various genomes can be accomplished with ease due to existing algorithms and prediction models. An evolutionary analysis of SH2 domains provides a step towards understanding how SH2 proteins integrated with existing signaling networks to position phosphotyrosine signaling as a crucial driver of robust cellular communication networks in metazoans. However, organizing and tracing SH2 domains across organisms and understanding their evolutionary trajectory remain a challenge. This chapter describes several methodologies for analyzing the evolutionary trajectory of SH2 domains, including a global SH2 domain classification system, which facilitates annotation of new SH2 sequences essential for tracing the lineage of SH2 domains throughout eukaryote evolution. This classification utilizes a combination of sequence homology, protein domain architecture and the boundary positions between introns and exons within the SH2 domain or genes encoding these domains. Discrete SH2 families can then be traced across various genomes to provide insight into their origins. Furthermore, additional methods for examining potential mechanisms for divergence of SH2 domains, from structural changes to alterations in the protein domain content and genome duplication, will be discussed. A better understanding of SH2 domain evolution may therefore enhance our insight into the emergence of phosphotyrosine signaling and the expansion of protein interaction domains.
Yang, Xuefeng; Mei, Shuang; Wang, Xiaolei; Li, Xiang; Liu, Rui; Ma, Yan; Hao, Liping; Yao, Ping; Liu, Liegang; Sun, Xiufa; Gu, Haihua; Liu, Zhenqi; Cao, Wenhong
2013-03-29
In this study, we addressed the direct effect of leucine on insulin signaling. In investigating the associated mechanisms, we found that leucine itself does not activate the classical Akt- or ERK1/2 MAP kinase-dependent signaling pathways but can facilitate the insulin-induced phosphorylations of Akt(473) and ERK1/2 in a time- and dose-dependent manner in cultured hepatocytes. The leucine-facilitated insulin-induced phosphorylation of Akt at residue 473 was not affected by knocking down the key component of mTORC1 or -2 complexes but was blocked by inhibition of c-Src (PP2), PI3K (LY294002), Gαi protein (pertussis toxin or siRNA against Gαi1 gene), or β-arrestin 2 (siRNA). Similarly, the leucine-facilitated insulin activation of ERK1/2 was also blunted by pertussis toxin. We further show that leucine facilitated the insulin-mediated suppression of glucose production and expression of key gluconeogenic genes in a Gαi1 protein-dependent manner in cultured primary hepatocytes. Together, these results show that leucine can directly facilitate insulin signaling through a Gαi protein-dependent intracellular signaling pathway. This is the first evidence showing that macronutrients like amino acid leucine can facilitate insulin signaling through G proteins directly. PMID:23404499
Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH.
Volk, Jochen; Herrmann, Torsten; Wüthrich, Kurt
2008-07-01
MATCH (Memetic Algorithm and Combinatorial Optimization Heuristics) is a new memetic algorithm for automated sequence-specific polypeptide backbone NMR assignment of proteins. MATCH employs local optimization for tracing partial sequence-specific assignments within a global, population-based search environment, where the simultaneous application of local and global optimization heuristics guarantees high efficiency and robustness. MATCH thus makes combined use of the two predominant concepts in use for automated NMR assignment of proteins. Dynamic transition and inherent mutation are new techniques that enable automatic adaptation to variable quality of the experimental input data. The concept of dynamic transition is incorporated in all major building blocks of the algorithm, where it enables switching between local and global optimization heuristics at any time during the assignment process. Inherent mutation restricts the intrinsically required randomness of the evolutionary algorithm to those regions of the conformation space that are compatible with the experimental input data. Using intact and artificially deteriorated APSY-NMR input data of proteins, MATCH performed sequence-specific resonance assignment with high efficiency and robustness.
Dynamic Conformations of Nucleosome Arrays in Solution from Small-Angle X-ray Scattering
NASA Astrophysics Data System (ADS)
Howell, Steven C.
Chromatin conformation and dynamics remain unsolved despite the critical role of chromatin in fundamental genetic functions such as transcription, replication, and repair. At the molecular level, chromatin can be viewed as a linear array of nucleosomes, each consisting of 147 base pairs (bp) of double-stranded DNA (dsDNA) wrapped around a protein core and connected by 10 to 90 bp of linker dsDNA. Using small-angle X-ray scattering (SAXS), we investigated how the conformations of model nucleosome arrays in solution are modulated by ionic conditions as well as by linker histone proteins. To facilitate ensemble modeling of these SAXS measurements, we developed a simulation method that treats coarse-grained DNA as a Markov chain, then explores possible DNA conformations using Metropolis Monte Carlo (MC) sampling. This algorithm extends the functionality of SASSIE, a program used to model intrinsically disordered biological molecules, adding to the previous methods for simulating protein, carbohydrates, and single-stranded DNA. Our SAXS measurements of various nucleosome arrays, together with the MC-generated models, provide valuable solution structure information identifying specific differences from the structure of crystallized arrays.
Berlin, Konstantin; Longhini, Andrew; Dayie, T Kwaku; Fushman, David
2013-12-01
To facilitate rigorous analysis of molecular motions in proteins, DNA, and RNA, we present a new version of ROTDIF, a program for determining the overall rotational diffusion tensor from single- or multiple-field nuclear magnetic resonance relaxation data. We introduce four major features that expand the program's versatility and usability. The first feature is the ability to analyze, separately or together, (13)C and/or (15)N relaxation data collected at a single or multiple fields. A significant improvement in the accuracy compared to direct analysis of R2/R1 ratios, especially critical for analysis of (13)C relaxation data, is achieved by subtracting high-frequency contributions to relaxation rates. The second new feature is an improved method for computing the rotational diffusion tensor in the presence of biased errors, such as large conformational exchange contributions, that significantly enhances the accuracy of the computation. The third new feature is the integration of the domain alignment and docking module for relaxation-based structure determination of multi-domain systems. Finally, to improve accessibility to all the program features, we introduced a graphical user interface that simplifies and speeds up the analysis of the data. Written in Java, the new ROTDIF can run on virtually any computer platform. In addition, the new ROTDIF achieves an order of magnitude speedup over the previous version by implementing a more efficient deterministic minimization algorithm. We not only demonstrate the improvement in accuracy and speed of the new algorithm for synthetic and experimental (13)C and (15)N relaxation data for several proteins and nucleic acids, but also show that careful analysis required especially for characterizing RNA dynamics allowed us to uncover subtle conformational changes in RNA as a function of temperature that were opaque to previous analysis.
MASH Suite Pro: A Comprehensive Software Tool for Top-Down Proteomics
Cai, Wenxuan; Guner, Huseyin; Gregorich, Zachery R.; Chen, Albert J.; Ayaz-Guner, Serife; Peng, Ying; Valeja, Santosh G.; Liu, Xiaowen; Ge, Ying
2016-01-01
Top-down mass spectrometry (MS)-based proteomics is arguably a disruptive technology for the comprehensive analysis of all proteoforms arising from genetic variation, alternative splicing, and posttranslational modifications (PTMs). However, the complexity of top-down high-resolution mass spectra presents a significant challenge for data analysis. In contrast to the well-developed software packages available for data analysis in bottom-up proteomics, the data analysis tools in top-down proteomics remain underdeveloped. Moreover, despite recent efforts to develop algorithms and tools for the deconvolution of top-down high-resolution mass spectra and the identification of proteins from complex mixtures, a multifunctional software platform, which allows for the identification, quantitation, and characterization of proteoforms with visual validation, is still lacking. Herein, we have developed MASH Suite Pro, a comprehensive software tool for top-down proteomics with multifaceted functionality. MASH Suite Pro is capable of processing high-resolution MS and tandem MS (MS/MS) data using two deconvolution algorithms to optimize protein identification results. In addition, MASH Suite Pro allows for the characterization of PTMs and sequence variations, as well as the relative quantitation of multiple proteoforms in different experimental conditions. The program also provides visualization components for validation and correction of the computational outputs. Furthermore, MASH Suite Pro facilitates data reporting and presentation via direct output of the graphics. Thus, MASH Suite Pro significantly simplifies and speeds up the interpretation of high-resolution top-down proteomics data by integrating tools for protein identification, quantitation, characterization, and visual validation into a customizable and user-friendly interface. We envision that MASH Suite Pro will play an integral role in advancing the burgeoning field of top-down proteomics. PMID:26598644
Bio-inspired algorithms applied to molecular docking simulations.
Heberlé, G; de Azevedo, W F
2011-01-01
Nature as a source of inspiration has been shown to have a great beneficial impact on the development of new computational methodologies. In this scenario, analyses of the interactions between a protein target and a ligand can be simulated by biologically inspired algorithms (BIAs). These algorithms mimic biological systems to create new paradigms for computation, such as neural networks, evolutionary computing, and swarm intelligence. This review provides a description of the main concepts behind BIAs applied to molecular docking simulations. Special attention is devoted to evolutionary algorithms, guided-directed evolutionary algorithms, and Lamarckian genetic algorithms. Recent applications of these methodologies to protein targets identified in the Mycobacterium tuberculosis genome are described.
cOSPREY: A Cloud-Based Distributed Algorithm for Large-Scale Computational Protein Design
Pan, Yuchao; Dong, Yuxi; Zhou, Jingtian; Hallen, Mark; Donald, Bruce R.; Xu, Wei
2016-01-01
Finding the global minimum energy conformation (GMEC) of a huge combinatorial search space is the key challenge in computational protein design (CPD) problems. Traditional algorithms lack a scalable and efficient distributed design scheme, preventing researchers from taking full advantage of current cloud infrastructures. We design cloud OSPREY (cOSPREY), an extension to the widely used protein design software OSPREY, to allow the original design framework to scale to commercial cloud infrastructures. We propose several novel designs to integrate both algorithm and system optimizations, such as GMEC-specific pruning, state search partitioning, asynchronous algorithm state sharing, and fault tolerance. We evaluate cOSPREY on three different cloud platforms using different technologies and show that it can solve a number of large-scale protein design problems that have not been possible with previous approaches. PMID:27154509
Computational Modeling of Proteins based on Cellular Automata: A Method of HP Folding Approximation.
Madain, Alia; Abu Dalhoum, Abdel Latif; Sleit, Azzam
2018-06-01
The design of a protein folding approximation algorithm is not straightforward even when a simplified model is used. The folding problem is a combinatorial problem, where approximation and heuristic algorithms are usually used to find near-optimal folds of proteins' primary structures. Approximation algorithms provide guarantees on the distance to the optimal solution. The folding approximation approach proposed here depends on two-dimensional cellular automata (CA) to fold proteins represented in a well-studied simplified model called the hydrophobic-hydrophilic (HP) model. Cellular automata are discrete computational models that rely on local rules to produce some overall global behavior. One-third and one-fourth approximation algorithms choose a subset of the hydrophobic amino acids to form H-H contacts. Those algorithms start by finding a point at which to fold the protein sequence into two sides, where one side ignores H's at even positions and the other side ignores H's at odd positions. In addition, blocks or groups of amino acids fold the same way according to a predefined normal form. We intend to improve approximation algorithms by considering all hydrophobic amino acids and folding based on the local neighborhood instead of using normal forms. The CA does not assume a fixed folding point. The proposed approach guarantees a one-half approximation minus the H-H endpoints. This guaranteed lower bound applies to short sequences only. This is proved because the core and the folds of the protein will have two identical sides for all short sequences.
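The quantity that HP-model folding algorithms try to maximize, the number of H-H contacts, can be made concrete with a small scoring sketch. This only evaluates a given fold on a 2D square lattice; it does not implement the cellular-automaton folding procedure from the abstract. The move encoding (`U`, `D`, `L`, `R`) and the function names are assumptions made for the example.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def fold_to_coords(moves):
    """Turn a string of lattice moves into residue coordinates, starting at the origin."""
    coords = [(0, 0)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        raise ValueError("fold is not self-avoiding")
    return coords

def count_hh_contacts(sequence, moves):
    """Count pairs of H residues that are lattice neighbours but not chain neighbours."""
    coords = fold_to_coords(moves)
    if len(coords) != len(sequence):
        raise ValueError("need exactly len(sequence) - 1 moves")
    index_at = {coord: i for i, coord in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts

# A four-residue chain folded into a unit square brings its two ends into contact.
square_contacts = count_hh_contacts("HPPH", "RUL")
straight_contacts = count_hh_contacts("HHHH", "RRR")
```

An approximation guarantee such as one-half or one-third is a statement about how this contact count for the produced fold compares with the best achievable count for the same sequence.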
Ma, Hongyan; Delafield, Daniel G; Wang, Zhe; You, Jianlan; Wu, Si
2017-04-01
The microbial secretome, known as a pool of biomass (i.e., plant-based materials) degrading enzymes, can be utilized to discover industrial enzyme candidates for biofuel production. Proteomics approaches have been applied to discover novel enzyme candidates through comparing protein expression profiles with enzyme activity of the whole secretome under different growth conditions. However, the activity measurement of each enzyme candidate is needed for confident "active" enzyme assignments, which remains to be elucidated. To address this challenge, we have developed an Activity-Correlated Quantitative Proteomics Platform (ACPP) that systematically correlates protein-level enzymatic activity patterns and protein elution profiles using a label-free quantitative proteomics approach. The ACPP optimized a high performance anion exchange separation for efficiently fractionating complex protein samples while preserving enzymatic activities. The detected enzymatic activity patterns in sequential fractions using microplate-based assays were cross-correlated with protein elution profiles using a customized pattern-matching algorithm with a correlation R-score. The ACPP has been successfully applied to the identification of two types of "active" biomass-degrading enzymes (i.e., starch hydrolysis enzymes and cellulose hydrolysis enzymes) from Aspergillus niger secretome in a multiplexed fashion. By determining protein elution profiles of 156 proteins in A. niger secretome, we confidently identified the 1,4-α-glucosidase as the major "active" starch hydrolysis enzyme (R = 0.96) and the endoglucanase as the major "active" cellulose hydrolysis enzyme (R = 0.97). The results demonstrated that the ACPP facilitated the discovery of bioactive enzymes from complex protein samples in a high-throughput, multiplexing, and untargeted fashion.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Enghauser, Michael
2015-02-01
The goal of the Domestic Nuclear Detection Office (DNDO) Algorithm Improvement Program (AIP) is to facilitate gamma-radiation detector nuclide identification algorithm development, improvement, and validation. Accordingly, scoring criteria have been developed to objectively assess the performance of nuclide identification algorithms. In addition, a Microsoft Excel spreadsheet application for automated nuclide identification scoring has been developed. This report provides an overview of the equations, nuclide weighting factors, nuclide equivalencies, and configuration weighting factors used by the application for scoring nuclide identification algorithm performance. Furthermore, this report presents a general overview of the nuclide identification algorithm scoring application including illustrative examples.
Geometric Detection Algorithms for Cavities on Protein Surfaces in Molecular Graphics: A Survey
Simões, Tiago; Lopes, Daniel; Dias, Sérgio; Fernandes, Francisco; Pereira, João; Jorge, Joaquim; Bajaj, Chandrajit; Gomes, Abel
2017-01-01
Detecting and analyzing protein cavities provides significant information about active sites for biological processes (e.g., protein-protein or protein-ligand binding) in molecular graphics and modeling. Using the three-dimensional structure of a given protein (i.e., atom types and their locations in 3D) as retrieved from a PDB (Protein Data Bank) file, it is now computationally viable to determine a description of these cavities. Such cavities correspond to pockets, clefts, invaginations, voids, tunnels, channels, and grooves on the surface of a given protein. In this work, we survey the literature on protein cavity computation and classify algorithmic approaches into three categories: evolution-based, energy-based, and geometry-based. Our survey focuses on geometric algorithms, whose taxonomy is extended to include not only sphere-, grid-, and tessellation-based methods, but also surface-based, hybrid geometric, consensus, and time-varying methods. Finally, we detail those techniques that have been customized for GPU (Graphics Processing Unit) computing. PMID:29520122
Learning a peptide-protein binding affinity predictor with kernel ridge regression
2013-01-01
Background The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interaction. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to specific MHC alleles would be of tremendous benefit to improve vaccine-based therapy and possibly generate antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation. Results We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, including the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant predictions with good accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets.
Conclusion On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve systems biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/. PMID:23497081
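How a string kernel plugs into kernel ridge regression can be illustrated with a small, dependency-free sketch. The kernel used here is a plain k-mer spectrum kernel standing in for the kernel proposed in the abstract, which is not reproduced; the peptide strings, affinity values, regularization strength `lam`, and all function names are hypothetical.

```python
from collections import Counter

def spectrum_kernel(a, b, k=2):
    """Count shared k-mers between two strings (a simple stand-in string kernel)."""
    kmers_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    kmers_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return float(sum(kmers_a[m] * kmers_b[m] for m in kmers_a))

def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit(peptides, affinities, lam=1.0, k=2):
    """Kernel ridge regression: solve (K + lam * I) alpha = y for the dual weights."""
    gram = [[spectrum_kernel(p, q, k) for q in peptides] for p in peptides]
    for i in range(len(peptides)):
        gram[i][i] += lam
    return solve(gram, list(affinities))

def predict(peptide, peptides, alphas, k=2):
    """Predicted value is the kernel-weighted sum over the training peptides."""
    return sum(a * spectrum_kernel(peptide, p, k) for a, p in zip(alphas, peptides))

# Hypothetical training set of two peptides with made-up affinities.
train = ["AAK", "AAR"]
alphas = fit(train, [1.0, 2.0], lam=1.0)
estimate = predict("AAK", train, alphas)
```

Swapping `spectrum_kernel` for a richer kernel changes only the Gram matrix; the regression step itself stays the same, which is what makes the approach flexible.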
Optical Detection of Degraded Therapeutic Proteins.
Herrington, William F; Singh, Gajendra P; Wu, Di; Barone, Paul W; Hancock, William; Ram, Rajeev J
2018-03-23
The quality of therapeutic proteins such as hormones, subunit and conjugate vaccines, and antibodies is critical to the safety and efficacy of modern medicine. Identifying malformed proteins at the point-of-care can prevent adverse immune reactions in patients; this is of special concern when there is an insecure supply chain resulting in the delivery of degraded, or even counterfeit, drug product. Identification of degraded protein, for example human growth hormone, is demonstrated by applying automated anomaly detection algorithms. Detection of the degraded protein differs from previous applications of machine-learning and classification to spectral analysis: only example spectra of genuine, high-quality drug products are used to construct the classifier. The algorithm is tested on Raman spectra acquired on protein dilutions typical of formulated drug product and at sample volumes of 25 µL, below the typical overfill (waste) volumes present in vials of injectable drug product. The algorithm is demonstrated to correctly classify anomalous recombinant human growth hormone (rhGH) with 92% sensitivity and 98% specificity even when the algorithm has only previously encountered high-quality drug product.
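The distinguishing feature described above, training only on genuine spectra and flagging anything that deviates, can be sketched with a very simple one-class detector. The abstract does not say which anomaly detection algorithm was used, so the distance-to-mean rule, the `margin` factor, the toy three-point "spectra", and all names below are assumptions for illustration only.

```python
import math

def fit_reference(genuine_spectra, margin=1.5):
    """Learn a mean spectrum and a distance threshold from genuine spectra only."""
    count = len(genuine_spectra)
    width = len(genuine_spectra[0])
    mean = [sum(s[i] for s in genuine_spectra) / count for i in range(width)]
    distances = [math.dist(s, mean) for s in genuine_spectra]
    return mean, margin * max(distances)

def is_anomalous(spectrum, mean, threshold):
    """Flag a spectrum whose Euclidean distance from the mean exceeds the threshold."""
    return math.dist(spectrum, mean) > threshold

def sensitivity_specificity(predicted, actual):
    """Both inputs are lists of booleans, True meaning anomalous (degraded).

    Assumes at least one truly anomalous and one truly genuine sample.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical, heavily simplified spectra with three intensity channels each.
genuine = [[1.0, 2.0, 1.0], [1.1, 2.1, 0.9], [0.9, 1.9, 1.1]]
mean_spectrum, limit = fit_reference(genuine)
flag_degraded = is_anomalous([1.0, 1.0, 2.0], mean_spectrum, limit)
flag_genuine = is_anomalous([1.05, 2.0, 1.0], mean_spectrum, limit)
```

Sensitivity and specificity, the two figures reported in the abstract, are then computed by comparing such flags against known labels on a held-out test set.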
Cytoprophet: a Cytoscape plug-in for protein and domain interaction networks inference.
Morcos, Faruck; Lamanna, Charles; Sikora, Marcin; Izaguirre, Jesús
2008-10-01
Cytoprophet is a software tool that allows prediction and visualization of protein and domain interaction networks. It is implemented as a plug-in of Cytoscape, an open source software framework for analysis and visualization of molecular networks. Cytoprophet implements three algorithms that predict new potential physical interactions using the domain composition of proteins and experimental assays. The algorithms for protein and domain interaction inference include maximum likelihood estimation (MLE) using expectation maximization (EM); the set cover approach maximum specificity set cover (MSSC) and the sum-product algorithm (SPA). After accepting an input set of proteins with Uniprot ID/Accession numbers and a selected prediction algorithm, Cytoprophet draws a network of potential interactions with probability scores and GO distances as edge attributes. A network of domain interactions between the domains of the initial protein list can also be generated. Cytoprophet was designed to take advantage of the visual capabilities of Cytoscape and be simple to use. An example of inference in a signaling network of myxobacterium Myxococcus xanthus is presented and available at Cytoprophet's website. http://cytoprophet.cse.nd.edu.
Semantic integration to identify overlapping functional modules in protein interaction networks
Cho, Young-Rae; Hwang, Woochang; Ramanathan, Murali; Zhang, Aidong
2007-01-01
Background The systematic analysis of protein-protein interactions can enable a better understanding of cellular organization, processes and functions. Functional modules can be identified from the protein interaction networks derived from experimental data sets. However, these analyses are challenging because of the presence of unreliable interactions and the complex connectivity of the network. The integration of protein-protein interactions with the data from other sources can be leveraged for improving the effectiveness of functional module detection algorithms. Results We have developed novel metrics, called semantic similarity and semantic interactivity, which use Gene Ontology (GO) annotations to measure the reliability of protein-protein interactions. The protein interaction networks can be converted into a weighted graph representation by assigning the reliability values to each interaction as a weight. We presented a flow-based modularization algorithm to efficiently identify overlapping modules in the weighted interaction networks. The experimental results show that the semantic similarity and semantic interactivity of interacting pairs were positively correlated with functional co-occurrence. The effectiveness of the algorithm for identifying modules was evaluated using functional categories from the MIPS database. We demonstrated that our algorithm had higher accuracy compared to other competing approaches. Conclusion The integration of protein interaction networks with GO annotation data and the capability of detecting overlapping modules substantially improve the accuracy of module identification. PMID:17650343
Avsian-Kretchmer, Orna; Hsueh, Aaron J W
2004-01-01
TGF-beta family proteins with a cystine knot motif serve as ligands for diverse families of plasma membrane receptors. Bone morphogenetic protein (BMP) antagonists represent a subgroup of these proteins, some of which bind BMPs and antagonize their actions during development and morphogenesis. Availability of completed genome sequences from diverse organisms allows bioinformatic analysis of the evolution of BMP antagonists and facilitates their classification. Using a regular expression algorithm (http://BioRegEx.stanford.edu), an exhaustive search of the human genome identified all cystine knot-containing BMP antagonists. Based on the size of the cystine ring, these proteins were divided into three subfamilies: CAN (eight-membered ring), twisted gastrulation (nine-membered ring), as well as chordin and noggin (10-membered ring). The CAN family can be divided further into four subgroups based on a conserved arrangement of additional cysteine residues: gremlin and PRDC, cerberus and coco, and DAN, together with USAG-1 and sclerostin. We searched for orthologs of human BMP antagonists in the genomes of model organisms and analyzed their phylogenetic relationship. New human paralogs were identified together with the verification of orthologous relationships of known genes. We also discuss the physiological roles of the CAN subfamily of BMP antagonists and the associated genetic defects. Based on the known three-dimensional structure of key cystine knot proteins, we postulated disulfide bondings for eight-membered ring BMP antagonists to predict their potential folding and dimerization.
Energy design for protein-protein interactions
Ravikant, D. V. S.; Elber, Ron
2011-01-01
Proteins bind to other proteins efficiently and specifically to carry on many cell functions such as signaling, activation, transport, enzymatic reactions, and more. To determine the geometry and strength of binding of a protein pair, an energy function is required. An algorithm to design an optimal energy function, based on empirical data of protein complexes, is proposed and applied. Emphasis is made on negative design in which incorrect geometries are presented to the algorithm that learns to avoid them. For the docking problem the search for plausible geometries can be performed exhaustively. The possible geometries of the complex are generated on a grid with the help of a fast Fourier transform algorithm. A novel formulation of negative design makes it possible to investigate iteratively hundreds of millions of negative examples while monotonically improving the quality of the potential. Experimental structures for 640 protein complexes are used to generate positive and negative examples for learning parameters. The algorithm designed in this work finds the correct binding structure as the lowest energy minimum in 318 cases of the 640 examples. Further benchmarks on independent sets confirm the significant capacity of the scoring function to recognize correct modes of interactions. PMID:21842951
An Algorithm for Protein Helix Assignment Using Helix Geometry
Cao, Chen; Xu, Shutan; Wang, Lincong
2015-01-01
Helices are one of the most common and were among the earliest recognized secondary structure elements in proteins. The assignment of helices in a protein underlies the analysis of its structure and function. Though the mathematical expression for a helical curve is simple, no previous assignment programs have used a genuine helical curve as a model for helix assignment. In this paper we present a two-step assignment algorithm. The first step searches for a series of bona fide helical curves, each of which best fits the coordinates of four successive backbone Cα atoms. The second step uses the best-fit helical curves as input to make the helix assignment. The application to the protein structures in the PDB (Protein Data Bank) proves that the algorithm is able to assign accurately not only regular α-helices but also 3₁₀ and π helices as well as their left-handed versions. One salient feature of the algorithm is that the assigned helices are structurally more uniform than those assigned by previous programs. The structural uniformity should be useful for protein structure classification and prediction, while the accurate assignment of a helix to a particular type underlies structure-function relationships in proteins. PMID:26132394
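For reference, the "simple" expression for a helical curve mentioned above is usually written as follows. This is the textbook circular helix about the z axis, with radius a and pitch p as assumed symbols; the parameterization actually fitted by the authors (axis orientation, origin, phase) is not given in the abstract and may differ.

```latex
\mathbf{r}(t) = \Bigl( a\cos t,\; a\sin t,\; \frac{p}{2\pi}\,t \Bigr), \qquad t \in \mathbb{R}
```

With a > 0 and p > 0 the curve is right-handed; replacing \sin t by -\sin t gives the left-handed version. As a sanity check on the symbols, a canonical α-helix has about 3.6 residues per turn and a rise of about 1.5 Å per residue, so p ≈ 5.4 Å.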
Minimalist ensemble algorithms for genome-wide protein localization prediction.
Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun
2012-07-03
Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. 
We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
Minimalist ensemble algorithms for genome-wide protein localization prediction
2012-01-01
Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. 
Conclusions We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391
Lakbub, Jude C; Shipman, Joshua T; Desaire, Heather
2018-04-01
Disulfide bonds are important structural moieties of proteins: they ensure proper folding, provide stability, and ensure proper function. With the increasing use of proteins for biotherapeutics, particularly monoclonal antibodies, which are highly disulfide bonded, it is now important to confirm the correct disulfide bond connectivity and to verify the presence, or absence, of disulfide bond variants in the protein therapeutics. These studies help to ensure safety and efficacy. Hence, disulfide bonds are among the critical quality attributes of proteins that have to be monitored closely during the development of biotherapeutics. However, disulfide bond analysis is challenging because of the complexity of the biomolecules. Mass spectrometry (MS) has been the go-to analytical tool for the characterization of such complex biomolecules, and several methods have been reported to meet the challenging task of mapping disulfide bonds in proteins. In this review, we describe the relevant, recent MS-based techniques and provide important considerations needed for efficient disulfide bond analysis in proteins. The review focuses on methods for proper sample preparation, fragmentation techniques for disulfide bond analysis, recent disulfide bond mapping methods based on the fragmentation techniques, and automated algorithms designed for rapid analysis of disulfide bonds from liquid chromatography-MS/MS data. Researchers involved in method development for protein characterization can use the information herein to facilitate development of new MS-based methods for protein disulfide bond analysis. In addition, individuals characterizing biotherapeutics, especially by disulfide bond mapping in antibodies, can use this review to choose the best strategies for disulfide bond assignment of their biologic products. 
Graphical Abstract This review, describing characterization methods for disulfide bonds in proteins, focuses on three critical components: sample preparation, mass spectrometry data, and software tools.
A linear programming model for protein inference problem in shotgun proteomics.
Huang, Ting; He, Zengyou
2012-11-15
Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find a subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues such as peptide degeneracy still remain unsolved. In this article, we present a linear programming model for protein inference. In this model, we use a transformation of the joint probability that each peptide/protein pair is present in the sample as the variable. Then, both the peptide probability and protein probability can be expressed as a formula in terms of the linear combination of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities under the constraint that the difference between the calculated peptide probability and the peptide probability generated from peptide identification algorithms should be less than some threshold. This model addresses the peptide degeneracy issue by forcing some joint probability variables involving degenerate peptides to be zero in a rigorous manner. The corresponding inference algorithm is named as ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with the state-of-the-art protein inference algorithms. The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/. zyhe@dlut.edu.cn. Supplementary data are available at Bioinformatics Online.
Quantum Dynamics in Continuum for Proton Transport I: Basic Formulation.
Chen, Duan; Wei, Guo-Wei
2013-01-01
Proton transport is one of the most important and interesting phenomena in living cells. The present work proposes a multiscale/multiphysics model for the understanding of the molecular mechanism of proton transport in transmembrane proteins. We describe proton dynamics quantum mechanically via a density functional approach while implicitly modeling other solvent ions as a dielectric continuum to reduce the number of degrees of freedom. The densities of all other ions in the solvent are assumed to obey the Boltzmann distribution. The impact of protein molecular structure and its charge polarization on the proton transport is considered explicitly at the atomic level. We formulate a total free energy functional to put proton kinetic and potential energies as well as electrostatic energy of all ions on an equal footing. The variational principle is employed to derive nonlinear governing equations for the proton transport system. A generalized Poisson-Boltzmann equation and a Kohn-Sham equation are obtained from the variational framework. Theoretical formulations for the proton density and proton conductance are constructed based on fundamental principles. The molecular surface of the channel protein is utilized to split the discrete protein domain and the continuum solvent domain, and to facilitate the multiscale discrete/continuum/quantum descriptions. A number of mathematical algorithms, including the Dirichlet-to-Neumann mapping, matched interface and boundary method, Gummel iteration, and Krylov space techniques, are utilized to implement the proposed model in a computationally efficient manner. The Gramicidin A (GA) channel is used to demonstrate the performance of the proposed proton transport model and validate the efficiency of the proposed mathematical algorithms. The electrostatic characteristics of the GA channel are analyzed with a wide range of model parameters. The proton conductances are studied over a number of applied voltages and reference concentrations. 
A comparison with experimental data verifies the present model predictions and validates the proposed model.
Computational intelligence techniques in bioinformatics.
Hassanien, Aboul Ella; Al-Shammari, Eiman Tamah; Ghali, Neveen I
2013-12-01
Computational intelligence (CI) is a well-established paradigm with current systems having many of the characteristics of biological computers and capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state-of-the-art in CI applications to bioinformatics and motivate research in new trend-setting directions. In this article, we present an overview of the CI techniques in bioinformatics. We will show how CI techniques including neural networks, restricted Boltzmann machine, deep belief network, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines, could be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function prediction and its structure. We discuss some representative methods to provide inspiring examples to illustrate how CI can be utilized to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented and an extensive bibliography is included. Copyright © 2013 Elsevier Ltd. All rights reserved.
Automatic classification of protein structures relying on similarities between alignments
2012-01-01
Background Identification of protein structural cores requires isolation of sets of proteins all sharing the same subset of structural motifs. In the context of an ever growing number of available 3D protein structures, standard and automatic clustering algorithms require adaptations so as to allow for efficient identification of such sets of proteins. Results When considering a pair of 3D structures, they are stated as similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented in a graph of similarities where a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Therefore, classifying proteins into structural families can be viewed as a graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may include in the same cluster a subset of 3D structures that do not share a common substructure. In order to overcome this drawback we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. Such a ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three involved protein structures do have some common substructure. We propose here a modification algorithm that eliminates edges from the original graph of similarities and gives a reduced graph in which no ternary constraints are violated. Our approach is then first to build a graph of similarities, then to reduce the graph according to the modification algorithm, and finally to apply to the reduced graph a standard graph clustering algorithm. This method was used for classifying ASTRAL-40 non-redundant protein domains, identifying significant pairwise similarities with Yakusa, a program devised for rapid 3D structure alignments. 
Conclusions We show that filtering similarities prior to standard graph based clustering process by applying ternary similarity constraints i) improves the separation of proteins of different classes and consequently ii) improves the classification quality of standard graph based clustering algorithms according to the reference classification SCOP. PMID:22974051
Protein complex prediction for large protein protein interaction networks with the Core&Peel method.
Pellegrini, Marco; Baglioni, Miriam; Geraci, Filippo
2016-11-08
Biological networks play an increasingly important role in the exploration of functional modularity and cellular organization at a systemic level. Quite often the first tools used to analyze these networks are clustering algorithms. We concentrate here on the specific task of predicting protein complexes (PC) in large protein-protein interaction networks (PPIN). Currently, many state-of-the-art algorithms work well for networks of small or moderate size. However, their performance on much larger networks, which are becoming increasingly common in modern proteome-wide studies, needs to be re-assessed. We present a new fast algorithm for clustering large sparse networks: Core&Peel, which runs essentially in time and storage O(a(G)m+n) for a network G of n nodes and m arcs, where a(G) is the arboricity of G (which is roughly proportional to the maximum average degree of any induced subgraph in G). We evaluated Core&Peel on five PPI networks of large size and one of medium size from both yeast and Homo sapiens, comparing its performance against those of ten state-of-the-art methods. We demonstrate that Core&Peel consistently outperforms the ten competitors in its ability to identify known protein complexes and in the functional coherence of its predictions. Our method is remarkably robust, being quite insensitive to the injection of random interactions. Core&Peel is also empirically efficient, attaining the second best running time over large networks among the tested algorithms. Our algorithm Core&Peel pushes forward the state of the art in PPIN clustering, providing an algorithmic solution with polynomial running time that attains experimentally demonstrable good output quality and speed on challenging large real networks.
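The abstract does not spell out the steps of Core&Peel. Its name and the arboricity-based bound suggest a connection to core decomposition, so the sketch below shows only the standard k-core "peeling" procedure as background. It is not the authors' algorithm, and the graph and function name are hypothetical. This simple version rescans for the minimum-degree node and is therefore quadratic; bucket-based implementations achieve linear time.

```python
def core_numbers(adj):
    """Return the core number of every node of an undirected graph.

    adj maps each node to the set of its neighbours. The node of minimum
    remaining degree is peeled repeatedly; the core number of a node is the
    largest minimum degree seen up to the moment it is removed.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

# Hypothetical toy network: a triangle (a, b, c) with one pendant node d.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cores = core_numbers(toy)
```

In this toy network the triangle forms the 2-core and the pendant node belongs only to the 1-core, which is the kind of dense region a complex-prediction method would examine further.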
High performance transcription factor-DNA docking with GPU computing
2012-01-01
Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computationally demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improve the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. 
The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575
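The search strategy named in this abstract, Monte Carlo sampling combined with simulated annealing, can be sketched on a toy problem. The version below is a generic CPU illustration under stated assumptions: a one-dimensional coordinate stands in for a docking pose, a quadratic function stands in for the potential-based energy, and the geometric cooling schedule and all parameter values are hypothetical rather than taken from the paper.

```python
import math
import random

def simulated_annealing(energy, x0, steps=5000, t_start=2.0, t_end=0.01,
                        step_size=0.5, seed=0):
    """Minimise energy(x) with Metropolis moves under a cooling temperature."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        cand = x + rng.uniform(-step_size, step_size)
        e_cand = energy(cand)
        # Metropolis criterion: always accept downhill moves, accept uphill
        # moves with probability exp(-dE / t).
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

def toy_energy(x):
    """Stand-in energy with its minimum at x = 3."""
    return (x - 3.0) ** 2

best_x, best_e = simulated_annealing(toy_energy, x0=-5.0)
```

Independent annealing runs of this kind share no state, so many of them can be run side by side from different starting points, which is one reason this style of sampling is often a good fit for parallel hardware.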
Youngs, Noah; Penfold-Brown, Duncan; Drew, Kevin; Shasha, Dennis; Bonneau, Richard
2013-05-01
Computational biologists have demonstrated the utility of using machine learning methods to predict protein function from an integration of multiple genome-wide data types. Yet, even the best performing function prediction algorithms rely on heuristics for important components of the algorithm, such as choosing negative examples (proteins without a given function) or determining key parameters. The improper choice of negative examples, in particular, can hamper the accuracy of protein function prediction. We present a novel approach for choosing negative examples, using a parameterizable Bayesian prior computed from all observed annotation data, which also generates priors used during function prediction. We incorporate this new method into the GeneMANIA function prediction algorithm and demonstrate improved accuracy of our algorithm over current top-performing function prediction methods on the yeast and mouse proteomes across all metrics tested. Code and Data are available at: http://bonneaulab.bio.nyu.edu/funcprop.html
Tian, Xin; Xin, Mingyuan; Luo, Jian; Liu, Mingyao; Jiang, Zhenran
2017-02-01
The selection of relevant genes for breast cancer metastasis is critical for the treatment and prognosis of cancer patients. Although much effort has been devoted to gene selection procedures using different statistical analysis methods or computational techniques, the interpretation of the variables in the resulting survival models has been limited so far. This article proposes a new Random Forest (RF)-based algorithm to identify important variables highly related to breast cancer metastasis, which is based on the importance scores of two variable selection algorithms: the mean decrease Gini (MDG) criterion of Random Forest and the GeneRank algorithm with protein-protein interaction (PPI) information. The new gene selection algorithm is called PPIRF. The improved prediction accuracy illustrated the reliability and high interpretability of the gene list selected by the PPIRF approach.
A geometric initial guess for localized electronic orbitals in modular biological systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beckman, P. G.; Fattebert, J. L.; Lau, E. Y.
Recent first-principles molecular dynamics algorithms using localized electronic orbitals have achieved O(N) complexity and controlled accuracy in simulating systems with finite band gaps. However, accurately determining the centers of these localized orbitals during simulation setup may require O(N^3) operations, which is computationally infeasible for many biological systems. We present an O(N) approach for approximating orbital centers in proteins, DNA, and RNA which uses non-localized solutions for a set of fixed-size subproblems to create a set of geometric maps applicable to larger systems. This scalable approach, used as an initial guess in the O(N) first-principles molecular dynamics code MGmol, facilitates first-principles simulations in biological systems of sizes which were previously impossible.
Computational approaches for the classification of seed storage proteins.
Radhika, V; Rao, V Sree Hari
2015-07-01
Seed storage proteins comprise a major part of the protein content of the seed and have an important role on the quality of the seed. These storage proteins are important because they determine the total protein content and have an effect on the nutritional quality and functional properties for food processing. Transgenic plants are being used to develop improved lines for incorporation into plant breeding programs and the nutrient composition of seeds is a major target of molecular breeding programs. Hence, classification of these proteins is crucial for the development of superior varieties with improved nutritional quality. In this study we have applied machine learning algorithms for classification of seed storage proteins. We have presented an algorithm based on nearest neighbor approach for classification of seed storage proteins and compared its performance with decision tree J48, multilayer perceptron neural (MLP) network and support vector machine (SVM) libSVM. The model based on our algorithm has been able to give higher classification accuracy in comparison to the other methods.
Generic framework for mining cellular automata models on protein-folding simulations.
Diaz, N; Tischer, I
2016-05-13
Cellular automata model identification is an important way of building simplified simulation models. In this study, we describe a generic architectural framework to ease the development process of new metaheuristic-based algorithms for cellular automata model identification in protein-folding trajectories. Our framework was developed by a methodology based on design patterns that allow an improved experience for new algorithms development. The usefulness of the proposed framework is demonstrated by the implementation of four algorithms, able to obtain extremely precise cellular automata models of the protein-folding process with a protein contact map representation. Dynamic rules obtained by the proposed approach are discussed, and future use for the new tool is outlined.
NASA Astrophysics Data System (ADS)
Jo, Sunhwan; Jiang, Wei
2015-12-01
Replica Exchange with Solute Tempering (REST2) is a powerful sampling enhancement algorithm for molecular dynamics (MD) in that it needs a significantly smaller number of replicas but achieves higher sampling efficiency relative to the standard temperature exchange algorithm. In this paper, we extend the applicability of REST2 for quantitative biophysical simulations through a robust and generic implementation in the highly scalable MD software NAMD. The rescaling procedure of force field parameters controlling the REST2 "hot region" is implemented in NAMD at the source code level. A user can conveniently select the hot region through VMD and write the selection information into a PDB file. The rescaling keyword/parameter is written in the NAMD Tcl script interface, which enables an on-the-fly simulation parameter change. Our implementation of REST2 is within a communication-enabled Tcl script built on top of Charm++, thus the communication overhead of an exchange attempt is vanishingly small. Such a generic implementation facilitates seamless cooperation between REST2 and other modules of NAMD to provide enhanced sampling for complex biomolecular simulations. Three challenging applications, including native REST2 simulation for peptide folding-unfolding transition, free energy perturbation/REST2 for absolute binding affinity of a protein-ligand complex, and umbrella sampling/REST2 Hamiltonian exchange for free energy landscape calculation, were carried out on an IBM Blue Gene/Q supercomputer to demonstrate the efficacy of REST2 based on the present implementation.
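As background for the "exchange attempt" mentioned above, the sketch below gives the acceptance rule of the standard temperature exchange algorithm that the abstract uses as its point of comparison. It is not the REST2 rule and not NAMD code: REST2 exchanges between rescaled Hamiltonians, whose criterion involves cross-evaluated energies and is not shown here. The function name and the numbers in the example are hypothetical.

```python
import math

def exchange_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for swapping two temperature replicas.

    beta = 1 / (k_B * T) for each replica; energy_* is the potential energy
    of the configuration currently held by that replica.
    P = min(1, exp[(beta_i - beta_j) * (energy_i - energy_j)])
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

# Colder replica (larger beta) holding the higher-energy configuration:
# the swap is always accepted.
p_accept = exchange_probability(1.0, 0.5, -10.0, -12.0)
# Colder replica already holding the lower-energy configuration:
# the swap is accepted only with probability exp(-1).
p_maybe = exchange_probability(1.0, 0.5, -12.0, -10.0)
```

The acceptance probability falls off as the energy distributions of neighbouring replicas overlap less, which is why schemes that confine the tempering to a small region of the system can use fewer replicas.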
Mehranfar, Adele; Ghadiri, Nasser; Kouhsar, Morteza; Golshani, Ashkan
2017-09-01
Detecting protein complexes is an important task in analyzing protein interaction networks. Although many algorithms predict protein complexes in different ways, surveys of the interaction networks indicate that about 50% of detected interactions are false positives. Consequently, the accuracy of existing methods needs to be improved. In this paper we propose a novel algorithm to detect protein complexes in 'noisy' protein interaction data. First, we integrate several biological data sources to determine the reliability of each interaction and determine more accurate weights for the interactions. A data fusion component is used for this step, based on the interval type-2 fuzzy voter that provides an efficient combination of the information sources. This fusion component detects the errors and diminishes their effect on the detection of protein complexes. So in the first step, reliability scores are assigned to every interaction in the network. In the second step, we propose a general protein complex detection algorithm by exploiting and adopting the strong points of other algorithms and existing hypotheses regarding real complexes. Finally, the proposed method is applied to the yeast interaction datasets for predicting the interactions. The results show that our framework has a better performance regarding precision and F-measure than the existing approaches. Copyright © 2017 Elsevier Ltd. All rights reserved.
Jia, Peilin; Zhao, Zhongming
2014-01-01
A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single-genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutation genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method in >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data. PMID:24516372
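The Random Walk with Restart step named in this abstract can be sketched as follows. This is a generic, dependency-free illustration of the algorithm on a small adjacency matrix, not VarWalker's code; the function name, the restart probability of 0.5, and the toy network are assumptions made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart_prob=0.5,
                             tol=1e-12, max_iter=10000):
    """Steady-state visiting probabilities of a walker that restarts at seeds.

    adjacency: square list of lists with edge weights (symmetric for an
               undirected interaction network).
    seeds: indices of the seed nodes, for example mutated genes.
    """
    n = len(adjacency)
    # Column sums are used to normalise the outgoing flow of each node.
    degree = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            flow = sum(adjacency[i][j] * p[j] / degree[j]
                       for j in range(n) if degree[j] > 0)
            nxt.append((1.0 - restart_prob) * flow + restart_prob * p0[i])
        # Stop when the probability vector no longer changes.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p
```

On a four-node chain with node 0 as the only seed, the returned probabilities decrease with distance from the seed, which is the property used to rank close interactors.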
Jia, Peilin; Zhao, Zhongming
2014-02-01
A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single-genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutation genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method in >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.
Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal
2008-07-01
UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities significantly improves on current protein family clusterings, which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools, will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.
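For orientation, a minimal sketch of plain UPGMA is given below. It is the textbook in-memory procedure, so it keeps every pairwise dissimilarity in a dictionary, which is exactly the requirement that MC-UPGMA is designed to avoid; it does not reproduce the memory-constrained algorithm. The function name and the nested-tuple tree format are choices made for this example.

```python
def upgma(dist, labels):
    """Naive UPGMA on a full symmetric dissimilarity matrix.

    Returns a nested tuple (left, right, height) for the final tree,
    where height is half the average dissimilarity at the merge.
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}   # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        (a, b), closest = min(d.items(), key=lambda kv: kv[1])
        (tree_a, size_a) = clusters.pop(a)
        (tree_b, size_b) = clusters.pop(b)
        del d[(a, b)]
        # Size-weighted average gives the distance from the merged
        # cluster to every remaining cluster.
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b, closest / 2.0), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

The dictionary `d` holds on the order of n squared entries at the start, which makes the memory bottleneck described in the abstract concrete.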
Constraints and consequences of the emergence of amino acid repeats in eukaryotic proteins.
Chavali, Sreenivas; Chavali, Pavithra L; Chalancon, Guilhem; de Groot, Natalia Sanchez; Gemayel, Rita; Latysheva, Natasha S; Ing-Simmons, Elizabeth; Verstrepen, Kevin J; Balaji, Santhanam; Babu, M Madan
2017-09-01
Proteins with amino acid homorepeats have the potential to be detrimental to cells and are often associated with human diseases. Why, then, are homorepeats prevalent in eukaryotic proteomes? In yeast, homorepeats are enriched in proteins that are essential and pleiotropic and that buffer environmental insults. The presence of homorepeats increases the functional versatility of proteins by mediating protein interactions and facilitating spatial organization in a repeat-dependent manner. During evolution, homorepeats are preferentially retained in proteins with stringent proteostasis, which might minimize repeat-associated detrimental effects such as unregulated phase separation and protein aggregation. Their presence facilitates rapid protein divergence through accumulation of amino acid substitutions, which often affect linear motifs and post-translational-modification sites. These substitutions may result in rewiring protein interaction and signaling networks. Thus, homorepeats are distinct modules that are often retained in stringently regulated proteins. Their presence facilitates rapid exploration of the genotype-phenotype landscape of a population, thereby contributing to adaptation and fitness.
An improved stochastic fractal search algorithm for 3D protein structure prediction.
Zhou, Changjun; Sun, Chuan; Wang, Bin; Wang, Xiaojun
2018-05-03
Protein structure prediction (PSP) is a significant area for biological information research, disease treatment, drug development, and so on. In this paper, three-dimensional structures of proteins are predicted based on the known amino acid sequences, and the structure prediction problem is transformed into a typical NP problem by an AB off-lattice model. This work applies a novel improved Stochastic Fractal Search algorithm (ISFS) to solve the problem. The Stochastic Fractal Search algorithm (SFS) is an effective evolutionary algorithm that performs well in exploring the search space but sometimes falls into local minima. To overcome this weakness, Lévy flight and internal feedback information are introduced in ISFS. In the experiments, simulations are conducted with the ISFS algorithm on Fibonacci sequences and real peptide sequences. Experimental results show that ISFS performs more efficiently and robustly in terms of finding the global minimum and avoiding getting stuck in local minima.
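The abstract does not say how ISFS draws its Lévy flight steps. A common choice in metaheuristics is Mantegna's algorithm, sketched below purely as an illustration of what a Lévy-distributed step looks like; the function names and the stability index of 1.5 are assumptions, not details of ISFS.

```python
import math
import random


def mantegna_sigma(beta=1.5):
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step length (sign included)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)
```

Most steps drawn this way are small, with occasional very long jumps, which is the behaviour typically relied on to leave a local minimum.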
Campagne, F; Weinstein, H
1999-01-01
An algorithmic method for drawing residue-based schematic diagrams of proteins on a 2D page is presented and illustrated. The method allows the creation of rendering engines dedicated to a given family of sequences, or fold. The initial implementation provides an engine that can produce a 2D diagram representing secondary structure for any transmembrane protein sequence. We present the details of the strategy for automating the drawing of these diagrams. The most important part of this strategy is the development of an algorithm for laying out residues of a loop that connects to arbitrary points of a 2D plane. As implemented, this algorithm is suitable for real-time modification of the loop layout. This work is of interest for the representation and analysis of data from (1) protein databases, (2) mutagenesis results, or (3) various kinds of protein context-dependent annotations or data.
Structural alignment of protein descriptors - a combinatorial model.
Antczak, Maciej; Kasprzak, Marta; Lukasiak, Piotr; Blazewicz, Jacek
2016-09-17
Structural alignment of proteins is one of the most challenging problems in molecular biology. The tertiary structure of a protein strictly correlates with its function and computationally predicted structures are nowadays a main premise for understanding the latter. However, computationally derived 3D models often exhibit deviations from the native structure. A way to confirm a model is a comparison with other structures. The structural alignment of a pair of proteins can be defined with the use of a concept of protein descriptors. The protein descriptors are local substructures of protein molecules, which allow us to divide the original problem into a set of subproblems and, consequently, to propose a more efficient algorithmic solution. In the literature, one can find many applications of the descriptors concept that prove its usefulness for insight into protein 3D structures, but the proposed approaches are presented rather from the biological perspective than from the computational or algorithmic point of view. Efficient algorithms for identification and structural comparison of descriptors can become crucial components of methods for structural quality assessment as well as tertiary structure prediction. In this paper, we propose a new combinatorial model and new polynomial-time algorithms for the structural alignment of descriptors. The model is based on the maximum-size assignment problem, which we define here and prove that it can be solved in polynomial time. We demonstrate suitability of this approach by comparison with an exact backtracking algorithm. Besides a simplification coming from the combinatorial modeling, both on the conceptual and complexity level, we gain with this approach high quality of obtained results, in terms of 3D alignment accuracy and processing efficiency. 
All the proposed algorithms were developed and integrated in a computationally efficient tool descs-standalone, which allows the user to identify and structurally compare descriptors of biological molecules, such as proteins and RNAs. Both PDB (Protein Data Bank) and mmCIF (macromolecular Crystallographic Information File) formats are supported. The proposed tool is available as an open source project stored on GitHub ( https://github.com/mantczak/descs-standalone ).
Automatable algorithms to identify nonmedical opioid use using electronic data: a systematic review.
Canan, Chelsea; Polinski, Jennifer M; Alexander, G Caleb; Kowal, Mary K; Brennan, Troyen A; Shrank, William H
2017-11-01
Improved methods to identify nonmedical opioid use can help direct health care resources to individuals who need them. Automated algorithms that use large databases of electronic health care claims or records for surveillance are a potential means to achieve this goal. In this systematic review, we reviewed the utility, attempts at validation, and application of such algorithms to detect nonmedical opioid use. We searched PubMed and Embase for articles describing automatable algorithms that used electronic health care claims or records to identify patients or prescribers with likely nonmedical opioid use. We assessed algorithm development, validation, and performance characteristics and the settings where they were applied. Study variability precluded a meta-analysis. Of 15 included algorithms, 10 targeted patients, 2 targeted providers, 2 targeted both, and 1 identified medications with high abuse potential. Most patient-focused algorithms (67%) used prescription drug claims and/or medical claims, with diagnosis codes of substance abuse and/or dependence as the reference standard. Eleven algorithms were developed via regression modeling. Four used natural language processing, data mining, audit analysis, or factor analysis. Automated algorithms can facilitate population-level surveillance. However, there is no true gold standard for determining nonmedical opioid use. Users must recognize the implications of identifying false positives and, conversely, false negatives. Few algorithms have been applied in real-world settings. Automated algorithms may facilitate identification of patients and/or providers most likely to need more intensive screening and/or intervention for nonmedical opioid use. Additional implementation research in real-world settings would clarify their utility. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. 
Identifying Dynamic Protein Complexes Based on Gene Expression Profiles and PPI Networks
Li, Min; Chen, Weijie; Wang, Jianxin; Pan, Yi
2014-01-01
Identification of protein complexes from protein-protein interaction networks has become a key problem for understanding cellular life in the postgenomic era. Many computational methods have been proposed for identifying protein complexes. Up to now, the existing computational methods have mostly been applied to static PPI networks. However, proteins and their interactions are dynamic in reality. Identifying dynamic protein complexes is more meaningful and challenging. In this paper, a novel algorithm, named DPC, is proposed to identify dynamic protein complexes by integrating PPI data and gene expression profiles. According to the core-attachment assumption, proteins that are always active in the molecular cycle are regarded as core proteins. The protein-complex cores are identified from these always-active proteins by detecting dense subgraphs. Final protein complexes are extended from the protein-complex cores by adding attachments based on a topological character of “closeness” and dynamic meaning. The protein complexes produced by our algorithm DPC contain two parts: a static core expressed throughout the molecular cycle and short-lived dynamic attachments. The proposed algorithm DPC was applied to the data of Saccharomyces cerevisiae, and the experimental results show that DPC outperforms CMC, MCL, SPICi, HC-PIN, COACH, and Core-Attachment based on the validation of matching with known complexes and hF-measures. PMID:24963481
Discovering protein complexes in protein interaction networks via exploring the weak ties effect
2012-01-01
Background: Studying protein complexes is very important in biological processes since it helps reveal the structure-functionality relationships in biological networks, and much attention has been paid to accurately predicting protein complexes from the increasing amount of protein-protein interaction (PPI) data. Most of the available algorithms are based on the assumption that dense subgraphs correspond to complexes, failing to take into account the inherent organization within protein complexes and the roles of edges. Thus, there is a critical need to investigate the possibility of discovering protein complexes using the topological information hidden in edges. Results: To provide an investigation of the roles of edges in PPI networks, we show that the edges connecting less similar vertices in topology are more significant in maintaining the global connectivity, indicating the weak ties phenomenon in PPI networks. We further demonstrate that there is a negative relation between the weak tie strength and the topological similarity. By using the bridges, a reliable virtual network is constructed, in which each maximal clique corresponds to the core of a complex. By this notion, the detection of the protein complexes is transformed into a classic all-clique problem. A novel core-attachment based method is developed, which detects the cores and attachments, respectively. A comprehensive comparison among the existing algorithms and our algorithm has been made by comparing the predicted complexes against benchmark complexes. Conclusions: We proved that the weak tie effect exists in the PPI network and demonstrated that the density is insufficient to characterize the topological structure of protein complexes. Furthermore, the experimental results on the yeast PPI network show that the proposed method outperforms the state-of-the-art algorithms. The analysis of modules detected by the present algorithm suggests that most of these modules are biologically significant in the context of complexes, suggesting that the roles of edges are critical in discovering protein complexes. PMID:23046740
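The all-clique step described in this abstract amounts to enumerating every maximal clique of a graph. The abstract does not name the enumeration procedure, so the sketch below uses the standard Bron-Kerbosch algorithm with pivoting as an assumed stand-in; the function name and the toy graph in the usage note are hypothetical.

```python
def maximal_cliques(graph):
    """Enumerate all maximal cliques of an undirected graph.

    graph: dict mapping each node to the set of its neighbours.
    Returns a list of frozensets, one per maximal clique.
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        # Pivot on the node with the most neighbours among the candidates
        # to prune branches that cannot yield a new maximal clique.
        pivot = max(candidates | excluded,
                    key=lambda v: len(graph[v] & candidates))
        for v in list(candidates - graph[pivot]):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques
```

For a triangle a-b-c with an extra edge c-d, the function returns the cliques {a, b, c} and {c, d}; in the setting of the abstract, each such clique would be a candidate complex core to which attachments are then added.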
Aligning Biomolecular Networks Using Modular Graph Kernels
NASA Astrophysics Data System (ADS)
Towfic, Fadi; Greenlee, M. Heather West; Honavar, Vasant
Comparative analysis of biomolecular networks constructed using measurements from different conditions, tissues, and organisms offers a powerful approach to understanding the structure, function, dynamics, and evolution of complex biological systems. We explore a class of algorithms for aligning large biomolecular networks by breaking down such networks into subgraphs and computing the alignment of the networks based on the alignment of their subgraphs. The resulting subnetworks are compared using graph kernels as scoring functions. We provide implementations of the resulting algorithms as part of BiNA, an open source biomolecular network alignment toolkit. Our experiments using Drosophila melanogaster, Saccharomyces cerevisiae, Mus musculus and Homo sapiens protein-protein interaction networks extracted from the DIP repository of protein-protein interaction data demonstrate that the performance of the proposed algorithms (as measured by % GO term enrichment of subnetworks identified by the alignment) is competitive with some of the state-of-the-art algorithms for pair-wise alignment of large protein-protein interaction networks. Our results also show that the inter-species similarity scores computed based on graph kernels can be used to cluster the species into a species tree that is consistent with the known phylogenetic relationships among the species.
Identification of residue pairing in interacting β-strands from a predicted residue contact map.
Mao, Wenzhi; Wang, Tong; Zhang, Wenxuan; Gong, Haipeng
2018-04-19
Despite the rapid progress of protein residue contact prediction, predicted residue contact maps frequently contain many errors. However, information of residue pairing in β strands could be extracted from a noisy contact map, due to the presence of characteristic contact patterns in β-β interactions. This information may benefit the tertiary structure prediction of mainly β proteins. In this work, we propose a novel ridge-detection-based β-β contact predictor to identify residue pairing in β strands from any predicted residue contact map. Our algorithm RDb2C adopts ridge detection, a well-developed technique in computer image processing, to capture consecutive residue contacts, and then utilizes a novel multi-stage random forest framework to integrate the ridge information and additional features for prediction. Starting from the predicted contact map of CCMpred, RDb2C remarkably outperforms all state-of-the-art methods on two conventional test sets of β proteins (BetaSheet916 and BetaSheet1452), and achieves F1-scores of ~62% and ~76% at the residue level and strand level, respectively. Taking the prediction of the more advanced RaptorX-Contact as input, RDb2C achieves impressively higher performance, with F1-scores reaching ~76% and ~86% at the residue level and strand level, respectively. In a test of structural modeling using the top 1L predicted contacts as constraints, for 61 mainly β proteins, the average TM-score achieves 0.442 when using the raw RaptorX-Contact prediction, but increases to 0.506 when using the improved prediction by RDb2C. Our method can significantly improve the prediction of β-β contacts from any predicted residue contact maps. Prediction results of our algorithm could be directly applied to effectively facilitate the practical structure prediction of mainly β proteins. All source data and codes are available at http://166.111.152.91/Downloads.html or the GitHub address of https://github.com/wzmao/RDb2C .
SnapDock—template-based docking by Geometric Hashing
Estrin, Michael; Wolfson, Haim J.
2017-01-01
Abstract Motivation: A highly efficient template-based protein–protein docking algorithm, nicknamed SnapDock, is presented. It employs a Geometric Hashing-based structural alignment scheme to align the target proteins to the interfaces of non-redundant protein–protein interface libraries. Docking of a pair of proteins utilizing the 22 600 interface PIFACE library is performed in < 2 min on average. A flexible version of the algorithm allowing hinge motion in one of the proteins is presented as well. Results: To evaluate the performance of the algorithm, a blind re-modelling of 3547 PDB complexes, which were uploaded after the PIFACE publication, was performed with a success ratio of about 35%. Interestingly, a similar experiment with the template-free PatchDock docking algorithm yielded a success rate of about 23%, with roughly 1/3 of the solutions different from those of SnapDock. Consequently, the combination of the two methods gave a 42% success ratio. Availability and implementation: A web server of the application is under development. Contact: michaelestrin@gmail.com or wolfson@tau.ac.il PMID:28881968
Queiroz, Glória; Quintas, Clara; Talaia, Carlos; Gonçalves, Jorge
2004-08-01
In the prostatic portion of rat vas deferens, the non-selective adenosine receptor agonist NECA (0.1-30 µM), but not the A2A agonist CGS 21680 (0.001-10 µM), caused a facilitation of electrically evoked noradrenaline release (up to 43 ± 4%), when inhibitory adenosine A1 receptors were blocked. NECA-elicited facilitation of noradrenaline release was prevented by the A2B receptor-antagonist MRS 1754, enhanced by preventing cyclic-AMP degradation with rolipram, abolished by the protein kinase A inhibitors H-89, KT 5720 and cyclic-AMPS-Rp and attenuated by the protein kinase C inhibitors Ro 32-0432 and calphostin C. The adenosine uptake inhibitor NBTI also elicited a facilitation of noradrenaline release; an effect that was abolished by adenosine deaminase and attenuated by MRS 1754, by inhibitors of the extracellular nucleotide metabolism and by blockade of α1-adrenoceptors and P2X receptors with prazosin and NF023, respectively. It was concluded that adenosine A2B receptors are involved in a facilitation of noradrenaline release in the prostatic portion of rat vas deferens that can be activated by adenosine formed by extracellular catabolism of nucleotides. The receptors seem to be coupled to the adenylyl cyclase-protein kinase A pathway but activation of protein kinase C by protein kinase A may also contribute to the adenosine A2B receptor-mediated facilitation of noradrenaline release.
Andrews, J O; Conway, W; Cho, W -K; Narayanan, A; Spille, J -H; Jayanth, N; Inoue, T; Mullen, S; Thaler, J; Cissé, I I
2018-05-09
We present qSR, an analytical tool for the quantitative analysis of single-molecule-based super-resolution data. The software is created as an open-source platform integrating multiple algorithms for rigorous spatial and temporal characterization of protein clusters in super-resolution data of living cells. First, we illustrate qSR using sample live-cell data of RNA Polymerase II (Pol II) as an example of highly dynamic sub-diffractive clusters. Then we utilize qSR to investigate the organization and dynamics of endogenous RNA Polymerase I (Pol I) in live human cells, throughout the cell cycle. Our analysis reveals a previously uncharacterized transient clustering of Pol I. Both stable and transient populations of Pol I clusters co-exist in individual living cells, and their relative fraction varies during the cell cycle, in a manner correlating with global gene expression. Thus, qSR serves to facilitate the study of protein organization and dynamics with very high spatial and temporal resolution directly in live cells.
Ma, Cheng-Wei; Xiu, Zhi-Long; Zeng, An-Ping
2012-01-01
A novel approach to reveal intramolecular signal transduction network is proposed in this work. To this end, a new algorithm of network construction is developed, which is based on a new protein dynamics model of energy dissipation. A key feature of this approach is that direction information is specified after inferring protein residue-residue interaction network involved in the process of signal transduction. This enables fundamental analysis of the regulation hierarchy and identification of regulation hubs of the signaling network. A well-studied allosteric enzyme, E. coli aspartokinase III, is used as a model system to demonstrate the new method. Comparison with experimental results shows that the new approach is able to predict all the sites that have been experimentally proved to desensitize allosteric regulation of the enzyme. In addition, the signal transduction network shows a clear preference for specific structural regions, secondary structural types and residue conservation. Occurrence of super-hubs in the network indicates that allosteric regulation tends to gather residues with high connection ability to collectively facilitate the signaling process. Furthermore, a new parameter of propagation coefficient is defined to determine the propagation capability of residues within a signal transduction network. In conclusion, the new approach is useful for fundamental understanding of the process of intramolecular signal transduction and thus has significant impact on rational design of novel allosteric proteins. PMID:22363664
An integrative approach to inferring biologically meaningful gene modules.
Cho, Ji-Hoon; Wang, Kai; Galas, David J
2011-07-26
The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method, the Semantic Similarity-Integrated approach for Modularization (SSIM), which directly incorporates gene ontology (GO) annotation in the construction of gene modules in order to gain better functional association. SSIM integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: (1) the osmotic shock response in Saccharomyces cerevisiae, and (2) the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level.
Li, Fuyi; Li, Chen; Marquez-Lago, Tatiana T; Leier, André; Akutsu, Tatsuya; Purcell, Anthony W; Smith, A Ian; Lithgow, Trevor; Daly, Roger J; Song, Jiangning; Chou, Kuo-Chen
2018-06-27
Kinase-regulated phosphorylation is a ubiquitous type of post-translational modification (PTM) in both eukaryotic and prokaryotic cells. Phosphorylation plays fundamental roles in many signalling pathways and biological processes, such as protein degradation and protein-protein interactions. Experimental studies have revealed that signalling defects caused by aberrant phosphorylation are highly associated with a variety of human diseases, especially cancers. In light of this, a number of computational methods aiming to accurately predict protein kinase family-specific or kinase-specific phosphorylation sites have been established, thereby facilitating phosphoproteomic data analysis. In this work, we present Quokka, a novel bioinformatics tool that allows users to rapidly and accurately identify human kinase family-regulated phosphorylation sites. Quokka was developed by using a variety of sequence scoring functions combined with an optimized logistic regression algorithm. We evaluated Quokka based on well-prepared up-to-date benchmark and independent test datasets, curated from the Phospho.ELM and UniProt databases, respectively. The independent test demonstrates that Quokka improves the prediction performance compared with state-of-the-art computational tools for phosphorylation prediction. In summary, our tool provides users with high-quality predicted human phosphorylation sites for hypothesis generation and biological validation. The Quokka webserver and datasets are freely available at http://quokka.erc.monash.edu/. Supplementary data are available at Bioinformatics online.
Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal
2008-01-01
Motivation: UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. Application: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. Results: We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities significantly improves on current protein family clusterings, which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. Availability: A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools, will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request. Contact: lonshy@cs.huji.ac.il PMID:18586742
Jou, Jonathan D; Jain, Swati; Georgiev, Ivelin S; Donald, Bruce R
2016-06-01
Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC. This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem.
Simulation studies of the fidelity of biomolecular structure ensemble recreation
NASA Astrophysics Data System (ADS)
Lätzer, Joachim; Eastwood, Michael P.; Wolynes, Peter G.
2006-12-01
We examine the ability of Bayesian methods to recreate structural ensembles for partially folded molecules from averaged data. Specifically we test the ability of various algorithms to recreate different transition state ensembles for folding proteins using a multiple replica simulation algorithm using input from "gold standard" reference ensembles that were first generated with a Gō-like Hamiltonian having nonpairwise additive terms. A set of low resolution data, which function as the "experimental" ϕ values, were first constructed from this reference ensemble. The resulting ϕ values were then treated as one would treat laboratory experimental data and were used as input in the replica reconstruction algorithm. The resulting ensembles of structures obtained by the replica algorithm were compared to the gold standard reference ensemble, from which those "data" were, in fact, obtained. It is found that for a unimodal transition state ensemble with a low barrier, the multiple replica algorithm does recreate the reference ensemble fairly successfully when no experimental error is assumed. The Kolmogorov-Smirnov test as well as principal component analysis show that the overlap of the recovered and reference ensembles is significantly enhanced when multiple replicas are used. Reduction of the multiple replica ensembles by clustering successfully yields subensembles with close similarity to the reference ensembles. On the other hand, for a high barrier transition state with two distinct transition state ensembles, the single replica algorithm only samples a few structures of one of the reference ensemble basins. This is due to the fact that the ϕ values are intrinsically ensemble averaged quantities. The replica algorithm with multiple copies does sample both reference ensemble basins. In contrast to the single replica case, the multiple replicas are constrained to reproduce the average ϕ values, but allow fluctuations in ϕ for each individual copy. 
These fluctuations facilitate a more faithful sampling of the reference ensemble basins. Finally, we test how robustly the reconstruction algorithm can function by introducing errors in ϕ comparable in magnitude to those suggested by some authors. In this circumstance we observe that the chances of ensemble recovery with the replica algorithm are poor using a single replica, but are improved when multiple copies are used. A multimodal transition state ensemble, however, turns out to be more sensitive to large errors in ϕ (if appropriately gauged) and attempts at successful recreation of the reference ensemble with simple replica algorithms can fall short.
A Consensus Method for the Prediction of ‘Aggregation-Prone’ Peptides in Globular Proteins
Tsolis, Antonios C.; Papandreou, Nikos C.; Iconomidou, Vassiliki A.; Hamodrakas, Stavros J.
2013-01-01
The purpose of this work was to construct a consensus prediction algorithm of ‘aggregation-prone’ peptides in globular proteins, combining existing tools. This allows comparison of the different algorithms and the production of more objective and accurate results. Eleven (11) individual methods are combined to produce AMYLPRED2, a web tool freely available to academic users (http://biophysics.biol.uoa.gr/AMYLPRED2), for the consensus prediction of amyloidogenic determinants/‘aggregation-prone’ peptides in proteins, from sequence alone. The performance of AMYLPRED2 indicates that it functions better than individual aggregation-prediction algorithms, as perhaps expected. AMYLPRED2 is a useful tool for identifying amyloid-forming regions in proteins that are associated with several conformational diseases, called amyloidoses, such as Alzheimer's, Parkinson's, prion diseases and type II diabetes. It may also be useful for understanding the properties of protein folding and misfolding and for helping to control protein aggregation/solubility in biotechnology (recombinant proteins forming bacterial inclusion bodies) and biotherapeutics (monoclonal antibodies and biopharmaceutical proteins). PMID:23326595
Automated Discovery of Long Intergenic RNAs Associated with Breast Cancer Progression
2012-02-01
manuscript in preparation), (2) development and publication of an algorithm for detecting gene fusions in RNA-Seq data [1], and (3) discovery of outlier long...subjected to de novo assembly algorithms to discover novel transcripts representing either unannotated genes or novel somatic mutations such as gene...fusions. To this end the P.I. developed and published a novel algorithm called ChimeraScan to facilitate the discovery and validation of gene
GRID: a high-resolution protein structure refinement algorithm.
Chitsaz, Mohsen; Mayo, Stephen L
2013-03-05
The energy-based refinement of protein structures generated by fold prediction algorithms to atomic-level accuracy remains a major challenge in structural biology. Energy-based refinement is mainly dependent on two components: (1) sufficiently accurate force fields, and (2) efficient conformational space search algorithms. Focusing on the latter, we developed a high-resolution refinement algorithm called GRID. It takes a three-dimensional protein structure as input and, using an all-atom force field, attempts to improve the energy of the structure by systematically perturbing backbone dihedrals and side-chain rotamer conformations. We compare GRID to Backrub, a stochastic algorithm that has been shown to predict a significant fraction of the conformational changes that occur with point mutations. We applied GRID and Backrub to 10 high-resolution (≤ 2.8 Å) crystal structures from the Protein Data Bank and measured the energy improvements obtained and the computation times required to achieve them. GRID resulted in energy improvements that were significantly better than those attained by Backrub while expending about the same amount of computational resources. GRID resulted in relaxed structures that had slightly higher backbone RMSDs compared to Backrub relative to the starting crystal structures. The average RMSD was 0.25 ± 0.02 Å for GRID versus 0.14 ± 0.04 Å for Backrub. These relatively minor deviations indicate that both algorithms generate structures that retain their original topologies, as expected given the nature of the algorithms. Copyright © 2012 Wiley Periodicals, Inc.
Facilitated release of substrate protein from prefoldin by chaperonin.
Zako, Tamotsu; Iizuka, Ryo; Okochi, Mina; Nomura, Tomoko; Ueno, Taro; Tadakuma, Hisashi; Yohda, Masafumi; Funatsu, Takashi
2005-07-04
Prefoldin is a chaperone that captures a protein-folding intermediate and transfers it to the group II chaperonin for correct folding. However, the kinetics of interactions between prefoldin and substrate proteins have not been investigated. In this study, dissociation constants and dissociation rate constants of unfolded proteins with prefoldin were measured for the first time using fluorescence microscopy. Our results suggest that binding and release of prefoldin from hyperthermophilic archaea with substrate proteins were in a dynamic equilibrium. Interestingly, the release of substrate proteins from prefoldin was facilitated when chaperonin was present, supporting a handoff mechanism of substrate proteins from prefoldin to the chaperonin.
PGCA: An algorithm to link protein groups created from MS/MS data
Sasaki, Mayu; Hollander, Zsuzsanna; Smith, Derek; McManus, Bruce; McMaster, W. Robert; Ng, Raymond T.; Cohen Freue, Gabriela V.
2017-01-01
The quantitation of proteins using shotgun proteomics has gained popularity in the last decades, simplifying sample handling procedures, removing extensive protein separation steps and achieving a relatively high throughput readout. The process starts with the digestion of the protein mixture into peptides, which are then separated by liquid chromatography and sequenced by tandem mass spectrometry (MS/MS). At the end of the workflow, recovering the identity of the proteins originally present in the sample is often a difficult and ambiguous process, because more than one protein identifier may match a set of peptides identified from the MS/MS spectra. To address this identification problem, many MS/MS data processing software tools combine all plausible protein identifiers matching a common set of peptides into a protein group. However, this solution introduces new challenges in studies with multiple experimental runs, which can be characterized by three main factors: i) protein groups’ identifiers are local, i.e., they vary run to run, ii) the composition of each group may change across runs, and iii) the supporting evidence of proteins within each group may also change across runs. Since in general there is no conclusive evidence about the absence of proteins in the groups, protein groups need to be linked across different runs in subsequent statistical analyses. We propose an algorithm, called Protein Group Code Algorithm (PGCA), to link groups from multiple experimental runs by forming global protein groups from connected local groups. The algorithm is computationally inexpensive and enables the connection and analysis of lists of protein groups across runs needed in biomarkers studies. We illustrate the identification problem and the stability of the PGCA mapping using 65 iTRAQ experimental runs. 
Further, we use two biomarker studies to show how PGCA enables the discovery of relevant candidate protein group markers with similar but non-identical compositions in different runs. PMID:28562641
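The idea of forming global protein groups from connected local groups can be illustrated with a small union-find sketch. The abstract does not spell out the connection rule, so this example assumes that two local groups are connected whenever they share at least one protein identifier; the function name `link_protein_groups`, the input layout and the integer group codes are hypothetical and are not taken from the PGCA software.

```python
def link_protein_groups(runs):
    """Assign a global code to every local protein group.

    runs -- dict: run id -> list of local groups, each a non-empty set
            of protein identifiers
    Returns dict: (run id, index of local group) -> global group code.
    Assumption: local groups sharing a protein belong to one global group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Proteins that appear together in any local group are merged.
    for groups in runs.values():
        for group in groups:
            members = sorted(group)
            for protein in members[1:]:
                union(members[0], protein)

    # Number the connected components in order of first appearance.
    codes, result = {}, {}
    for run_id, groups in runs.items():
        for idx, group in enumerate(groups):
            root = find(min(group))
            result[(run_id, idx)] = codes.setdefault(root, len(codes) + 1)
    return result

if __name__ == "__main__":
    runs = {
        "run1": [{"P1", "P2"}, {"P5"}],
        "run2": [{"P2", "P3"}, {"P6"}],
    }
    print(link_protein_groups(runs))
```

In this toy input, the first groups of both runs share P2 and therefore receive the same global code even though their compositions differ, which mirrors the "similar but non-identical compositions" mentioned in the abstract.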
Functional equivalency inferred from "authoritative sources" in networks of homologous proteins.
Natarajan, Shreedhar; Jakobsson, Eric
2009-06-12
A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is utilization of the user's knowledge to assign high confidence to selected functional identification. We show use of the algorithm to retrieve functional equivalents for 7 membrane proteins, from an exploration of almost 40 genomes from multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mapping between proteins from different species. Based on that comparison, we believe that incorporation of user's knowledge as a key aspect of the technique adds value to purely statistical formal methods.
Tandem Repeats in Proteins: Prediction Algorithms and Biological Role.
Pellegrini, Marco
2015-01-01
Tandem repetition in protein sequence and structure is a fascinating subject of research which has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR either from protein sequence or structure data is challenging due to inherent high (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on the investigations of the biological role of PTR.
Functional Equivalency Inferred from “Authoritative Sources” in Networks of Homologous Proteins
Natarajan, Shreedhar; Jakobsson, Eric
2009-01-01
A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is utilization of the user's knowledge to assign high confidence to selected functional identification. We show use of the algorithm to retrieve functional equivalents for 7 membrane proteins, from an exploration of almost 40 genomes from multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mapping between proteins from different species. Based on that comparison, we believe that incorporation of user's knowledge as a key aspect of the technique adds value to purely statistical formal methods. PMID:19521530
Racemic & quasi-racemic protein crystallography enabled by chemical protein synthesis.
Kent, Stephen Bh
2018-04-04
A racemic protein mixture can be used to form centrosymmetric crystals for structure determination by X-ray diffraction. Both the unnatural d-protein and the corresponding natural l-protein are made by total chemical synthesis based on native chemical ligation, the chemoselective condensation of unprotected synthetic peptide segments. Racemic protein crystallography is important for structure determination of the many natural protein molecules that are refractory to crystallization. Racemic mixtures facilitate the crystallization of recalcitrant proteins, and give diffraction-quality crystals. Quasi-racemic crystallization, using a single d-protein molecule, can facilitate the determination of the structures of a series of l-protein analog molecules. Copyright © 2018 Elsevier Ltd. All rights reserved.
Protein Structure Prediction with Evolutionary Algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hart, W.E.; Krasnogor, N.; Pelta, D.A.
1999-02-08
Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms that have been applied to a simple protein structure prediction problem. Our analysis considers the impact of several algorithmic factors for this problem: the conformational representation, the energy formulation and the way in which infeasible conformations are penalized. Further, we empirically evaluated the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for both GAs as well as other heuristic methods for solving PSP on the HP model.
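To make the three factors concrete, here is a minimal sketch of one common setup for the HP model on a 2D square lattice: the conformation is represented as a string of absolute moves, the energy is minus the number of non-consecutive H-H lattice contacts, and infeasible (self-overlapping) conformations receive an additive penalty. The move alphabet, the function name `hp_energy` and the penalty value are assumptions for illustration; the abstract does not say which representations or penalties the authors recommend.

```python
# Absolute-direction encoding of a walk on the 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def hp_energy(sequence, moves, penalty=10):
    """Energy of an HP-model conformation.

    sequence -- string over {"H", "P"}
    moves    -- string over {"U", "D", "L", "R"}, one move per bond
    penalty  -- hypothetical cost added per overlapping residue
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    # Residues that land on an already occupied site make the walk infeasible.
    collisions = len(coords) - len(set(coords))
    # Count H-H pairs that are lattice neighbours but not chain neighbours.
    contacts = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    contacts += 1
    return -contacts + penalty * collisions

if __name__ == "__main__":
    print(hp_energy("HPPH", "RUL"))  # folded square, one H-H contact
    print(hp_energy("HPPH", "RRR"))  # extended chain, no contacts
```

A genetic algorithm would treat the move string as the genome and this function as the fitness to minimize; repairing or rejecting overlapping walks instead of penalizing them is an equally plausible design.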
Algorithm to find distant repeats in a single protein sequence
Banerjee, Nirjhar; Sarani, Rangarajan; Ranjani, Chellamuthu Vasuki; Sowmiya, Govindaraj; Michael, Daliah; Balakrishnan, Narayanasamy; Sekar, Kanagaraj
2008-01-01
Distant repeats in protein sequences play an important role in various aspects of protein analysis. A careful analysis of the distant repeats would make it possible to establish a firm relation between the repeats and their function and three-dimensional structure during the evolutionary process. Further, it sheds light on the diversity of duplication during evolution. To this end, an algorithm has been developed to find all distant repeats in a protein sequence. The scores from the Point Accepted Mutation (PAM) matrix have been used to identify amino acid substitutions while detecting the distant repeats. Due to the biological importance of distant repeats, the proposed algorithm will be of importance to structural biologists, molecular biologists, biochemists and researchers involved in phylogenetic and evolutionary studies. PMID:19052663
Moghadasi, Mohammad; Kozakov, Dima; Mamonov, Artem B.; Vakili, Pirooz; Vajda, Sandor; Paschalidis, Ioannis Ch.
2013-01-01
We introduce a message-passing algorithm to solve the Side Chain Positioning (SCP) problem. SCP is a crucial component of protein docking refinement, which is a key step of an important class of problems in computational structural biology called protein docking. We model SCP as a combinatorial optimization problem and formulate it as a Maximum Weighted Independent Set (MWIS) problem. We then employ a modified and convergent belief-propagation algorithm to solve a relaxation of MWIS and develop randomized estimation heuristics that use the relaxed solution to obtain an effective MWIS feasible solution. Using a benchmark set of protein complexes we demonstrate that our approach leads to more accurate docking predictions compared to a baseline algorithm that does not solve the SCP. PMID:23515575
Targeted Feature Detection for Data-Dependent Shotgun Proteomics
2017-01-01
Label-free quantification of shotgun LC–MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification (“FFId”), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between “internal” and “external” (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the “uncertain” feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. 
We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS (www.openms.org). PMID:28673088
Targeted Feature Detection for Data-Dependent Shotgun Proteomics.
Weisser, Hendrik; Choudhary, Jyoti S
2017-08-04
Label-free quantification of shotgun LC-MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification ("FFId"), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between "internal" and "external" (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the "uncertain" feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. 
We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS (www.openms.org).
Qualls, Joseph; Russomanno, David J.
2011-01-01
The lack of knowledge models to represent sensor systems, algorithms, and missions makes opportunistically discovering a synthesis of systems and algorithms that can satisfy high-level mission specifications impractical. A novel ontological problem-solving framework has been designed that leverages knowledge models describing sensors, algorithms, and high-level missions to facilitate automated inference of assigning systems to subtasks that may satisfy a given mission specification. To demonstrate the efficacy of the ontological problem-solving architecture, a family of persistence surveillance sensor systems and algorithms has been instantiated in a prototype environment to demonstrate the assignment of systems to subtasks of high-level missions. PMID:22164081
Current algorithmic solutions for peptide-based proteomics data generation and identification.
Hoopmann, Michael R; Moritz, Robert L
2013-02-01
Peptide-based proteomic data sets are ever increasing in size and complexity. These data sets provide computational challenges when attempting to quickly analyze spectra and obtain correct protein identifications. Database search and de novo algorithms must consider high-resolution MS/MS spectra and alternative fragmentation methods. Protein inference is a tricky problem when analyzing large data sets of degenerate peptide identifications. Combining multiple algorithms for improved peptide identification puts significant strain on computational systems when investigating large data sets. This review highlights some of the recent developments in peptide and protein identification algorithms for analyzing shotgun mass spectrometry data when encountering the aforementioned hurdles. Also explored are the roles that analytical pipelines, public spectral libraries, and cloud computing play in the evolution of peptide-based proteomics. Copyright © 2012 Elsevier Ltd. All rights reserved.
Al Nasr, Kamal; Ranjan, Desh; Zubair, Mohammad; Chen, Lin; He, Jing
2014-01-01
Electron cryomicroscopy is becoming a major experimental technique in solving the structures of large molecular assemblies. More and more three-dimensional images have been obtained at medium resolutions between 5 and 10 Å. At this resolution range, major α-helices can be detected as cylindrical sticks and β-sheets can be detected as plane-like regions. A critical question in de novo modeling from cryo-EM images is to determine the match between the detected secondary structures from the image and those on the protein sequence. We formulate this matching problem as a constrained graph problem and present an O(Δ^2 N^2 2^N) algorithm for this NP-hard problem. The algorithm incorporates the dynamic programming approach into a constrained K-shortest path algorithm. Our method, DP-TOSS, has been tested using α-proteins with a maximum of 33 helices and α-β proteins with up to five helices and 12 β-strands. The correct match was ranked within the top 35 for 19 of the 20 α-proteins and all nine α-β proteins tested. The results demonstrate that DP-TOSS improves accuracy, time and memory space in deriving the topologies of the secondary structure elements for proteins with a large number of secondary structures and a complex skeleton.
Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S
2017-01-01
Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.
Wang, Rui-Sheng; Loscalzo, Joseph
2018-05-20
Understanding the genetic basis of complex diseases is challenging. Prior work shows that disease-related proteins do not typically function in isolation. Rather, they often interact with each other to form a network module that underlies dysfunctional mechanistic pathways. Identifying such disease modules will provide insights into a systems-level understanding of molecular mechanisms of diseases. Owing to the incompleteness of our knowledge of disease proteins and limited information on the biological mediators of pathobiological processes, the key proteins (seed proteins) for many diseases appear scattered over the human protein-protein interactome and form a few small branches, rather than coherent network modules. In this paper, we develop a network-based algorithm, called the Seed Connector algorithm (SCA), to pinpoint disease modules by adding as few additional linking proteins (seed connectors) to the seed protein pool as possible. Such seed connectors are hidden disease module elements that are critical for interpreting the functional context of disease proteins. The SCA aims to connect seed disease proteins so that disease mechanisms and pathways can be decoded based on predicted coherent network modules. We validate the algorithm using a large corpus of 70 complex diseases and binding targets of over 200 drugs, and demonstrate the biological relevance of the seed connectors. Lastly, as a specific proof of concept, we apply the SCA to a set of seed proteins for coronary artery disease derived from a meta-analysis of large-scale genome-wide association studies and obtain a coronary artery disease module enriched with important disease-related signaling pathways and drug targets not previously recognized. Copyright © 2018 Elsevier Ltd. All rights reserved.
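The goal of connecting scattered seed proteins with as few extra proteins as possible can be sketched with a simple greedy heuristic. This is an illustrative stand-in and not the published SCA procedure, whose selection rule the abstract does not give: the sketch assumes that a candidate connector is accepted only if it increases the number of seeds in the largest connected component, and it stops when no single neighbour does. All function names and the adjacency-dict input are hypothetical.

```python
def largest_seed_component(adj, nodes, seeds):
    """Largest number of seeds found in one connected component of the
    subgraph induced by `nodes` (adj: node -> iterable of neighbours)."""
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, count = [start], 0
        while stack:
            u = stack.pop()
            if u in seeds:
                count += 1
            for v in adj.get(u, ()):
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, count)
    return best

def seed_connectors(adj, seeds):
    """Greedily pick non-seed nodes that join seeds into one module."""
    seeds = set(seeds)
    module = set(seeds)
    connectors = []
    current = largest_seed_component(adj, module, seeds)
    while True:
        # Candidates are neighbours of the current module (sorted so that
        # ties are broken deterministically).
        candidates = sorted({v for u in module for v in adj.get(u, ())} - module)
        best_node, best_score = None, current
        for c in candidates:
            score = largest_seed_component(adj, module | {c}, seeds)
            if score > best_score:
                best_node, best_score = c, score
        if best_node is None:
            return connectors
        module.add(best_node)
        connectors.append(best_node)
        current = best_score

if __name__ == "__main__":
    interactome = {
        "A": ["x", "y"], "x": ["A", "B"], "B": ["x"],
        "y": ["A"], "D": ["w"], "w": ["D"],
    }
    print(seed_connectors(interactome, ["A", "B", "D"]))
```

Because each step evaluates one candidate at a time, this sketch cannot bridge seeds that are three or more edges apart; a practical method would need a rule for such cases.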
Interactive Learning Environment for Bio-Inspired Optimization Algorithms for UAV Path Planning
ERIC Educational Resources Information Center
Duan, Haibin; Li, Pei; Shi, Yuhui; Zhang, Xiangyin; Sun, Changhao
2015-01-01
This paper describes the development of BOLE, a MATLAB-based interactive learning environment, that facilitates the process of learning bio-inspired optimization algorithms, and that is dedicated exclusively to unmanned aerial vehicle path planning. As a complement to conventional teaching methods, BOLE is designed to help students consolidate the…
Automatic classification of protein structures using physicochemical parameters.
Mohan, Abhilash; Rao, M Divya; Sunderrajan, Shruthi; Pennathur, Gautam
2014-09-01
Protein classification is the first step to functional annotation; SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion in the number of three dimensional (3D) protein structures generated versus their classification into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a classification accuracy above 90%, while spectrophores performed with a CA of around 85%. Feature selection improved classification accuracy for both physicochemical parameters and spectrophores based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with classification accuracy ranging from 90-96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.
Bousquet, P-J; Caillet, P; Coeuret-Pellicer, M; Goulard, H; Kudjawu, Y C; Le Bihan, C; Lecuyer, A I; Séguret, F
2017-10-01
The development and use of healthcare databases accentuates the need for dedicated tools, including validated algorithms for selecting patients with cancer. As part of the development of the French National Health Insurance System data network REDSIAM, the tumor taskforce established an inventory of national and international published algorithms in the field of cancer. This work aims to facilitate the choice of a best-suited algorithm. A non-systematic literature search was conducted for various cancers. Results are presented for lung, breast, colon, and rectum. Medline, Scopus, the French Database in Public Health, Google Scholar, and the summaries of the main French journals in oncology and public health were searched for publications until August 2016. An extraction grid adapted to oncology was constructed and used for the extraction process. A total of 18 publications were selected for lung cancer, 18 for breast cancer, and 12 for colorectal cancer. Validation studies of algorithms are scarce. When information is available, the performance and choice of an algorithm depend on the context, purpose, and location of the planned study. Accounting for cancer disease specificity, the proposed extraction chart is more detailed than the generic chart developed for other REDSIAM taskforces, but remains easily usable in practice. This study illustrates the complexity of cancer detection through sole reliance on healthcare databases and the lack of validated algorithms specifically designed for this purpose. Studies that standardize and facilitate validation of these algorithms should be developed and promoted. Copyright © 2017. Published by Elsevier Masson SAS.
NASA Astrophysics Data System (ADS)
Cheng, Jun-Hu; Jin, Huali; Liu, Zhiwei
2018-01-01
The feasibility of developing a multispectral imaging method using important wavelengths from hyperspectral images selected by genetic algorithm (GA), successive projection algorithm (SPA) and regression coefficient (RC) methods for modeling and predicting protein content in peanut kernels was investigated for the first time. A partial least squares regression (PLSR) calibration model was established between the spectral data from the selected optimal wavelengths and the reference measured protein content, which ranged from 23.46% to 28.43%. The RC-PLSR model established using eight key wavelengths (1153, 1567, 1972, 2143, 2288, 2339, 2389 and 2446 nm) showed the best predictive results, with a coefficient of determination of prediction (R2P) of 0.901, a root mean square error of prediction (RMSEP) of 0.108 and a residual predictive deviation (RPD) of 2.32. Based on the best model obtained and image processing algorithms, distribution maps of protein content were generated. The overall results of this study indicate that developing a rapid, online multispectral imaging system using the feature wavelengths and PLSR analysis is feasible for determining the protein content of peanut kernels.
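The three figures of merit quoted above (R2P, RMSEP, RPD) can be illustrated with a short sketch. It assumes one common convention, in which RPD is the sample standard deviation of the reference values divided by RMSEP; the abstract does not state which convention was used, and the protein values below are hypothetical.

```python
import math
import statistics


def prediction_metrics(reference, predicted):
    """Return (R2P, RMSEP, RPD) for a prediction set.

    Assumed convention: RPD = sample standard deviation of the reference
    values divided by RMSEP.
    """
    n = len(reference)
    residual_ss = sum((y - p) ** 2 for y, p in zip(reference, predicted))
    mean_ref = statistics.mean(reference)
    total_ss = sum((y - mean_ref) ** 2 for y in reference)
    rmsep = math.sqrt(residual_ss / n)
    r2p = 1.0 - residual_ss / total_ss
    rpd = statistics.stdev(reference) / rmsep
    return r2p, rmsep, rpd


# Hypothetical reference and predicted protein contents (%).
reference = [24.0, 25.0, 26.0, 27.0, 28.0]
predicted = [24.1, 24.9, 26.1, 26.9, 28.1]
r2p, rmsep, rpd = prediction_metrics(reference, predicted)
```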
The contour-buildup algorithm to calculate the analytical molecular surface.
Totrov, M; Abagyan, R
1996-01-01
A new algorithm is presented to calculate the analytical molecular surface, defined as a smooth envelope traced out by the surface of a probe sphere rolled over the molecule. The core of the algorithm is the sequential buildup of multi-arc contours on the van der Waals spheres. This algorithm yields a substantial reduction in both the memory and time requirements of surface calculations. Further, the contour-buildup principle is intrinsically "local", which makes calculations of partial molecular surfaces even more efficient. Additionally, the algorithm is applicable not only to convex patches, but also to concave triangular patches, which may have complex multiple intersections. The algorithm permits the rigorous calculation of the full analytical molecular surface for a 100-residue protein in about 2 seconds on an SGI Indigo with an R4400++ processor at 150 MHz, with the performance scaling almost linearly with the protein size. The contour-buildup algorithm is faster than the original Connolly algorithm by an order of magnitude.
Clustering PPI data by combining FA and SHC method.
Lei, Xiujuan; Ying, Chao; Wu, Fang-Xiang; Xu, Jin
2015-01-01
Clustering is one of the main methods for identifying functional modules from protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for clustering PPI data. In this paper, we propose a novel method for clustering PPI data by combining the firefly algorithm (FA) and the synchronization-based hierarchical clustering (SHC) algorithm. First, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional matrix. Then the SHC algorithm is used to perform clustering. In the SHC algorithm, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects; however, the hierarchical search has difficulty finding the optimal neighborhood radius of synchronization, and its efficiency is low. We therefore adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure. PMID:25707632
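The abstract uses the firefly algorithm to tune a single scalar, the neighborhood radius. A minimal sketch of a textbook one-dimensional firefly search is given below. The objective is a hypothetical stand-in for clustering quality as a function of the radius, and all parameter values (beta0, gamma, alpha, the decay factor) are illustrative assumptions rather than the settings used by the authors.

```python
import math
import random


def firefly_maximize(objective, low, high, n_fireflies=20, n_iter=60,
                     beta0=1.0, gamma=1.0, alpha=0.1, seed=0):
    """Textbook firefly search for the maximum of a 1-D objective on [low, high]."""
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_fireflies)]
    best = max(xs, key=objective)
    for it in range(n_iter):
        step = alpha * (0.97 ** it)          # shrink the random walk over time
        brightness = [objective(x) for x in xs]
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if brightness[j] > brightness[i]:
                    # Attraction decays with the squared distance between fireflies.
                    r = abs(xs[i] - xs[j])
                    beta = beta0 * math.exp(-gamma * r * r)
                    xs[i] += beta * (xs[j] - xs[i]) + step * (rng.random() - 0.5)
                    xs[i] = min(high, max(low, xs[i]))
                    brightness[i] = objective(xs[i])
        candidate = max(xs, key=objective)
        if objective(candidate) > objective(best):
            best = candidate                 # keep the best position seen so far
    return best


def quality(radius):
    """Hypothetical stand-in for clustering quality; peaks at radius 0.3."""
    return -(radius - 0.3) ** 2


best_radius = firefly_maximize(quality, 0.0, 1.0)
```

In the setting of the abstract, `quality` would be replaced by a score computed from an SHC run at the given radius, which is where almost all of the run time would go.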
Challenges and Opportunities for Harmonizing Research Methodology: Raw Accelerometry.
van Hees, Vincent T; Thaler-Kall, Kathrin; Wolf, Klaus-Hendrik; Brønd, Jan C; Bonomi, Alberto; Schulze, Mareike; Vigl, Matthäus; Morseth, Bente; Hopstock, Laila Arnesdatter; Gorzelniak, Lukas; Schulz, Holger; Brage, Søren; Horsch, Alexander
2016-12-07
Raw accelerometry is increasingly being used in physical activity research, but diversity in sensor design, attachment and signal processing challenges the comparability of research results. Therefore, efforts are needed to harmonize the methodology. In this article we reflect on how increased methodological harmonization may be achieved. The authors of this work convened for a two-day workshop (March 2014) themed on methodological harmonization of raw accelerometry. The discussions at the workshop were used as a basis for this review. Key stakeholders were identified as manufacturers, method developers, method users (application), publishers, and funders. To facilitate methodological harmonization in raw accelerometry the following action points were proposed: i) Manufacturers are encouraged to provide a detailed specification of their sensors, ii) Each fundamental step of algorithms for processing raw accelerometer data should be documented, and ideally also motivated, to facilitate interpretation and discussion, iii) Algorithm developers and method users should be open about uncertainties in the description of data and the uncertainty of the inference itself, iv) All new algorithms which are pitched as "ready for implementation" should be shared with the community to facilitate replication and ongoing evaluation by independent groups, and v) A dynamic interaction between method stakeholders should be encouraged to facilitate a well-informed harmonization process. The workshop led to the identification of a number of opportunities for harmonizing methodological practice. The discussion as well as the practical checklists proposed in this review should provide guidance for stakeholders on how to contribute to increased harmonization.
NASA Astrophysics Data System (ADS)
Zuliyana, Nia; Suseno, Jatmiko Endro; Adi, Kusworo
2018-02-01
The composition of sugar-containing foods in the diet of people with Diabetes Mellitus should be balanced, so an application is needed to help the public and nutritionists determine a food menu that matches the calorie requirement of a diabetes patient. This research recommends food variations using a Genetic Algorithm. The data used are the nutrient contents of foods obtained from Tabel Komposisi Pangan Indonesia (TKPI). The patient's caloric requirement is calculated with the PERKENI 2015 method. The data are then processed to determine the best food menu in terms of energy (E), carbohydrate (K), fat (L) and protein (P) requirements. The system is compared across variations of the Genetic Algorithm parameters: the number of chromosomes, the Probability of Crossover (Pc) and the Probability of Mutation (Pm). The higher the number of generations, the probability of crossover and the probability of mutation, the more food variations are produced. For example, a female patient aged 61 years, with a height of 160 cm and a weight of 55 kg, has a calculated requirement of (E=1621.4, K=243.21, P=60.80, L=45.04). With gene=4, chromosomes=3, generation=3, Pc=0.2, and Pm=0.2, the result is three variants: (E=1607.25, K=198.877, P=95.385, L=47.508), (E=1633.25, K=196.677, P=85.885, L=55.758), (E=1630.90, K=177.455, P=85.245, L=64.335).
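The example requirement quoted in the abstract is numerically consistent with splitting the energy 60% / 15% / 25% between carbohydrate, protein and fat at 4, 4 and 9 kcal per gram. That split is inferred here from the published figures and is not stated in the abstract. The sketch below reproduces the K, P and L targets under that assumption and adds a hypothetical relative-deviation score of the kind a Genetic Algorithm could minimize; the score is illustrative only.

```python
KCAL_PER_GRAM = {"K": 4.0, "P": 4.0, "L": 9.0}
# Assumption inferred from the example numbers, not stated in the abstract.
ENERGY_SHARE = {"K": 0.60, "P": 0.15, "L": 0.25}


def macro_targets(energy_kcal):
    """Grams of carbohydrate (K), protein (P) and fat (L) for a daily energy target."""
    return {m: energy_kcal * ENERGY_SHARE[m] / KCAL_PER_GRAM[m]
            for m in ENERGY_SHARE}


def menu_error(menu, target):
    """Hypothetical fitness: sum of relative deviations from the target (lower is better)."""
    return sum(abs(menu[k] - target[k]) / target[k] for k in target)


energy = 1621.4
target = {"E": energy, **macro_targets(energy)}

# First menu variant reported in the abstract.
first_menu = {"E": 1607.25, "K": 198.877, "P": 95.385, "L": 47.508}
first_error = menu_error(first_menu, target)
```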
MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes.
Zhu, Huaiqiu; Hu, Gang-Qing; Yang, Yi-Fan; Wang, Jin; She, Zhen-Su
2007-03-16
Despite remarkable success in the computational prediction of genes in Bacteria and Archaea, the lack of a comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It remains of interest to develop new ab initio algorithms that not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein-coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence, and the latter comprises several relevant features related to translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in an iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure before predicting genes. Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.
The Correlation Fractal Dimension of Complex Networks
NASA Astrophysics Data System (ADS)
Wang, Xingyuan; Liu, Zhenzhen; Wang, Mogei
2013-05-01
The fractality of complex networks is studied by estimating the correlation dimensions of the networks. Compared with previous algorithms for estimating the box dimension, our algorithm achieves a significant reduction in time complexity. For the four benchmark cases tested, that is, the Escherichia coli (E. coli) metabolic network, the Homo sapiens protein interaction network (H. sapiens PIN), the Saccharomyces cerevisiae protein interaction network (S. cerevisiae PIN) and the World Wide Web (WWW), experiments are provided to demonstrate the validity of our algorithm.
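The abstract does not spell out its estimator. As a sketch under the assumption that the standard correlation-sum definition is carried over to shortest-path distances, C(r) is the fraction of node pairs within distance r, and the dimension is the slope of log C(r) against log r. The function names and the ring test network are illustrative.

```python
import math
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def correlation_sum(adjacency, radius):
    """Fraction of ordered node pairs whose shortest-path distance is <= radius."""
    n = len(adjacency)
    close_pairs = 0
    for source in adjacency:
        dist = shortest_path_lengths(adjacency, source)
        close_pairs += sum(1 for t, d in dist.items() if t != source and d <= radius)
    return close_pairs / (n * (n - 1))


def correlation_dimension(adjacency, radii):
    """Least-squares slope of log C(r) versus log r."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_sum(adjacency, r)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


# A ring of 20 nodes behaves like a one-dimensional object at small radii.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
ring_dimension = correlation_dimension(ring, [1, 2, 3, 4, 5])
```

On the ring, each node has exactly 2r neighbors within distance r, so C(r) grows linearly in r and the fitted slope is 1.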
ICPD-a new peak detection algorithm for LC/MS.
Zhang, Jianqiu; Haskins, William
2010-12-01
The identification and quantification of proteins using label-free Liquid Chromatography/Mass Spectrometry (LC/MS) play crucial roles in biological and biomedical research. Increasing evidence has shown that biomarkers are often low abundance proteins. However, LC/MS systems are subject to considerable noise and sample variability, whose statistical characteristics are still elusive, making computational identification of low abundance proteins extremely challenging. As a result, the inability to identify low abundance proteins in a proteomic study is the main bottleneck in protein biomarker discovery. In this paper, we propose a new peak detection method called Information Combining Peak Detection (ICPD) for high resolution LC/MS. In LC/MS, peptides elute during a certain time period and, as a result, peptide isotope patterns are registered in multiple MS scans. The key feature of the new algorithm is that the observed isotope patterns registered in multiple scans are combined for estimating the likelihood of the peptide's existence. An isotope pattern matching score based on the likelihood probability is provided and utilized for peak detection. The performance of the new algorithm is evaluated on protein standards with 48 known proteins. The evaluation shows better peak detection accuracy for low abundance proteins than other LC/MS peak detection methods.
A Markov Random Field Framework for Protein Side-Chain Resonance Assignment
NASA Astrophysics Data System (ADS)
Zeng, Jianyang; Zhou, Pei; Donald, Bruce Randall
Nuclear magnetic resonance (NMR) spectroscopy plays a critical role in structural genomics, and serves as a primary tool for determining protein structures, dynamics and interactions in physiologically-relevant solution conditions. The current speed of protein structure determination via NMR is limited by the lengthy time required in resonance assignment, which maps spectral peaks to specific atoms and residues in the primary sequence. Although numerous algorithms have been developed to address the backbone resonance assignment problem [68,2,10,37,14,64,1,31,60], little work has been done to automate side-chain resonance assignment [43, 48, 5]. Most previous attempts in assigning side-chain resonances depend on a set of NMR experiments that record through-bond interactions with side-chain protons for each residue. Unfortunately, these NMR experiments have low sensitivity and limited performance on large proteins, which makes it difficult to obtain enough side-chain resonance assignments. On the other hand, it is essential to obtain almost all of the side-chain resonance assignments as a prerequisite for high-resolution structure determination. To overcome this deficiency, we present a novel side-chain resonance assignment algorithm based on alternative NMR experiments measuring through-space interactions between protons in the protein, which also provide crucial distance restraints and are normally required in high-resolution structure determination. We cast the side-chain resonance assignment problem into a Markov Random Field (MRF) framework, and extend and apply combinatorial protein design algorithms to compute the optimal solution that best interprets the NMR data. Our MRF framework captures the contact map information of the protein derived from NMR spectra, and exploits the structural information available from the backbone conformations determined by orientational restraints and a set of discretized side-chain conformations (i.e., rotamers). 
A Hausdorff-based computation is employed in the scoring function to evaluate the probability of side-chain resonance assignments to generate the observed NMR spectra. The complexity of the assignment problem is first reduced by using a dead-end elimination (DEE) algorithm, which prunes side-chain resonance assignments that are provably not part of the optimal solution. Then an A* search algorithm is used to find a set of optimal side-chain resonance assignments that best fit the NMR data. We have tested our algorithm on NMR data for five proteins, including the FF Domain 2 of human transcription elongation factor CA150 (FF2), the B1 domain of Protein G (GB1), human ubiquitin, the ubiquitin-binding zinc finger domain of the human Y-family DNA polymerase Eta (pol η UBZ), and the human Set2-Rpb1 interacting domain (hSRI). Our algorithm assigns resonances for more than 90% of the protons in the proteins, and achieves about 80% correct side-chain resonance assignments. The final structures computed using distance restraints resulting from the set of assigned side-chain resonances have backbone RMSD 0.5 - 1.4 Å and all-heavy-atom RMSD 1.0 - 2.2 Å from the reference structures that were determined by X-ray crystallography or traditional NMR approaches. These results demonstrate that our algorithm can be successfully applied to automate side-chain resonance assignment and high-quality protein structure determination. Since our algorithm does not require any specific NMR experiments for measuring the through-bond interactions with side-chain protons, it can save a significant amount of both experimental cost and spectrometer time, and hence accelerate the NMR structure determination process.
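The pruning step can be illustrated with the classical dead-end elimination criterion: a candidate r at position i is removed if some alternative t at the same position is better even in r's best case and t's worst case. The sketch below applies that criterion to a hypothetical table of self and pairwise scores (lower is better). The scoring function of the paper is not given in the abstract, so the numbers and the data layout are assumptions.

```python
def dee_prune(self_energy, pair_energy):
    """Iteratively remove candidates that provably cannot be in the optimum.

    self_energy[i][r]       : score of candidate r at position i
    pair_energy[i][j][r][s] : interaction of r at position i with s at position j
    Returns one set of surviving candidate indices per position.
    """
    positions = range(len(self_energy))
    alive = [set(range(len(e))) for e in self_energy]
    changed = True
    while changed:
        changed = False
        for i in positions:
            for r in list(alive[i]):
                # Best case for r: most favorable partner at every other position.
                best_r = self_energy[i][r] + sum(
                    min(pair_energy[i][j][r][s] for s in alive[j])
                    for j in positions if j != i)
                for t in alive[i] - {r}:
                    # Worst case for t: least favorable partner everywhere.
                    worst_t = self_energy[i][t] + sum(
                        max(pair_energy[i][j][t][s] for s in alive[j])
                        for j in positions if j != i)
                    if best_r > worst_t:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Hypothetical two-position example with two candidates each.
self_e = [[0.0, 5.0], [0.0, 0.0]]
pair_e = [
    [None, [[0.0, 1.0], [0.0, 1.0]]],
    [[[0.0, 0.0], [1.0, 1.0]], None],
]
survivors = dee_prune(self_e, pair_e)
```

Candidates that survive pruning would then be handed to a search such as A*, which only has to explore the reduced space.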
Li, Bai; Lin, Mu; Liu, Qiao; Li, Ya; Zhou, Changjun
2015-10-01
Protein folding is a fundamental topic in molecular biology. Conventional experimental techniques for protein structure identification or protein folding recognition impose strict laboratory requirements and heavy operating burdens, which have largely limited their applications. Alternatively, computer-aided techniques have been developed to optimize protein structures or to predict the protein folding process. In this paper, we utilize a 3D off-lattice model to describe the original protein folding scheme as a simplified energy-optimal numerical problem, where all types of amino acid residues are binarized into hydrophobic and hydrophilic ones. We apply a balance-evolution artificial bee colony (BE-ABC) algorithm as the minimization solver, which features adaptive adjustment of search intensity to cater for the varying needs during the entire optimization process. In this work, we establish a benchmark case set with 13 real protein sequences from the Protein Data Bank database and evaluate the convergence performance of the BE-ABC algorithm through strict comparisons with several state-of-the-art ABC variants in short-term numerical experiments. In addition, our best-so-far protein structures are compared to those reported in the previous literature. This study also provides preliminary insights into how artificial intelligence techniques can be applied to reveal the dynamics of protein folding. Graphical Abstract: Protein folding optimization using 3D off-lattice model and advanced optimization techniques.
Clinical algorithms to aid osteoarthritis guideline dissemination.
Meneses, S R F; Goode, A P; Nelson, A E; Lin, J; Jordan, J M; Allen, K D; Bennell, K L; Lohmander, L S; Fernandes, L; Hochberg, M C; Underwood, M; Conaghan, P G; Liu, S; McAlindon, T E; Golightly, Y M; Hunter, D J
2016-09-01
Numerous scientific organisations have developed evidence-based recommendations aiming to optimise the management of osteoarthritis (OA). Uptake, however, has been suboptimal. The purpose of this exercise was to harmonize the recent recommendations and develop a user-friendly treatment algorithm to facilitate translation of evidence into practice. We updated a previous systematic review on clinical practice guidelines (CPGs) for OA management. The guidelines were assessed using the Appraisal of Guidelines for Research and Evaluation for quality and the standards for developing trustworthy CPGs as established by the National Academy of Medicine (NAM). Four case scenarios and algorithms were developed by consensus of a multidisciplinary panel. Sixteen guidelines were included in the systematic review. Most recommendations were directed toward physicians and allied health professionals, and most had multi-disciplinary input. Analysis for trustworthiness suggests that many guidelines still present a lack of transparency. A treatment algorithm was developed for each case scenario advised by recommendations from guidelines and based on panel consensus. Strategies to facilitate the implementation of guidelines in clinical practice are necessary. The algorithms proposed are examples of how to apply recommendations in the clinical context, helping the clinician to visualise the patient flow and timing of different treatment modalities. Copyright © 2016 Osteoarthritis Research Society International. Published by Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Karagiannis, P.; Markelis, I.; Paparrizos, K.; Samaras, N.; Sifaleras, A.
2006-01-01
This paper presents new web-based educational software (webNetPro) for "Linear Network Programming." It includes many algorithms for "Network Optimization" problems, such as shortest path problems, minimum spanning tree problems, maximum flow problems and other search algorithms. Therefore, webNetPro can assist the teaching process of courses such…
Rabow, A. A.; Scheraga, H. A.
1996-01-01
We have devised a Cartesian combination operator and coding scheme for improving the performance of genetic algorithms applied to the protein folding problem. The genetic coding consists of the C alpha Cartesian coordinates of the protein chain. The recombination of the genes of the parents is accomplished by: (1) a rigid superposition of one parent chain on the other, to make the relation of Cartesian coordinates meaningful, then, (2) the chains of the children are formed through a linear combination of the coordinates of their parents. The children produced with this Cartesian combination operator scheme have similar topology and retain the long-range contacts of their parents. The new scheme is significantly more efficient than the standard genetic algorithm methods for locating low-energy conformations of proteins. The considerable superiority of genetic algorithms over Monte Carlo optimization methods is also demonstrated. We have also devised a new dynamic programming lattice fitting procedure for use with the Cartesian combination operator method. The procedure finds excellent fits of real-space chains to the lattice while satisfying bond-length, bond-angle, and overlap constraints. PMID:8880904
Identification of Conserved Water Sites in Protein Structures for Drug Design.
Jukič, Marko; Konc, Janez; Gobec, Stanislav; Janežič, Dušanka
2017-12-26
Identification of conserved waters in protein structures is a challenging task with applications in molecular docking and protein stability prediction. As an alternative to computationally demanding simulations of proteins in water, experimental cocrystallized waters in the Protein Data Bank (PDB) in combination with a local structure alignment algorithm can be used for reliable prediction of conserved water sites. We developed the ProBiS H2O approach based on the previously developed ProBiS algorithm, which enables identification of conserved water sites in proteins using experimental protein structures from the PDB or a set of custom protein structures available to the user. With a protein structure, a binding site, or an individual water molecule as a query, ProBiS H2O collects similar proteins from the PDB and performs local or binding site-specific superimpositions of the query structure with similar proteins using the ProBiS algorithm. It collects the experimental water molecules from the similar proteins and transposes them to the query protein. Transposed waters are clustered by their mutual proximity, which enables identification of discrete sites in the query protein with high water conservation. ProBiS H2O is a robust and fast new approach that uses existing experimental structural data to identify conserved water sites on the interfaces of protein complexes, for example protein-small molecule interfaces, and elsewhere on the protein structures. It has been successfully validated in several reported proteins in which conserved water molecules were found to play an important role in ligand binding with applications in drug design.
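The clustering of transposed waters "by their mutual proximity" can be sketched as follows. This is a simplified stand-in, not the ProBiS H2O implementation: it uses single-linkage grouping with a hypothetical distance cutoff and reports, for each cluster, the fraction of superimposed structures that contribute a water. The coordinates and structure identifiers are invented.

```python
import math


def cluster_waters(waters, cutoff):
    """Group waters whose positions are linked by distances <= cutoff.

    waters: list of (structure_id, (x, y, z)) after superimposition.
    Returns a list of clusters, each a list of indices into `waters`.
    """
    parent = list(range(len(waters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(waters)):
        for j in range(i + 1, len(waters)):
            if math.dist(waters[i][1], waters[j][1]) <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(waters)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def conservation(cluster, waters, n_structures):
    """Fraction of structures that contribute at least one water to the cluster."""
    return len({waters[i][0] for i in cluster}) / n_structures


# Hypothetical transposed waters from three similar structures (a, b, c).
waters = [
    ("a", (0.0, 0.0, 0.0)),
    ("b", (0.5, 0.0, 0.0)),
    ("c", (0.2, 0.3, 0.0)),
    ("a", (10.0, 10.0, 10.0)),
]
clusters = sorted(cluster_waters(waters, cutoff=1.0), key=len, reverse=True)
top_conservation = conservation(clusters[0], waters, n_structures=3)
```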
Xia, Jiaqi; Peng, Zhenling; Qi, Dawei; Mu, Hongbo; Yang, Jianyi
2017-03-15
Protein fold classification is a critical step in protein structure prediction. There are two possible ways to classify protein folds. One is through template-based fold assignment and the other is ab-initio prediction using machine learning algorithms. Combining both solutions to improve the prediction accuracy has not been explored before. We developed two algorithms, HH-fold and SVM-fold, for protein fold classification. HH-fold is a template-based fold assignment algorithm using the HHsearch program. SVM-fold is a support vector machine-based ab-initio classification algorithm, in which a comprehensive set of features is extracted from three complementary sequence profiles. These two algorithms are then combined, resulting in the ensemble approach TA-fold. We performed a comprehensive assessment of the proposed methods by comparing them with ab-initio methods and template-based threading methods on six benchmark datasets. An accuracy of 0.799 was achieved by TA-fold on the DD dataset, which consists of proteins from 27 folds. This represents an improvement of 5.4-11.7% over ab-initio methods. After updating this dataset to include more proteins in the same folds, the accuracy increased to 0.971. In addition, TA-fold achieved >0.9 accuracy on a large dataset consisting of 6451 proteins from 184 folds. Experiments on the LE dataset show that TA-fold consistently outperforms other threading methods at the family, superfamily and fold levels. The success of TA-fold is attributed to the combination of template-based fold assignment and ab-initio classification using features from complementary sequence profiles that contain rich evolutionary information. Availability: http://yanglab.nankai.edu.cn/TA-fold/. Contact: yangjy@nankai.edu.cn or mhb-506@163.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Querying graphs in protein-protein interactions networks using feedback vertex set.
Blin, Guillaume; Sikora, Florian; Vialette, Stéphane
2010-01-01
Recent techniques are rapidly increasing our knowledge of interactions between proteins. The interpretation of this new information depends on our ability to retrieve known substructures in the data, the Protein-Protein Interaction (PPI) networks. From an algorithmic point of view, this is a hard task since it often leads to NP-hard problems. To overcome this difficulty, many authors have provided tools for querying patterns with a restricted topology, i.e., paths or trees in PPI networks. Such restrictions lead to the development of fixed-parameter tractable (FPT) algorithms, which can be practical for queries of restricted size. Unfortunately, Graph Homomorphism is a W[1]-hard problem, and hence no FPT algorithm is expected to exist (unless FPT = W[1]) when patterns are in the shape of general graphs. However, Dost et al. gave an algorithm (which is not implemented) to query graphs with bounded treewidth in PPI networks (the treewidth of the query being involved in the time complexity). In this paper, we propose another algorithm for querying patterns in the shape of graphs, also based on dynamic programming and the color-coding technique. To transform graph queries into trees without loss of information, we use a feedback vertex set coupled with a node duplication mechanism. Hence, our algorithm is FPT for querying graphs with a feedback vertex set of bounded size. It gives an alternative to the treewidth parameter, which can be better or worse for a given query. We provide a Python implementation, which allows us to validate our approach on real data. In particular, we retrieve some human queries in the shape of graphs in the fly PPI network.
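A feedback vertex set is a set of vertices whose removal leaves a graph without cycles, which is what lets a graph query be handled as a tree. The sketch below is not the authors' implementation: it checks the defining property with a union-find structure and finds a smallest set by brute force, which is only sensible for small query graphs. The example query is hypothetical.

```python
from itertools import combinations


def is_feedback_vertex_set(edges, removed):
    """True if deleting `removed` leaves an acyclic undirected graph (a forest)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if u in removed or v in removed:
            continue
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False                    # this edge would close a cycle
        parent[root_u] = root_v
    return True


def minimum_feedback_vertex_set(edges):
    """Smallest feedback vertex set by exhaustive search (exponential time)."""
    vertices = sorted({v for edge in edges for v in edge})
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_feedback_vertex_set(edges, set(subset)):
                return set(subset)


# Hypothetical query: a triangle a-b-c with a pendant vertex d.
query_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
fvs = minimum_feedback_vertex_set(query_edges)
```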
Konc, Janez; Janežič, Dušanka
2017-09-01
ProBiS (Protein Binding Sites) Tools consist of an algorithm, a database, and web servers for the prediction of binding sites and protein ligands based on the detection of structurally similar binding sites in the Protein Data Bank. In this article, we review the operations that ProBiS Tools perform, provide comments on the evolution of the tools, and give some implementation details. We review some of their applications to biologically interesting proteins. ProBiS Tools are freely available at http://probis.cmm.ki.si and http://probis.nih.gov. Copyright © 2017 Elsevier Ltd. All rights reserved.
Shen, Xianjun; Yi, Li; Jiang, Xingpeng; He, Tingting; Yang, Jincai; Xie, Wei; Hu, Po; Hu, Xiaohua
2017-01-01
Identifying protein complexes is an important and challenging task in proteomics. It would contribute greatly to our knowledge of the molecular mechanisms of cellular activities. However, the inherent organization and dynamic characteristics of the cell system have rarely been incorporated into existing algorithms for detecting protein complexes because of the limitations of protein-protein interaction (PPI) data produced by high-throughput techniques. The availability of time-course gene expression profiles enables us to uncover the dynamics of molecular networks and improve the detection of protein complexes. To achieve this goal, this paper proposes a novel algorithm, DCA (Dynamic Core-Attachment). It detects a protein-complex core consisting of continually expressed and highly connected proteins in a dynamic PPI network, and then the protein complex is formed by adding attachments with high adhesion to the core. The integration of the core-attachment feature into the dynamic PPI network is responsible for the superiority of our algorithm. DCA has been applied to two different yeast dynamic PPI networks, and the experimental results show that it performs significantly better than state-of-the-art techniques in terms of prediction accuracy, hF-measure and biological statistical significance. In addition, the identified complexes with strong biological significance provide potential candidate complexes for biologists to validate.
Ebrahimi, Mansour; Aghagolzadeh, Parisa; Shamabadi, Narges; Tahmasebi, Ahmad; Alsharifi, Mohammed; Adelson, David L; Hemmatzadeh, Farhid; Ebrahimie, Esmaeil
2014-01-01
The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physicochemical attributes which govern HA subtyping, we performed a large-scale functional analysis of over 7000 sequences of 16 different HA subtypes. A large number (896) of physicochemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine learning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on the calculated protein characteristics dataset as well as on 10 trimmed datasets generated by the attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features associated with HA subtyping. The Random Forest tree induction algorithm and the RBF kernel function of SVM (scaled by grid search) showed a high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption. 
Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represent a new avenue for understanding and predicting possible future structure of influenza pandemics. PMID:24809455
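The key features named in this abstract (counts and percentages of individual amino acids such as Gln, Tyr and Cys) are simple composition attributes. The sketch below shows how such features might be computed from a sequence; it is an illustration only, covers a tiny fraction of the 896 attributes mentioned, and the feature names (`count_Q`, `percent_C`, and so on) are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

def composition_features(sequence):
    """Return per-residue counts and percentages for a protein sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    n = len(seq)
    features = {}
    for aa in AMINO_ACIDS:
        features[f"count_{aa}"] = counts[aa]
        features[f"percent_{aa}"] = 100.0 * counts[aa] / n if n else 0.0
    return features

# Toy sequence, not a real HA fragment.
feats = composition_features("QQCY")
print(feats["count_Q"], feats["percent_C"], feats["percent_Y"])
```

A feature table built this way, one row per sequence, is the kind of input on which attribute weighting and cross-validated classifiers could then be run.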
He, Jieyue; Li, Chaojun; Ye, Baoliu; Zhong, Wei
2012-06-25
Most computational algorithms focus on detecting highly connected subgraphs in PPI networks as protein complexes but ignore their inherent organization. Furthermore, many of these algorithms are computationally expensive. However, recent analysis indicates that experimentally detected protein complexes generally contain core/attachment structures. In this paper, a Greedy Search Method based on Core-Attachment structure (GSM-CA) is proposed. The GSM-CA method detects densely connected regions in large protein-protein interaction networks based on the edge weight and two criteria for determining core nodes and attachment nodes. The GSM-CA method improves the prediction accuracy compared to other similar module detection approaches; however, it is computationally expensive. Many module detection approaches are based on traditional hierarchical methods, which are also computationally inefficient because the hierarchical tree structure produced by these approaches cannot provide adequate information to identify whether a network belongs to a module structure or not. In order to speed up the computational process, the Greedy Search Method based on Fast Clustering (GSM-FC) is proposed in this work. The edge-weight-based GSM-FC method uses a greedy procedure to traverse all edges just once to separate the network into a suitable set of modules. The proposed methods are applied to the protein interaction network of S. cerevisiae. Experimental results indicate that many significant functional modules are detected, most of which match the known complexes. Results also demonstrate that the GSM-FC algorithm is faster and more accurate than other competing algorithms. Based on the new edge weight definition, the proposed algorithm takes advantage of the greedy search procedure to separate the network into a suitable set of modules. Experimental analysis shows that the identified modules are statistically significant.
The algorithm can reduce the computational time significantly while keeping high prediction accuracy.
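A single greedy pass over weighted edges can be sketched with a union-find structure. This is a generic stand-in, not GSM-FC: the abstract does not define the edge weight or the merge rule, so the precomputed weights and the fixed `threshold` used here are assumptions.

```python
def greedy_modules(edges, threshold):
    """Group nodes into modules by visiting each weighted edge exactly once.

    edges: iterable of (u, v, weight); weights are assumed to be precomputed.
    An edge merges the modules of its endpoints when weight >= threshold.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Heaviest edges first, each edge inspected once.
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(u), find(v)
        if w >= threshold and ru != rv:
            parent[ru] = rv

    modules = {}
    for node in list(parent):
        modules.setdefault(find(node), set()).add(node)
    return sorted(modules.values(), key=lambda m: sorted(m))

edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.1), ("d", "e", 0.7)]
print(greedy_modules(edges, threshold=0.5))
```

The weak c-d edge is the only one below the threshold, so the network separates into the modules {a, b, c} and {d, e}.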
Ramakrishnan, Gayatri; Ochoa-Montaño, Bernardo; Raghavender, Upadhyayula S; Mudgal, Richa; Joshi, Adwait G; Chandra, Nagasuma R; Sowdhamini, Ramanathan; Blundell, Tom L; Srinivasan, Narayanaswamy
2015-01-01
The availability of the genome sequence of Mycobacterium tuberculosis H37Rv has encouraged determination of large numbers of protein structures and detailed definition of the biological information encoded therein; yet, the functions of many proteins in M. tuberculosis remain unknown. The emergence of multidrug resistant strains makes it a priority to exploit recent advances in homology recognition and structure prediction to re-analyse its gene products. Here we report the structural and functional characterization of gene products encoded in the M. tuberculosis genome, with the help of sensitive profile-based remote homology search and fold recognition algorithms, resulting in an enhanced annotation of the proteome where 95% of the M. tuberculosis proteins were identified wholly or partly with information on structure or function. New information includes association of 244 proteins with 205 domain families and a separate set of new associations of folds to 64 proteins. Extending structural information across uncharacterized protein families represented in the M. tuberculosis proteome, by determining superfamily relationships between families of known and unknown structures, has contributed to an enhancement in the knowledge of structural content. In retrospect, such superfamily relationships have facilitated recognition of probable structure and/or function for several uncharacterized protein families, eventually aiding recognition of probable functions for homologous proteins corresponding to such families. A total of 183 gene products unique to mycobacteria had no identifiable function; of these, 18 were determined to be M. tuberculosis specific. Such pathogen-specific proteins are speculated to harbour virulence factors required for pathogenesis. A re-annotated proteome of M. tuberculosis, with greater completeness of annotated proteins and domain assigned regions, provides a valuable basis for experimental endeavours designed to obtain a better understanding of pathogenesis and to accelerate the process of drug target discovery.
SIFTER search: a web server for accurate phylogeny-based protein function prediction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.
We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.
Arana-Daniel, Nancy; Gallegos, Alberto A; López-Franco, Carlos; Alanís, Alma Y; Morales, Jacob; López-Franco, Adriana
2016-01-01
With the increasing power of computers, the amount of data that can be processed in small periods of time has grown exponentially, as has the importance of classifying large-scale data efficiently. Support vector machines have shown good results classifying large amounts of high-dimensional data, such as data generated by protein structure prediction, spam recognition, medical diagnosis, optical character recognition and text classification, etc. Most state of the art approaches for large-scale learning use traditional optimization methods, such as quadratic programming or gradient descent, which makes the use of evolutionary algorithms for training support vector machines an area to be explored. The present paper proposes an approach that is simple to implement based on evolutionary algorithms and Kernel-Adatron for solving large-scale classification problems, focusing on protein structure prediction. The functional properties of proteins depend upon their three-dimensional structures. Knowing the structures of proteins is crucial for biology and can lead to improvements in areas such as medicine, agriculture and biofuels.
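For readers unfamiliar with Kernel-Adatron, the sketch below shows the plain gradient-style multiplier update on which it is based. It is not the evolutionary variant proposed in this paper: it assumes a hard margin, no bias term and an RBF kernel, and the parameter values (`eta`, `epochs`, `gamma`) are illustrative choices.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def kernel_adatron(X, y, kernel=rbf, eta=0.5, epochs=200):
    """Plain Kernel-Adatron: raise each multiplier until its margin reaches 1."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            z = sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            # Move alpha_i towards margin 1 and keep it non-negative.
            alpha[i] = max(0.0, alpha[i] + eta * (1.0 - y[i] * z))
    return alpha

def predict(x, X, y, alpha, kernel=rbf):
    """Sign of the kernel expansion at x (no bias term in this sketch)."""
    score = sum(a * t * kernel(x, s) for a, t, s in zip(alpha, y, X))
    return 1 if score >= 0 else -1

# XOR-like toy data, which is not linearly separable.
X = [(0, 0), (1, 1), (0, 1), (1, 0)]
y = [1, 1, -1, -1]
alpha = kernel_adatron(X, y)
predictions = [predict(x, X, y, alpha) for x in X]
print(predictions)
```

An evolutionary approach such as the one the paper describes would search for the multipliers in a different way; the update rule above is only the classical baseline.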
SIFTER search: a web server for accurate phylogeny-based protein function prediction
Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.
2015-05-15
We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.
Berlin, Konstantin; Longhini, Andrew; Dayie, T. Kwaku; Fushman, David
2013-01-01
To facilitate rigorous analysis of molecular motions in proteins, DNA, and RNA, we present a new version of ROTDIF, a program for determining the overall rotational diffusion tensor from single- or multiple-field Nuclear Magnetic Resonance (NMR) relaxation data. We introduce four major features that expand the program’s versatility and usability. The first feature is the ability to analyze, separately or together, 13C and/or 15N relaxation data collected at a single or multiple fields. A significant improvement in the accuracy compared to direct analysis of R2/R1 ratios, especially critical for analysis of 13C relaxation data, is achieved by subtracting high-frequency contributions to relaxation rates. The second new feature is an improved method for computing the rotational diffusion tensor in the presence of biased errors, such as large conformational exchange contributions, that significantly enhances the accuracy of the computation. The third new feature is the integration of the domain alignment and docking module for relaxation-based structure determination of multi-domain systems. Finally, to improve accessibility to all the program features, we introduced a graphical user interface (GUI) that simplifies and speeds up the analysis of the data. Written in Java, the new ROTDIF can run on virtually any computer platform. In addition, the new ROTDIF achieves an order of magnitude speedup over the previous version by implementing a more efficient deterministic minimization algorithm. We not only demonstrate the improvement in accuracy and speed of the new algorithm for synthetic and experimental 13C and 15N relaxation data for several proteins and nucleic acids, but also show that careful analysis required especially for characterizing RNA dynamics allowed us to uncover subtle conformational changes in RNA as a function of temperature that were opaque to previous analysis. PMID:24170368
Identifying technical aliases in SELDI mass spectra of complex mixtures of proteins
2013-01-01
Background Biomarker discovery datasets created using mass spectrum protein profiling of complex mixtures of proteins contain many peaks that represent the same protein with different charge states. Correlated variables such as these can confound the statistical analyses of proteomic data. Previously we developed an algorithm that clustered mass spectrum peaks that were biologically or technically correlated. Here we demonstrate an algorithm that clusters correlated technical aliases only. Results In this paper, we propose a preprocessing algorithm that can be used for grouping technical aliases in mass spectrometry protein profiling data. The stringency of the variance allowed for clustering is customizable, thereby affecting the number of peaks that are clustered. Subsequent analysis of the clusters, instead of individual peaks, helps reduce difficulties associated with technically-correlated data, and can aid more efficient biomarker identification. Conclusions This software can be used to pre-process and thereby decrease the complexity of protein profiling proteomics data, thus simplifying the subsequent analysis of biomarkers by decreasing the number of tests. The software is also a practical tool for identifying which features to investigate further by purification, identification and confirmation. PMID:24010718
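The arithmetic behind "the same protein with different charge states" can be shown with a small sketch. It assumes that ions are protonated, so that m/z = (M + z·m_p)/z, and it only pairs peaks by mass; the paper's algorithm additionally clusters on correlation with a customizable variance stringency, which is not reproduced here. The function names and the `tol_ppm` tolerance are illustrative.

```python
PROTON = 1.00728  # proton mass in Da

def neutral_mass(mz, z):
    """Neutral mass implied by an m/z value at charge z, assuming protonation."""
    return z * (mz - PROTON)

def find_charge_aliases(peaks, max_charge=2, tol_ppm=1000.0):
    """Return (mz_a, z_a, mz_b, z_b) for peak pairs implying the same neutral mass."""
    aliases = []
    for i, mz_a in enumerate(peaks):
        for mz_b in peaks[i + 1:]:
            for za in range(1, max_charge + 1):
                for zb in range(1, max_charge + 1):
                    if za == zb:
                        continue
                    ma = neutral_mass(mz_a, za)
                    mb = neutral_mass(mz_b, zb)
                    if abs(ma - mb) / ma * 1e6 <= tol_ppm:
                        aliases.append((mz_a, za, mz_b, zb))
    return aliases

# A 10 kDa protein seen as 1+ and 2+ ions, plus an unrelated peak at m/z 7500.
peaks = [5001.00728, 7500.0, 10001.00728]
aliases = find_charge_aliases(peaks)
print(aliases)
```

Allowing higher charges makes assignments ambiguous (a 1+/2+ pair is also consistent with 2+/4+), which is one reason a correlation criterion like the one described in the abstract is useful.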
Protein structure prediction with local adjust tabu search algorithm
2014-01-01
Background Protein folding structure prediction is one of the most challenging problems in the bioinformatics domain. Because of the complexity of realistic protein structures, a simplified structure model and a computational method must be adopted in the research. The AB off-lattice model is one such simplified model, which considers only two classes of amino acids, hydrophobic (A) residues and hydrophilic (B) residues. Results The main work of this paper is to discuss how to find the lowest-energy configurations in the 2D and 3D off-lattice models by using Fibonacci sequences and real protein sequences. In order to avoid falling into local minima and to converge faster to the global minimum, we introduce a novel method (SATS) to the protein structure problem, which combines the simulated annealing algorithm and the tabu search algorithm. Various strategies, such as a new encoding strategy, an adaptive neighborhood generation strategy and a local adjustment strategy, are adopted successfully for high-speed searching of the optimal conformation, which corresponds to the lowest energy of the protein sequence. Experimental results show that some of the results obtained by the improved SATS are better than those reported in the previous literature, and we can be sure that the lowest-energy folding states for short Fibonacci sequences have been found. Conclusions Although the off-lattice models are not very realistic, they can reflect some important characteristics of realistic proteins. The 3D off-lattice model is closer to the native folding structure of a realistic protein than the 2D off-lattice model. In addition, compared with some previous studies, the proposed hybrid algorithm can search the spatial folding structure of a protein chain more effectively and more quickly. PMID:25474708
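The abstract does not state the energy function. The sketch below uses the commonly cited 2D form of the AB off-lattice model (a bending term of (1 - cos θ)/4 per interior residue plus a species-dependent Lennard-Jones term between non-adjacent residues) together with one common Fibonacci convention (S0 = A, S1 = B, S(i+1) = S(i-1) followed by S(i)). Both are assumptions about what the paper optimizes, and the function names are illustrative.

```python
import math

def fibonacci_sequence(k):
    """Return the k-th Fibonacci string with S0 = 'A', S1 = 'B', S(i+1) = S(i-1) + S(i)."""
    a, b = "A", "B"
    for _ in range(k):
        a, b = b, a + b
    return a

def ab_energy(coords, sequence):
    """Energy of a 2D AB off-lattice chain given residue coordinates (unit bonds assumed)."""
    n = len(coords)
    xi = [1 if s == "A" else -1 for s in sequence]
    bend = 0.0
    for i in range(1, n - 1):
        ux, uy = coords[i][0] - coords[i - 1][0], coords[i][1] - coords[i - 1][1]
        vx, vy = coords[i + 1][0] - coords[i][0], coords[i + 1][1] - coords[i][1]
        cos_theta = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
        bend += 0.25 * (1.0 - cos_theta)  # zero for a straight chain
    lj = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = math.dist(coords[i], coords[j])
            # C = 1 for AA, 1/2 for BB, -1/2 for AB pairs.
            c = (1 + xi[i] + xi[j] + 5 * xi[i] * xi[j]) / 8.0
            lj += 4.0 * (r ** -12 - c * r ** -6)
    return bend + lj

print(fibonacci_sequence(6))
print(ab_energy([(0, 0), (1, 0), (2, 0)], "AAA"))
```

A search method such as the simulated annealing and tabu hybrid described above would repeatedly perturb the chain conformation and evaluate a function of this kind.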
Protein-protein docking using region-based 3D Zernike descriptors
2009-01-01
Background Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. Results We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface Cα RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. Conclusion We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction.
Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods. PMID:20003235
Protein-protein docking using region-based 3D Zernike descriptors.
Venkatraman, Vishwesh; Yang, Yifeng D; Sael, Lee; Kihara, Daisuke
2009-12-09
Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface Cα RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction.
Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.
Bergold, P J; Sweatt, J D; Winicov, I; Weiss, K R; Kandel, E R; Schwartz, J H
1990-01-01
Depending on the number or the length of exposure, application of serotonin can produce either short-term or long-term presynaptic facilitation of Aplysia sensory-to-motor synapses. The cAMP-dependent protein kinase, a heterodimer of two regulatory and two catalytic subunits, has been shown to become stably activated only during long-term facilitation. Both acquisition of long-term facilitation and persistent activation of the kinase are blocked by anisomycin, an effective, reversible, and specific inhibitor of protein synthesis in Aplysia. We report here that 2-hr exposure of pleural sensory cells to serotonin lowers the concentration of regulatory subunits but does not change the concentration of catalytic subunits, as assayed 24 hr later; 5-min exposure to serotonin has no effect on either type of subunit. Increasing intracellular cAMP with a permeable analog of cAMP together with the phosphodiesterase inhibitor isobutyl methylxanthine also decreased regulatory subunits, suggesting that cAMP is the second messenger mediating serotonin action. Anisomycin blocked the loss of regulatory subunits only when applied with serotonin; application after the 2-hr treatment with serotonin had no effect. In the Aplysia accessory radula contractor muscle, prolonged exposure to serotonin or to the peptide transmitter small cardioactive peptide B, both of which produce large increases in intracellular cAMP, does not decrease regulatory subunits. This mechanism of regulating the cAMP-dependent protein kinase therefore may be specific to the nervous system. We conclude that during long-term facilitation, new protein is synthesized in response to the facilitatory stimulus, which changes the ratio of subunits of the cAMP-dependent protein kinase. This alteration in ratio could persistently activate the kinase and produce the persistent phosphorylation seen in long-term facilitated sensory cells. PMID:1692622
Wei, Jyh-Da; Tsai, Ming-Hung; Lee, Gen-Cher; Huang, Jeng-Hung; Lee, Der-Tsai
2009-01-01
Algorithm visualization is a unique research topic that integrates engineering skills such as computer graphics, system programming, database management, computer networks, etc., to facilitate algorithmic researchers in testing their ideas, demonstrating new findings, and teaching algorithm design in the classroom. Within the broad applications of algorithm visualization, there still remain performance issues that deserve further research, e.g., system portability, collaboration capability, and animation effect in 3D environments. Using modern technologies of Java programming, we develop an algorithm visualization and debugging system, dubbed GeoBuilder, for geometric computing. The GeoBuilder system features Java's promising portability, engagement of collaboration in algorithm development, and automatic camera positioning for tracking 3D geometric objects. In this paper, we describe the design of the GeoBuilder system and demonstrate its applications.
Lohmann, Kristina; Freigofas, Julia; Leichsenring, Julian; Wallenwein, Chantal Marie; Haefeli, Walter Emil; Seidling, Hanna Marita
2015-04-01
We aimed to develop and evaluate an algorithm to facilitate drug switching between primary and tertiary care for patients with feeding tubes. An expert consortium developed an algorithm and applied it manually to 267 preadmission drugs of 46 patients admitted to a surgical ward of a tertiary care university hospital between June 12 and December 2, 2013, and requiring a feeding tube during their inpatient stay. The new algorithm considered the following principles: Drugs should be ideally listed on the hospital drug formulary (HDF). Additionally, drugs should include the same ingredient instead of a therapeutic equivalent. Preferred dosage forms were appropriate liquids, followed by solid drugs with liquid administration form, and solid drugs that could be crushed and/or suspended. Of all evaluated drugs, 83.5% could be switched to suitable drugs listed on the HDF and another 6.0% to drugs available on the German drug market. Additionally, for 4.1% of the drugs, the integration of individual switching rules allowed the switch from enteric-coated to immediate-release drugs. Consequently, 6.4% of the drugs could not be automatically switched and required case-to-case decision by a clinical professional (e.g., from sustained-release to immediate-release). The predefined principles were successfully integrated in the new algorithm. Thus, the algorithm switched more than 90% of the evaluated preadmission drugs to suitable drugs for inpatients with feeding tubes. This finding suggests that the algorithm can readily be transferred to an electronic format and integrated into a clinical decision support system.
An integrative approach to inferring biologically meaningful gene modules
2011-01-01
Background The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. Results We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM) that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. Conclusions The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level. PMID:21791051
GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data.
Carvalho, Paulo C; Fischer, Juliana Sg; Chen, Emily I; Domont, Gilberto B; Carvalho, Maria Gc; Degrave, Wim M; Yates, John R; Barbosa, Valmir C
2009-02-24
Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiform cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple to use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.
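GO over-representation is commonly assessed with a one-sided hypergeometric test. The abstract does not say which statistic GOEx uses, so the sketch below is a generic illustration of that test and not GOEx's implementation; the function and argument names are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, selected, hits):
    """P(X >= hits) when `selected` proteins are drawn without replacement.

    population: all identified proteins
    annotated:  proteins in the population carrying the GO term
    selected:   differentially expressed proteins
    hits:       selected proteins carrying the GO term
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# 10 proteins, 5 carry the term, 2 are selected and both carry the term.
p = enrichment_p_value(population=10, annotated=5, selected=2, hits=2)
print(round(p, 4))
```

In practice such p-values are computed for many GO terms at once, so a multiple-testing correction would normally follow.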
Pisanti, Nadia; Soldano, Henry; Carpentier, Mathilde; Pothier, Joel
2009-12-01
The geometrical configurations of atoms in protein structures can be viewed as approximate relations among them. Then, finding similar common substructures within a set of protein structures belongs to a new class of problems that generalizes that of finding repeated motifs. The novelty lies in the addition of constraints on the motifs in terms of relations that must hold between pairs of positions of the motifs. We will hence denote them as relational motifs. For this class of problems, we present an algorithm that is a suitable extension of the KMR paradigm and, in particular, of the KMRC as it uses a degenerate alphabet. Our algorithm contains several improvements that become especially useful when, as is required for relational motifs, the inference is made by partially overlapping shorter motifs, rather than concatenating them. The efficiency, correctness and completeness of the algorithm are ensured by several non-trivial properties that are proven in this paper. The algorithm has been applied in the important field of protein common 3D substructure searching. The methods implemented have been tested on several examples of protein families such as serine proteases, globins and cytochromes P450. The detected motifs have been compared to those found by multiple structural alignment methods.
Exploring the Energy Landscapes of Protein Folding Simulations with Bayesian Computation
Burkoff, Nikolas S.; Várnai, Csilla; Wells, Stephen A.; Wild, David L.
2012-01-01
Nested sampling is a Bayesian sampling technique developed to explore probability distributions localized in an exponentially small area of the parameter space. The algorithm provides both posterior samples and an estimate of the evidence (marginal likelihood) of the model. The nested sampling algorithm also provides an efficient way to calculate free energies and the expectation value of thermodynamic observables at any temperature, through a simple post processing of the output. Previous applications of the algorithm have yielded large efficiency gains over other sampling techniques, including parallel tempering. In this article, we describe a parallel implementation of the nested sampling algorithm and its application to the problem of protein folding in a Gō-like force field of empirical potentials that were designed to stabilize secondary structure elements in room-temperature simulations. We demonstrate the method by conducting folding simulations on a number of small proteins that are commonly used for testing protein-folding procedures. A topological analysis of the posterior samples is performed to produce energy landscape charts, which give a high-level description of the potential energy surface for the protein folding simulations. These charts provide qualitative insights into both the folding process and the nature of the model and force field used. PMID:22385859
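The evidence estimate that nested sampling produces can be illustrated on a toy problem. The sketch below is not the parallel implementation or the protein force field described here: it uses a uniform prior on [0, 1], a one-dimensional Gaussian likelihood, and simple rejection sampling to draw new live points, all of which are simplifying assumptions.

```python
import math
import random

def nested_sampling(loglike, n_live=100, n_iter=500, seed=0):
    """Estimate the evidence Z for a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]
    logl = [loglike(x) for x in live]
    evidence, x_prev = 0.0, 1.0
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=logl.__getitem__)
        threshold = logl[worst]
        x_i = math.exp(-i / n_live)  # expected remaining prior volume
        evidence += math.exp(threshold) * (x_prev - x_i)
        x_prev = x_i
        # Replace the worst point by a prior draw with higher likelihood.
        while True:
            cand = rng.random()
            cand_logl = loglike(cand)
            if cand_logl > threshold:
                break
        live[worst], logl[worst] = cand, cand_logl
    # Contribution of the points that are still live at the end.
    evidence += x_prev * sum(math.exp(l) for l in logl) / n_live
    return evidence

def gaussian_loglike(x, mu=0.5, sigma=0.1):
    return -0.5 * ((x - mu) / sigma) ** 2

z = nested_sampling(gaussian_loglike)
print(z, 0.1 * math.sqrt(2 * math.pi))
```

The analytic value for this toy likelihood is σ√(2π) ≈ 0.2507, and the estimate should fall near it up to sampling noise. In a thermodynamic setting the likelihood would be replaced by a Boltzmann factor, and the stored thresholds and volume shrinkages can be reweighted afterwards, which is what allows quantities at any temperature to be obtained by post-processing.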
Dong, Runze; Pan, Shuo; Peng, Zhenling; Zhang, Yang; Yang, Jianyi
2018-05-21
With the rapid increase in the number of protein structures in the Protein Data Bank, it becomes urgent to develop algorithms for efficient protein structure comparisons. In this article, we present the mTM-align server, which consists of two closely related modules: one for structure database search and the other for multiple structure alignment. The database search is sped up by a heuristic algorithm and a hierarchical organization of the structures in the database. The multiple structure alignment is performed using the recently developed algorithm mTM-align. Benchmark tests demonstrate that our algorithms outperform peer methods for both modules, in terms of speed and accuracy. One of the unique features of the server is the interplay between database search and multiple structure alignment. The server provides service not only for performing fast database search, but also for making accurate multiple structure alignment with the structures found by the search. For the database search, it takes about 2-5 min for a structure of medium size (∼300 residues). For the multiple structure alignment, it takes a few seconds for ∼10 structures of medium size. The server is freely available at: http://yanglab.nankai.edu.cn/mTM-align/.
Application of a fast sorting algorithm to the assignment of mass spectrometric cross-linking data.
Petrotchenko, Evgeniy V; Borchers, Christoph H
2014-09-01
Cross-linking combined with MS involves enzymatic digestion of cross-linked proteins and identifying cross-linked peptides. Assignment of cross-linked peptide masses requires a search of all possible binary combinations of peptides from the cross-linked proteins' sequences, which becomes impractical with increasing complexity of the protein system and/or if digestion enzyme specificity is relaxed. Here, we describe the application of a fast sorting algorithm to search large sequence databases for cross-linked peptide assignments based on mass. This same algorithm has been used previously for assigning disulfide-bridged peptides (Choi et al., ), but has not previously been applied to cross-linking studies. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
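The abstract does not spell out the search procedure, so the following is only a minimal sketch of one common sort-based approach. It assumes that a cross-linked species has a mass equal to the sum of two peptide masses plus a linker mass; the function name, tolerance, and all mass values are hypothetical.

```python
def find_crosslink_pairs(peptide_masses, target_mass, linker_mass, tol=0.01):
    """Find pairs of peptide masses whose sum plus linker_mass matches target_mass.

    Sorting once and walking two pointers inward avoids testing every
    binary combination (O(n log n) instead of O(n^2)).
    Simplification: each peptide is reported in at most one pair; a
    production search would scan every mass inside the tolerance window.
    """
    masses = sorted(peptide_masses)
    wanted = target_mass - linker_mass
    pairs = []
    lo, hi = 0, len(masses) - 1
    while lo < hi:
        total = masses[lo] + masses[hi]
        if abs(total - wanted) <= tol:
            pairs.append((masses[lo], masses[hi]))
            lo += 1
            hi -= 1
        elif total < wanted:
            lo += 1  # sum too small: move to a heavier light peptide
        else:
            hi -= 1  # sum too large: move to a lighter heavy peptide
    return pairs


if __name__ == "__main__":
    masses = [500.0, 1200.5, 800.25, 1500.0, 950.75]  # hypothetical peptide masses
    print(find_crosslink_pairs(masses, target_mass=2138.82, linker_mass=138.07))
```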
Graphical programming interface: A development environment for MRI methods.
Zwart, Nicholas R; Pipe, James G
2015-11-01
To introduce a multiplatform, Python language-based, development environment called graphical programming interface for prototyping MRI techniques. The interface allows developers to interact with their scientific algorithm prototypes visually in an event-driven environment making tasks such as parameterization, algorithm testing, data manipulation, and visualization an integrated part of the work-flow. Algorithm developers extend the built-in functionality through simple code interfaces designed to facilitate rapid implementation. This article shows several examples of algorithms developed in graphical programming interface including the non-Cartesian MR reconstruction algorithms for PROPELLER and spiral as well as spin simulation and trajectory visualization of a FLORET example. The graphical programming interface framework is shown to be a versatile prototyping environment for developing numeric algorithms used in the latest MR techniques. © 2014 Wiley Periodicals, Inc.
Zhai, Jing-Xuan; Cao, Tian-Jie; An, Ji-Yong; Bian, Yong-Tao
2017-11-07
Determining whether proteins can interact with their partners is a challenging task in fundamental research. Protein self-interaction (SIP) is a special case of PPIs, which plays a key role in the regulation of cellular functions. Due to the limitations of experimental self-interaction identification, it is very important to develop an effective biological tool for predicting SIPs based on protein sequences. In this study, we developed a novel computational method called RVM-AB that combines the Relevance Vector Machine (RVM) model and Average Blocks (AB) for detecting SIPs from protein sequences. Firstly, the Average Blocks (AB) feature extraction method is employed to represent protein sequences on a Position Specific Scoring Matrix (PSSM). Secondly, the Principal Component Analysis (PCA) method is used to reduce the dimension of the AB vector and thereby the influence of noise. Then, by employing the Relevance Vector Machine (RVM) algorithm, the performance of RVM-AB is assessed and compared with the state-of-the-art support vector machine (SVM) classifier and other existing methods on yeast and human datasets, respectively. Using the fivefold test experiment, the RVM-AB model achieved very high accuracies of 93.01% and 97.72% on yeast and human datasets, respectively, which are significantly better than the method based on the SVM classifier and other previous methods. The experimental results proved that the RVM-AB prediction model is efficient and robust. It can be an automatic decision support tool for detecting SIPs. For facilitating extensive studies for future proteomics research, the RVMAB server is freely available for academic use at http://219.219.62.123:8888/SIP_AB. Copyright © 2017 Elsevier Ltd. All rights reserved.
Chira, Camelia; Horvath, Dragos; Dumitrescu, D
2011-07-30
Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic-polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm (hill-climbing mechanism and diversification strategy) are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
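For readers unfamiliar with the HP model, the objective that such search algorithms minimize can be sketched as follows. This is an illustration of the standard 2D square-lattice HP energy (one unit of energy gained per non-consecutive H-H contact), not the authors' evolutionary algorithm; the move encoding and function name are assumptions made for the example.

```python
def hp_energy(sequence, moves):
    """Energy of a 2D HP lattice conformation.

    sequence: string over {'H', 'P'}.
    moves: string over {'U', 'D', 'L', 'R'} placing residue i+1 relative to residue i.
    Returns -1 per pair of H residues that are lattice neighbours but not
    chain neighbours, or None if the walk is not self-avoiding.
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    steps = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
    positions = [(0, 0)]
    for move in moves:
        dx, dy = steps[move]
        x, y = positions[-1]
        positions.append((x + dx, y + dy))
    if len(set(positions)) != len(positions):
        return None  # two residues occupy the same lattice site
    index_at = {pos: i for i, pos in enumerate(positions)}
    energy = 0
    for i, (x, y) in enumerate(positions):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


if __name__ == "__main__":
    print(hp_energy("HPPH", "RUL"))  # the two H residues touch across the square
```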
Nitrate transporter genes in apple and the effect of water deficit on their expression
USDA-ARS's Scientific Manuscript database
Nitrogen transporters are members of a large superfamily, the Major Facilitator Superfamily (MFS). This family is ubiquitous and diverse, and includes proteins that facilitate the transport of a wide range of substrates across the cytoplasmic or intracellular membranes. Among the proteins encoded ...
Herrera, Lara Maria; Fernandes, Clemente Maia da Silva; Serra, Mônica da Costa
2018-01-01
This study aimed to develop and to assess an algorithm to facilitate lip print visualization, and to digitally analyze lip prints on different supports, by superimposition. It also aimed to classify lip prints according to sex. A batch image processing algorithm was developed, which facilitated the identification and extraction of information about lip grooves. However, it performed better for lip print images with a uniform background. Paper and glass slab allowed more correct identifications than glass and both sides of compact disks. There was no significant difference between the type of support and the amount of matching structures located in the middle area of the lower lip. There was no evidence of association between types of lip grooves and sex. Lip groove patterns of type III and type I were the most common for both sexes. The development of systems for lip print analysis is necessary, mainly concerning digital methods. © 2017 American Academy of Forensic Sciences.
Armean, Irina M; Lilley, Kathryn S; Trotter, Matthew W B; Pilkington, Nicholas C V; Holden, Sean B
2018-06-01
Protein-protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. Experimental findings are increasingly standardized and recorded in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on the application of probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies. PPI annotations are built combinatorically using corresponding GO terms and InterPro annotation. We use a S. cerevisiae high-confidence complex dataset as a positive training set. A series of classifiers based on Maximum Entropy and support vector machines (SVMs), each with a composite counterpart algorithm, are trained on a series of training sets. These achieve a high performance area under the ROC curve of ≤0.97, outperforming go2ppi, a previously established prediction tool for protein-protein interactions (PPI) based on Gene Ontology (GO) annotations. https://github.com/ima23/maxent-ppi. sbh11@cl.cam.ac.uk. Supplementary data are available at Bioinformatics online.
ICPD-A New Peak Detection Algorithm for LC/MS
2010-01-01
Background The identification and quantification of proteins using label-free Liquid Chromatography/Mass Spectrometry (LC/MS) play crucial roles in biological and biomedical research. Increasing evidence has shown that biomarkers are often low abundance proteins. However, LC/MS systems are subject to considerable noise and sample variability, whose statistical characteristics are still elusive, making computational identification of low abundance proteins extremely challenging. As a result, the inability to identify low abundance proteins in a proteomic study is the main bottleneck in protein biomarker discovery. Results In this paper, we propose a new peak detection method called Information Combining Peak Detection (ICPD) for high resolution LC/MS. In LC/MS, peptides elute during a certain time period and, as a result, peptide isotope patterns are registered in multiple MS scans. The key feature of the new algorithm is that the observed isotope patterns registered in multiple scans are combined together for estimating the likelihood of the peptide existence. An isotope pattern matching score based on the likelihood probability is provided and utilized for peak detection. Conclusions The performance of the new algorithm is evaluated based on protein standards with 48 known proteins. The evaluation shows better peak detection accuracy for low abundance proteins than other LC/MS peak detection methods. PMID:21143790
Multi-Level Sequential Pattern Mining Based on Prime Encoding
NASA Astrophysics Data System (ADS)
Lianglei, Sun; Yun, Li; Jiang, Yin
Encoding serves not only to express the hierarchical relationship but also to facilitate identification of the relationship between different levels, which directly affects the efficiency of algorithms for mining multi-level sequential patterns. In this paper, we prove that, using prime encoding, a single division operation can decide the parent-child relationship between different levels, and we present the PMSM and CROSS-PMSM algorithms, which are based on prime encoding, for mining multi-level and cross-level sequential patterns, respectively. Experimental results show that the algorithms can effectively extract multi-level and cross-level sequential patterns from the sequence database.
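The idea that one division decides a hierarchical relationship can be sketched as below. The exact encoding used by PMSM is not given in the abstract, so this example assumes a common scheme in which each node receives its own prime and its code is that prime multiplied by its parent's code; the taxonomy and all names are hypothetical.

```python
from itertools import count


def prime_stream():
    """Yield 2, 3, 5, ... by trial division (adequate for small taxonomies)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def encode_taxonomy(children, root):
    """Give every node a code equal to its own prime times its parent's code."""
    primes = prime_stream()
    codes = {root: 1}  # the root's code of 1 divides every other code
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            codes[child] = codes[node] * next(primes)
            stack.append(child)
    return codes


def is_ancestor(codes, a, b):
    """True if a lies above b in the hierarchy: a single modulo operation."""
    return a != b and codes[b] % codes[a] == 0


if __name__ == "__main__":
    taxonomy = {"food": ["drink", "bread"], "drink": ["milk", "juice"]}
    codes = encode_taxonomy(taxonomy, "food")
    print(is_ancestor(codes, "drink", "milk"), is_ancestor(codes, "bread", "milk"))
```

Because every node contributes a prime that appears only in its own code and in the codes of its descendants, divisibility holds exactly when one item generalizes the other.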
Hybrid approach for detection of dental caries based on the methods FCM and level sets
NASA Astrophysics Data System (ADS)
Chaabene, Marwa; Ben Ali, Ramzi; Ejbali, Ridha; Zaied, Mourad
2017-03-01
This paper presents a new technique for the detection of dental caries, a bacterial disease that destroys the tooth structure. In our approach, we have developed a new segmentation method that combines the advantages of the fuzzy C-means (FCM) algorithm and the level set method. The results obtained by the FCM algorithm are used by the level set algorithm to reduce the influence of noise on each of these algorithms, to facilitate level set manipulation and to lead to more robust segmentation. The sensitivity and specificity confirm the effectiveness of the proposed method for caries detection.
Tear fluid proteomics multimarkers for diabetic retinopathy screening
2013-01-01
Background The aim of the project was to develop a novel method for diabetic retinopathy screening based on the examination of tear fluid biomarker changes. In order to evaluate the usability of protein biomarkers for pre-screening purposes several different approaches were used, including machine learning algorithms. Methods All persons involved in the study had diabetes. Diabetic retinopathy (DR) was diagnosed by capturing 7-field fundus images, evaluated by two independent ophthalmologists. 165 eyes were examined (from 119 patients), 55 were diagnosed healthy and 110 images showed signs of DR. Tear samples were taken from all eyes and state-of-the-art nano-HPLC coupled ESI-MS/MS mass spectrometry protein identification was performed on all samples. Applicability of protein biomarkers was evaluated by six different optimally parameterized machine learning algorithms: Support Vector Machine, Recursive Partitioning, Random Forest, Naive Bayes, Logistic Regression, K-Nearest Neighbor. Results Out of the six investigated machine learning algorithms the result of Recursive Partitioning proved to be the most accurate. The performance of the system realizing the above algorithm reached 74% sensitivity and 48% specificity. Conclusions Protein biomarkers selected and classified with machine learning algorithms alone are at present not recommended for screening purposes because of low specificity and sensitivity values. This tool can be potentially used to improve the results of image processing methods as a complementary tool in automatic or semiautomatic systems. PMID:23919537
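As a reminder of how the two reported figures are defined, the short sketch below computes sensitivity and specificity from confusion-matrix counts. The counts in the usage example are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of diseased eyes that are flagged.
    specificity = TN / (TN + FP): fraction of healthy eyes that are cleared.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    print(sensitivity_specificity(tp=74, fn=26, tn=48, fp=52))
```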
Schmier, Sonja; Mostafa, Ahmed; Haarmann, Thomas; Bannert, Norbert; Ziebuhr, John; Veljkovic, Veljko; Dietrich, Ursula; Pleschka, Stephan
2015-06-19
Newly emerging influenza A viruses (IAV) pose a major threat to human health by causing seasonal epidemics and/or pandemics, the latter often facilitated by the lack of pre-existing immunity in the general population. Early recognition of candidate pandemic influenza viruses (CPIV) is of crucial importance for restricting virus transmission and developing appropriate therapeutic and prophylactic strategies including effective vaccines. Often, the pandemic potential of newly emerging IAV is only fully recognized once the virus starts to spread efficiently causing serious disease in humans. Here, we used a novel phylogenetic algorithm based on the informational spectrum method (ISM) to identify potential CPIV by predicting mutations in the viral hemagglutinin (HA) gene that are likely to (differentially) affect critical interactions between the HA protein and target cells from bird and human origin, respectively. Predictions were subsequently validated by generating pseudotyped retrovirus particles and genetically engineered IAV containing these mutations and characterizing potential effects on virus entry and replication in cells expressing human and avian IAV receptors, respectively. Our data suggest that the ISM-based algorithm is suitable to identify CPIV among IAV strains that are circulating in animal hosts and thus may be a new tool for assessing pandemic risks associated with specific strains.
Data integration to prioritize drugs using genomics and curated data.
Louhimo, Riku; Laakso, Marko; Belitskin, Denis; Klefström, Juha; Lehtonen, Rainer; Hautaniemi, Sampsa
2016-01-01
Genomic alterations affecting drug target proteins occur in several tumor types and are prime candidates for patient-specific tailored treatments. Increasingly, patients likely to benefit from targeted cancer therapy are selected based on molecular alterations. The selection of a precision therapy benefiting most patients is challenging but can be enhanced with integration of multiple types of molecular data. Data integration approaches for drug prioritization have successfully integrated diverse molecular data but do not take full advantage of existing data and literature. We have built a knowledge-base which connects data from public databases with molecular results from over 2200 tumors, signaling pathways and drug-target databases. Moreover, we have developed a data mining algorithm to effectively utilize this heterogeneous knowledge-base. Our algorithm is designed to facilitate retargeting of existing drugs by stratifying samples and prioritizing drug targets. We analyzed 797 primary tumors from The Cancer Genome Atlas breast and ovarian cancer cohorts using our framework. FGFR, CDK and HER2 inhibitors were prioritized in breast and ovarian data sets. Estrogen receptor positive breast tumors showed potential sensitivity to targeted inhibitors of FGFR due to activation of FGFR3. Our results suggest that computational sample stratification selects potentially sensitive samples for targeted therapies and can aid in precision medicine drug repositioning. Source code is available from http://csblcanges.fimm.fi/GOPredict/.
Garbage Collection in a Distributed Object-Oriented System
NASA Technical Reports Server (NTRS)
Gupta, Aloke; Fuchs, W. Kent
1993-01-01
An algorithm is described in this paper for garbage collection in distributed systems with object sharing across processor boundaries. The algorithm allows local garbage collection at each node in the system to proceed independently of local collection at the other nodes. It requires no global synchronization or knowledge of the global state of the system and exhibits the capability of graceful degradation. The concept of a specialized dump node is proposed to facilitate the collection of inaccessible circular structures. An experimental evaluation of the algorithm is also described. The algorithm is compared with a corresponding scheme that requires global synchronization. The results show that the algorithm works well in distributed processing environments even when the locality of object references is low.
Li, Longxiang; Xue, Donglin; Deng, Weijie; Wang, Xu; Bai, Yang; Zhang, Feng; Zhang, Xuejun
2017-11-10
In deterministic computer-controlled optical surfacing, accurate dwell time execution by computer numeric control machines is crucial in guaranteeing a high-convergence ratio for the optical surface error. It is necessary to consider the machine dynamics limitations in the numerical dwell time algorithms. In this paper, these constraints on dwell time distribution are analyzed, and a model of the equal extra material removal is established. A positive dwell time algorithm with minimum equal extra material removal is developed. Results of simulations based on deterministic magnetorheological finishing demonstrate the necessity of considering machine dynamics performance and illustrate the validity of the proposed algorithm. Indeed, the algorithm effectively facilitates the determinacy of sub-aperture optical surfacing processes.
Goodswen, Stephen J; Kennedy, Paul J; Ellis, John T
2013-11-02
An in silico vaccine discovery pipeline for eukaryotic pathogens typically consists of several computational tools to predict protein characteristics. The aim of the in silico approach to discovering subunit vaccines is to use predicted characteristics to identify proteins which are worthy of laboratory investigation. A major challenge is that these predictions inherently carry hidden inaccuracies and contradictions. This study focuses on how to reduce the number of false candidates using machine learning algorithms rather than relying on expensive laboratory validation. Proteins from Toxoplasma gondii, Plasmodium sp., and Caenorhabditis elegans were used as training and test datasets. The results show that machine learning algorithms can effectively distinguish expected true from expected false vaccine candidates (with an average sensitivity and specificity of 0.97 and 0.98 respectively), for proteins observed to induce immune responses experimentally. Vaccine candidates from an in silico approach can only be truly validated in a laboratory. Given any in silico output and appropriate training data, the number of false candidates allocated for validation can be dramatically reduced using a pool of machine learning algorithms. This will ultimately save time and money in the laboratory.
A structure adapted multipole method for electrostatic interactions in protein dynamics
NASA Astrophysics Data System (ADS)
Niedermeier, Christoph; Tavan, Paul
1994-07-01
We present an algorithm for rapid approximate evaluation of electrostatic interactions in molecular dynamics simulations of proteins. Traditional algorithms require computational work of the order O(N^2) for a system of N particles. Truncation methods which try to avoid that effort entail intolerably large errors in forces, energies and other observables. Hierarchical multipole expansion algorithms, which can account for the electrostatics to numerical accuracy, scale with O(N log N) or even with O(N) if they become augmented by a sophisticated scheme for summing up forces. To further reduce the computational effort we propose an algorithm that also uses a hierarchical multipole scheme but considers only the first two multipole moments (i.e., charges and dipoles). Our strategy is based on the consideration that numerical accuracy may not be necessary to reproduce protein dynamics with sufficient correctness. As opposed to previous methods, our scheme for hierarchical decomposition is adjusted to structural and dynamical features of the particular protein considered rather than chosen rigidly as a cubic grid. As compared to truncation methods we manage to reduce errors in the computation of electrostatic forces by a factor of 10 with only marginal additional effort.
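The approximation of keeping only charges and dipoles can be illustrated for a single cluster. The sketch below compares the direct Coulomb potential of a small group of point charges with its monopole-plus-dipole expansion about the geometric centre. It is not the structure-adapted hierarchy of the paper; units with a Coulomb constant of 1 are assumed, and all charges, coordinates, and function names are hypothetical.

```python
import math


def direct_potential(charges, positions, point):
    """Exact potential at `point`: sum of q_i / |point - r_i| (Coulomb constant 1)."""
    return sum(q / math.dist(r, point) for q, r in zip(charges, positions))


def cluster_moments(charges, positions):
    """Return (centre, total charge, dipole vector) of a cluster of point charges."""
    n = len(charges)
    centre = tuple(sum(r[k] for r in positions) / n for k in range(3))
    total_charge = sum(charges)
    dipole = tuple(
        sum(q * (r[k] - centre[k]) for q, r in zip(charges, positions))
        for k in range(3)
    )
    return centre, total_charge, dipole


def multipole_potential(centre, total_charge, dipole, point):
    """Potential from the first two multipole moments: Q/d + (p . d_vec)/d^3."""
    d_vec = tuple(point[k] - centre[k] for k in range(3))
    d = math.sqrt(sum(c * c for c in d_vec))
    return total_charge / d + sum(p * c for p, c in zip(dipole, d_vec)) / d**3


if __name__ == "__main__":
    charges = [0.4, -0.3, 0.2]
    positions = [(0.1, 0.0, 0.0), (-0.1, 0.05, 0.0), (0.0, -0.1, 0.1)]
    far_point = (10.0, 0.0, 0.0)
    exact = direct_potential(charges, positions, far_point)
    approx = multipole_potential(*cluster_moments(charges, positions), far_point)
    print(exact, approx)
```

The neglected terms fall off with the cube of the distance, which is why the two values agree closely for points far from the cluster.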
2013-01-01
Background An in silico vaccine discovery pipeline for eukaryotic pathogens typically consists of several computational tools to predict protein characteristics. The aim of the in silico approach to discovering subunit vaccines is to use predicted characteristics to identify proteins which are worthy of laboratory investigation. A major challenge is that these predictions inherently carry hidden inaccuracies and contradictions. This study focuses on how to reduce the number of false candidates using machine learning algorithms rather than relying on expensive laboratory validation. Proteins from Toxoplasma gondii, Plasmodium sp., and Caenorhabditis elegans were used as training and test datasets. Results The results show that machine learning algorithms can effectively distinguish expected true from expected false vaccine candidates (with an average sensitivity and specificity of 0.97 and 0.98 respectively), for proteins observed to induce immune responses experimentally. Conclusions Vaccine candidates from an in silico approach can only be truly validated in a laboratory. Given any in silico output and appropriate training data, the number of false candidates allocated for validation can be dramatically reduced using a pool of machine learning algorithms. This will ultimately save time and money in the laboratory. PMID:24180526
Recognition of Protein-coding Genes Based on Z-curve Algorithms
Guo, Feng-Biao; Lin, Yan; Chen, Ling-Ling
2014-01-01
Recognition of protein-coding genes, a classical bioinformatics issue, is an essential step in annotating newly sequenced genomes. The Z-curve algorithm, one of the most effective methods for this task, has been successfully applied in annotating or re-annotating many genomes, including those of bacteria, archaea and viruses. Two Z-curve based ab initio gene-finding programs have been developed: ZCURVE (for bacteria and archaea) and ZCURVE_V (for viruses and phages). ZCURVE_C (for 57 bacteria) and Zfisher (for any bacterium) are web servers for re-annotation of bacterial and archaeal genomes. The above four tools can be used for genome annotation or re-annotation, either independently or combined with other gene-finding programs. In addition to recognizing protein-coding genes and exons, Z-curve algorithms are also effective in recognizing promoters and translation start sites. Here, we summarize the applications of Z-curve algorithms in gene finding and genome annotation. PMID:24822027
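The coordinate transform underlying these tools can be sketched briefly. The example below computes the standard cumulative Z-curve of a DNA sequence, in which the three components separate purine/pyrimidine, amino/keto, and weak/strong hydrogen-bond bases. It shows only the transform, not the gene-finding classifiers built on it, and the function name is an assumption.

```python
def z_curve(sequence):
    """Return the list of Z-curve points (x_n, y_n, z_n) for a DNA sequence.

    With A_n, C_n, G_n, T_n the cumulative base counts up to position n:
      x_n = (A_n + G_n) - (C_n + T_n)   purine vs pyrimidine
      y_n = (A_n + C_n) - (G_n + T_n)   amino vs keto
      z_n = (A_n + T_n) - (C_n + G_n)   weak vs strong hydrogen bonding
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in sequence.upper():
        if base not in counts:
            raise ValueError(f"unsupported base: {base!r}")
        counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (c + g)))
    return points


if __name__ == "__main__":
    print(z_curve("ACGT"))
```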
ScaffoldSeq: Software for characterization of directed evolution populations.
Woldring, Daniel R; Holec, Patrick V; Hackel, Benjamin J
2016-07-01
ScaffoldSeq is software designed for the numerous applications, including directed evolution analysis, in which a user generates a population of DNA sequences encoding partially diverse proteins with related functions and would like to characterize the single site and pairwise amino acid frequencies across the population. A common scenario for enzyme maturation, antibody screening, and alternative scaffold engineering involves naïve and evolved populations that contain diversified regions, varying in both sequence and length, within a conserved framework. Analyzing the diversified regions of such populations is facilitated by high-throughput sequencing platforms; however, length variability within these regions (e.g., antibody CDRs) encumbers the alignment process. To overcome this challenge, the ScaffoldSeq algorithm takes advantage of conserved framework sequences to quickly identify diverse regions. Beyond this, unintended biases in sequence frequency are generated throughout the experimental workflow required to evolve and isolate clones of interest prior to DNA sequencing. ScaffoldSeq software uniquely handles this issue by providing tools to quantify and remove background sequences, cluster similar protein families, and dampen the impact of dominant clones. The software produces graphical and tabular summaries for each region of interest, allowing users to evaluate diversity in a site-specific manner as well as identify epistatic pairwise interactions. The code and detailed information are freely available at http://research.cems.umn.edu/hackel. Proteins 2016; 84:869-874. © 2016 Wiley Periodicals, Inc.
Algorithmic transformation of multi-loop master integrals to a canonical basis with CANONICA
NASA Astrophysics Data System (ADS)
Meyer, Christoph
2018-01-01
The integration of differential equations of Feynman integrals can be greatly facilitated by using a canonical basis. This paper presents the Mathematica package CANONICA, which implements a recently developed algorithm to automate the transformation to a canonical basis. This represents the first publicly available implementation suitable for differential equations depending on multiple scales. In addition to the presentation of the package, this paper extends the description of some aspects of the algorithm, including a proof of the uniqueness of canonical forms up to constant transformations.
Improvement of Speckle Contrast Image Processing by an Efficient Algorithm.
Steimers, A; Farnung, W; Kohl-Bareis, M
2016-01-01
We demonstrate an efficient algorithm for the temporal and spatial based calculation of speckle contrast for the imaging of blood flow by laser speckle contrast analysis (LASCA). It reduces the numerical complexity of necessary calculations, facilitates a multi-core and many-core implementation of the speckle analysis and enables an independence of temporal or spatial resolution and SNR. The new algorithm was evaluated for both spatial and temporal based analysis of speckle patterns with different image sizes and amounts of recruited pixels as sequential, multi-core and many-core code.
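The abstract does not describe the algorithm itself. As a hedged illustration of how the per-window cost of spatial speckle contrast K = σ/μ can be made independent of window size, the sketch below uses summed-area tables of the image and of its square; this is a standard technique and not necessarily the authors' method. Function names and the tiny test image are hypothetical.

```python
import math


def integral_image(img):
    """Summed-area table with a zero border: table[y][x] = sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    table = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def window_sum(table, y, x, size):
    """Sum over the size x size window whose top-left pixel is (y, x)."""
    return (table[y + size][x + size] - table[y][x + size]
            - table[y + size][x] + table[y][x])


def spatial_speckle_contrast(img, size):
    """Speckle contrast K = std / mean for every full size x size window."""
    squares = [[v * v for v in row] for row in img]
    s1, s2 = integral_image(img), integral_image(squares)
    n = size * size
    h, w = len(img), len(img[0])
    contrast = []
    for y in range(h - size + 1):
        row = []
        for x in range(w - size + 1):
            mean = window_sum(s1, y, x, size) / n
            # Population variance from E[I^2] - E[I]^2, clipped against round-off.
            var = max(window_sum(s2, y, x, size) / n - mean * mean, 0.0)
            row.append(math.sqrt(var) / mean if mean > 0 else 0.0)
        contrast.append(row)
    return contrast


if __name__ == "__main__":
    print(spatial_speckle_contrast([[1, 3], [3, 1]], 2))
```

Each window then needs only eight table lookups, and the rows of the output are independent of one another, which lends itself to parallel execution.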
Ostreĭkov, I F; Podkopaev, V N; Moiseev, D B; Karpysheva, E V; Markova, L A; Sizov, S V
1997-01-01
Total mortality in the neonatal intensive care wards of the Tushino Pediatric Hospital decreased 2.5-fold in 1996 and is now 7.6%. This result is due to a complex of measures, one of which was the development and introduction of an algorithm for the diagnosis and treatment of newborns hospitalized in intensive care wards. The algorithm facilitates the work of the staff, helps diagnose disease earlier, and hence allows timely, scientifically based therapy.
Measuring mitotic spindle dynamics in budding yeast
NASA Astrophysics Data System (ADS)
Plumb, Kemp
In order to carry out its life cycle and produce viable progeny through cell division, a cell must successfully coordinate and execute a number of complex processes with high fidelity, in an environment dominated by thermal noise. One important example of such a process is the assembly and positioning of the mitotic spindle prior to chromosome segregation. The mitotic spindle is a modular structure composed of two spindle pole bodies, separated in space and spanned by filamentous proteins called microtubules, along which the genetic material of the cell is held. The spindle is responsible for alignment and subsequent segregation of chromosomes into two equal parts; proper spindle positioning and timing ensure that genetic material is appropriately divided amongst mother and daughter cells. In this thesis, I describe fluorescence confocal microscopy and automated image analysis algorithms, which I have used to observe and analyze the real space dynamics of the mitotic spindle in budding yeast. The software can locate structures in three spatial dimensions and track their movement in time. By selecting fluorescent proteins which specifically label the spindle poles and cell periphery, mitotic spindle dynamics have been measured in a coordinate system relevant to the cell division. I describe how I have characterised the accuracy and precision of the algorithms by simulating fluorescence data for both spindle poles and the budding yeast cell surface. In this thesis I also describe the construction of a microfluidic apparatus that allows for the measurement of long time-scale dynamics of individual cells and the development of a cell population. The tools developed in this thesis work will facilitate in-depth quantitative analysis of the non-equilibrium processes in living cells.
Computational design of chimeric protein libraries for directed evolution.
Silberg, Jonathan J; Nguyen, Peter Q; Stevenson, Taylor
2010-01-01
The best approach for creating libraries of functional proteins with large numbers of nondisruptive amino acid substitutions is protein recombination, in which structurally related polypeptides are swapped among homologous proteins. Unfortunately, as more distantly related proteins are recombined, the fraction of variants having a disrupted structure increases. One way to enrich the fraction of folded and potentially interesting chimeras in these libraries is to use computational algorithms to anticipate which structural elements can be swapped without disturbing the integrity of a protein's structure. Herein, we describe how the algorithm Schema uses the sequences and structures of the parent proteins recombined to predict the structural disruption of chimeras, and we outline how dynamic programming can be used to find libraries with a range of amino acid substitution levels that are enriched in variants with low Schema disruption.
Goonesekere, Nalin Cw
2009-01-01
The large numbers of protein sequences generated by whole genome sequencing projects require rapid and accurate methods of annotation. The detection of homology through computational sequence analysis is a powerful tool in determining the complex evolutionary and functional relationships that exist between proteins. Homology search algorithms employ amino acid substitution matrices to detect similarity between protein sequences. The substitution matrices in common use today are constructed using sequences aligned without reference to protein structure. Here we present amino acid substitution matrices constructed from the alignment of a large number of protein domain structures from the structural classification of proteins (SCOP) database. We show that when incorporated into the homology search algorithms BLAST and PSI-BLAST, the structure-based substitution matrices enhance the efficacy of detecting remote homologs.
Kang, Beom Sik; Pugalendhi, GaneshKumar; Kim, Ku-Jin
2017-10-13
Interactions between protein molecules are essential for the assembly, function, and regulation of proteins. The contact region between two protein molecules in a protein complex is usually complementary in shape for both molecules and the area of the contact region can be used to estimate the binding strength between two molecules. Although the area is a value calculated from the three-dimensional surface, it cannot represent the three-dimensional shape of the surface. Therefore, we propose an original concept of two-dimensional contact area which provides further information such as the ruggedness of the contact region. We present a novel algorithm for calculating the binding direction between two molecules in a protein complex, and then suggest a method to compute the two-dimensional flattened area of the contact region between two molecules based on the binding direction.
A quasi-physical algorithm for the structure optimization in an off-lattice protein model.
Liu, Jing-Fa; Huang, Wen-Qi
2006-02-01
In this paper, we study an off-lattice protein AB model with two species of monomers, hydrophobic and hydrophilic, and present a heuristic quasi-physical algorithm. First, by elaborately simulating the movement of the smooth solids in the physical world, we find low-energy conformations for a given monomer chain. A subsequent off-trap strategy is then proposed to trigger a jump for a stuck situation in order to get out of the local minima. The algorithm has been tested in the three-dimensional AB model for all sequences with lengths of 13-55 monomers. In several cases, we renew the putative ground state energy values. The numerical results show that the proposed methods are very promising for finding the ground states of proteins.
Predicting cancer-relevant proteins using an improved molecular similarity ensemble approach.
Zhou, Bin; Sun, Qi; Kong, De-Xin
2016-05-31
In this study, we proposed an improved algorithm for identifying proteins relevant to cancer. The algorithm was named two-layer molecular similarity ensemble approach (TL-SEA). We applied TL-SEA to analyzing the correlation between anticancer compounds (against cell lines K562, MCF7 and A549) and active compounds against separate target proteins listed in BindingDB. Several associations between cancer types and related proteins were revealed using this chemoinformatics approach. An analysis of the literature showed that 26 of 35 predicted proteins were correlated with cancer cell proliferation, apoptosis or differentiation. Additionally, interactions between proteins in BindingDB and anticancer chemicals were also predicted. We discuss the roles of the most important predicted proteins in cancer biology and conclude that TL-SEA could be a useful tool for inferring novel proteins involved in cancer and revealing underlying molecular mechanisms.
Protein-ligand docking using fitness learning-based artificial bee colony with proximity stimuli.
Uehara, Shota; Fujimoto, Kazuhiro J; Tanaka, Shigenori
2015-07-07
Protein-ligand docking is an optimization problem, which aims to identify the binding pose of a ligand with the lowest energy in the active site of a target protein. In this study, we employed a novel optimization algorithm called fitness learning-based artificial bee colony with proximity stimuli (FlABCps) for docking. Simulation results revealed that FlABCps improved the success rate of docking, compared to four state-of-the-art algorithms. The present results also showed superior docking performance of FlABCps, in particular for dealing with highly flexible ligands and proteins with a wide and shallow binding pocket.
Site-directed protein recombination as a shortest-path problem.
Endelman, Jeffrey B; Silberg, Jonathan J; Wang, Zhen-Gang; Arnold, Frances H
2004-07-01
Protein function can be tuned using laboratory evolution, in which one rapidly searches through a library of proteins for the properties of interest. In site-directed recombination, n crossovers are chosen in an alignment of p parents to define a set of p(n + 1) peptide fragments. These fragments are then assembled combinatorially to create a library of p^(n+1) proteins. We have developed a computational algorithm to enrich these libraries in folded proteins while maintaining an appropriate level of diversity for evolution. For a given set of parents, our algorithm selects crossovers that minimize the average energy of the library, subject to constraints on the length of each fragment. This problem is equivalent to finding the shortest path between nodes in a network, for which the global minimum can be found efficiently. Our algorithm has a running time of O(N^3 p^2 + N^2 n) for a protein of length N. Adjusting the constraints on fragment length generates a set of optimized libraries with varying degrees of diversity. By comparing these optima for different sets of parents, we rapidly determine which parents yield the lowest energy libraries.
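To make the library arithmetic concrete: with p parents and n crossovers there are p × (n + 1) fragments and p to the power (n + 1) possible chimeras. The sketch below only enumerates such a library; it does not implement the energy minimization or the shortest-path search. It assumes the parents are already aligned, have equal length, and contain no gaps, and all function names are hypothetical.

```python
from itertools import product

def split_at(seq, crossovers):
    """Cut one aligned parent sequence into n + 1 fragments at the crossover positions."""
    bounds = [0, *crossovers, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def chimera_library(parents, crossovers):
    """Return the per-parent fragments and every chimera, keyed by parent choice per block."""
    fragments = [split_at(p, crossovers) for p in parents]
    n_blocks = len(crossovers) + 1
    library = {}
    # One parent index is chosen independently for each of the n + 1 blocks.
    for choice in product(range(len(parents)), repeat=n_blocks):
        library[choice] = "".join(
            fragments[parent][block] for block, parent in enumerate(choice)
        )
    return fragments, library

parents = ["AAAAAA", "BBBBBB", "CCCCCC"]   # toy sequences, p = 3
fragments, library = chimera_library(parents, crossovers=[2, 4])  # n = 2
print(sum(len(f) for f in fragments), len(library))  # 9 fragments, 27 chimeras
```

Because the library size grows exponentially in n, exhaustive enumeration like this is only practical for small examples, which is one reason an efficient crossover-selection algorithm is useful.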
DASS: efficient discovery and p-value calculation of substructures in unordered data.
Hollunder, Jens; Friedel, Maik; Beyer, Andreas; Workman, Christopher T; Wilhelm, Thomas
2007-01-01
Pattern identification in biological sequence data is one of the main objectives of bioinformatics research. However, few methods are available for detecting patterns (substructures) in unordered datasets. Data mining algorithms mainly developed outside the realm of bioinformatics have been adapted for that purpose, but typically do not determine the statistical significance of the identified patterns. Moreover, these algorithms do not exploit the often modular structure of biological data. We present the algorithm DASS (Discovery of All Significant Substructures) that first identifies all substructures in unordered data (DASS(Sub)) in a manner that is especially efficient for modular data. In addition, DASS calculates the statistical significance of the identified substructures, for sets with at most one element of each type (DASS(P(set))), or for sets with multiple occurrence of elements (DASS(P(mset))). The power and versatility of DASS is demonstrated by four examples: combinations of protein domains in multi-domain proteins, combinations of proteins in protein complexes (protein subcomplexes), combinations of transcription factor target sites in promoter regions and evolutionarily conserved protein interaction subnetworks. The program code and additional data are available at http://www.fli-leibniz.de/tsb/DASS
Lee, Juyong; Lee, Jinhyuk; Sasaki, Takeshi N; Sasai, Masaki; Seok, Chaok; Lee, Jooyoung
2011-08-01
Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of a protein structure and an efficient conformational sampling method for successful protein modeling. In this article, we present an ab initio structure prediction method which combines a recently suggested novel way of fragment assembly, dynamic fragment assembly (DFA) and conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed based on short- and long-range structural restraint information from a fragment library. Here, DFA is represented by the full-atom model by CHARMM with the addition of the empirical potential of DFIRE. The relative contributions between various energy terms are optimized using linear programming. The conformational sampling was carried out with CSA algorithm, which can find low energy conformations more efficiently than simulated annealing used in the existing DFA study. The newly introduced DFA energy function and CSA sampling algorithm are implemented into CHARMM. Test results on 30 small single-domain proteins and 13 template-free modeling targets of the 8th Critical Assessment of protein Structure Prediction show that the current method provides comparable and complementary prediction results to existing top methods. Copyright © 2011 Wiley-Liss, Inc.
Yan, Yumeng; Tao, Huanyu; Huang, Sheng-You
2018-05-26
A major subclass of protein-protein interactions is formed by homo-oligomers with certain symmetry. Therefore, computational modeling of the symmetric protein complexes is important for understanding the molecular mechanism of related biological processes. Although several symmetric docking algorithms have been developed for Cn symmetry, few docking servers have been proposed for Dn symmetry. Here, we present HSYMDOCK, a web server of our hierarchical symmetric docking algorithm that supports both Cn and Dn symmetry. The HSYMDOCK server was extensively evaluated on three benchmarks of symmetric protein complexes, including the 20 CASP11-CAPRI30 homo-oligomer targets, the symmetric docking benchmark of 213 Cn targets and 35 Dn targets, and a nonredundant test set of 55 transmembrane proteins. It was shown that HSYMDOCK obtained a significantly better performance than other similar docking algorithms. The server supports both sequence and structure inputs for the monomer/subunit. Users have an option to provide the symmetry type of the complex, or the server can predict the symmetry type automatically. The docking process is fast and on average consumes 10∼20 min for a docking job. The HSYMDOCK web server is available at http://huanglab.phys.hust.edu.cn/hsymdock/.
Liu, Guan-Ting; Kung, Hsiu-Ni; Chen, Chung-Kuan; Huang, Cheng; Wang, Yung-Li; Yu, Cheng-Pu; Lee, Chung-Pei
2018-02-26
Although a vesicular nucleocytoplasmic transport system is believed to exist in eukaryotic cells, the features of this pathway are mostly unknown. Here, we report that the BFRF1 protein of the Epstein-Barr virus improves vesicular transport of nuclear envelope (NE) to facilitate the translocation and clearance of nuclear components. BFRF1 expression induces vesicles that selectively transport nuclear components to the cytoplasm. With the use of aggregation-prone proteins as tools, we found that aggregated nuclear proteins are dispersed when these BFRF1-induced vesicles are formed. BFRF1-containing vesicles engulf the NE-associated aggregates, exit through the NE, and putatively fuse with autophagic vacuoles. Chemical treatment and genetic ablation of autophagy-related factors indicate that autophagosome formation and autophagy-linked FYVE protein-mediated autophagic proteolysis are involved in this selective clearance of nuclear proteins. Remarkably, vesicular transport, elicited by BFRF1, also attenuated nuclear aggregates accumulated in neuroblastoma cells. Accordingly, induction of NE-derived vesicles by BFRF1 facilitates nuclear protein translocation and clearance, suggesting that autophagy-coupled transport of nucleus-derived vesicles can be elicited for nuclear component catabolism in mammalian cells. Liu, G.-T., Kung, H.-N., Chen, C.-K., Huang, C., Wang, Y.-L., Yu, C.-P., Lee, C.-P. Improving nuclear envelope dynamics by EBV BFRF1 facilitates intranuclear component clearance through autophagy.
Lim, Hansaim; Gray, Paul; Xie, Lei; Poleksic, Aleksandar
2016-01-01
Conventional one-drug-one-gene approach has been of limited success in modern drug discovery. Polypharmacology, which focuses on searching for multi-targeted drugs to perturb disease-causing networks instead of designing selective ligands to target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of the protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method using a number of the most widely accepted gene-specific and cross-gene family benchmarks and demonstrate that our method outperforms other state-of-the-art algorithms for predicting the interaction of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design. PMID:27958331
Lim, Hansaim; Gray, Paul; Xie, Lei; Poleksic, Aleksandar
2016-12-13
Conventional one-drug-one-gene approach has been of limited success in modern drug discovery. Polypharmacology, which focuses on searching for multi-targeted drugs to perturb disease-causing networks instead of designing selective ligands to target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of the protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method using a number of the most widely accepted gene-specific and cross-gene family benchmarks and demonstrate that our method outperforms other state-of-the-art algorithms for predicting the interaction of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design.
More reliable protein NMR peak assignment via improved 2-interval scheduling.
Chen, Zhi-Zhong; Lin, Guohui; Rizzi, Romeo; Wen, Jianjun; Xu, Dong; Xu, Ying; Jiang, Tao
2005-03-01
Protein NMR peak assignment refers to the process of assigning a group of "spin systems" obtained experimentally to a protein sequence of amino acids. The automation of this process is still an unsolved and challenging problem in NMR protein structure determination. Recently, protein NMR peak assignment has been formulated as an interval scheduling problem (ISP), where a protein sequence P of amino acids is viewed as a discrete time interval I (the amino acids on P correspond one-to-one to the time units of I), each subset S of spin systems that are known to originate from consecutive amino acids from P is viewed as a "job" j(S), the preference of assigning S to a subsequence P' of consecutive amino acids on P is viewed as the profit of executing job j(S) in the subinterval of I corresponding to P', and the goal is to maximize the total profit of executing the jobs (on a single machine) during I. The interval scheduling problem is MAX SNP-hard in general, but in the real practice of protein NMR peak assignment, each job j(S) usually requires at most 10 consecutive time units, and typically the jobs that require one or two consecutive time units are the most difficult to assign/schedule. In order to solve these most difficult assignments, we present an efficient 13/7-approximation algorithm for the special case of the interval scheduling problem where each job takes one or two consecutive time units. Combining this algorithm with a greedy filtering strategy for handling long jobs (i.e., jobs that need more than two consecutive time units), we obtain a new efficient heuristic for protein NMR peak assignment. Our experimental study shows that the new heuristic produces the best peak assignment in most of the cases, compared with the NMR peak assignment algorithms in the recent literature. The above algorithm is also the first approximation algorithm for a nontrivial case of the well-known interval scheduling problem that breaks the ratio 2 barrier.
Duan, Zhiqiang; Chen, Jian; Xu, Haixu; Zhu, Jie; Li, Qunhui; He, Liang; Liu, Huimou; Hu, Shunlin; Liu, Xiufan
2014-03-01
The cellular nucleolar proteins are reported to facilitate the replication cycles of some human and animal viruses by interaction with viral proteins. In this study, a nucleolar phosphoprotein B23 was identified to interact with Newcastle disease virus (NDV) matrix (M) protein. We found that NDV M protein accumulated in the nucleolus by binding B23 early in infection, but resulted in the redistribution of B23 from the nucleoli to the nucleoplasm later in infection. In vitro binding studies utilizing deletion mutants indicated that amino acids 30-60 of M and amino acids 188-245 of B23 were required for binding. Furthermore, knockdown of B23 by siRNA or overexpression of B23 or M-binding B23-derived polypeptides remarkably reduced cytopathic effect and inhibited NDV replication. Collectively, we show that B23 facilitates NDV replication by targeting M to the nucleolus, demonstrating for the first time a direct role for nucleolar protein B23 in a paramyxovirus replication process. Copyright © 2014 Elsevier Inc. All rights reserved.
A fast parallel clustering algorithm for molecular simulation trajectories.
Zhao, Yutong; Sheong, Fu Kit; Sun, Jian; Sander, Pedro; Huang, Xuhui
2013-01-15
We implemented a GPU-powered parallel k-centers algorithm to perform clustering on the conformations of molecular dynamics (MD) simulations. The algorithm is up to two orders of magnitude faster than the CPU implementation. We tested our algorithm on four protein MD simulation datasets ranging from the small Alanine Dipeptide to a 370-residue Maltose Binding Protein (MBP). It is capable of grouping 250,000 conformations of the MBP into 4000 clusters within 40 seconds. To achieve this, we effectively parallelized the code on the GPU and utilized the triangle inequality of metric spaces. Furthermore, the algorithm's running time is linear with respect to the number of cluster centers. In addition, we found the triangle inequality to be less effective in higher dimensions and provide a mathematical rationale. Finally, using Alanine Dipeptide as an example, we show a strong correlation between cluster populations resulting from the k-centers algorithm and the underlying density. Copyright © 2012 Wiley Periodicals, Inc.
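For readers unfamiliar with k-centers clustering, the following is a minimal CPU sketch of the standard greedy (farthest-first) variant, not the GPU implementation described above. It assumes plain Euclidean distance between feature vectors, whereas MD clustering typically uses a structural metric such as RMSD, and it omits the triangle-inequality pruning. The function name and arguments are hypothetical.

```python
import numpy as np

def k_centers(points, k, first=0):
    """Greedy k-centers: repeatedly promote the point farthest from all current centers."""
    points = np.asarray(points, dtype=np.float64)
    centers = [first]
    # dist[i] is the distance from point i to its nearest center chosen so far.
    dist = np.linalg.norm(points - points[first], axis=1)
    labels = np.zeros(len(points), dtype=int)
    for c in range(1, k):
        nxt = int(np.argmax(dist))           # farthest point becomes the next center
        centers.append(nxt)
        d_new = np.linalg.norm(points - points[nxt], axis=1)
        closer = d_new < dist                # points that switch to the new center
        labels[closer] = c
        dist[closer] = d_new[closer]
    return centers, labels, dist
```

Each added center costs one pass over all points, so the total work is proportional to (number of points) × (number of centers), which is consistent with the linear dependence on the number of cluster centers mentioned in the abstract.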
Sevy, Alexander M.; Jacobs, Tim M.; Crowe, James E.; Meiler, Jens
2015-01-01
Computational protein design has found great success in engineering proteins for thermodynamic stability, binding specificity, or enzymatic activity in a ‘single state’ design (SSD) paradigm. Multi-specificity design (MSD), on the other hand, involves considering the stability of multiple protein states simultaneously. We have developed a novel MSD algorithm, which we refer to as REstrained CONvergence in multi-specificity design (RECON). The algorithm allows each state to adopt its own sequence throughout the design process rather than enforcing a single sequence on all states. Convergence to a single sequence is encouraged through an incrementally increasing convergence restraint for corresponding positions. Compared to MSD algorithms that enforce (constrain) an identical sequence on all states the energy landscape is simplified, which accelerates the search drastically. As a result, RECON can readily be used in simulations with a flexible protein backbone. We have benchmarked RECON on two design tasks. First, we designed antibodies derived from a common germline gene against their diverse targets to assess recovery of the germline, polyspecific sequence. Second, we design “promiscuous”, polyspecific proteins against all binding partners and measure recovery of the native sequence. We show that RECON is able to efficiently recover native-like, biologically relevant sequences in this diverse set of protein complexes. PMID:26147100
Li, Min; Li, Qi; Ganegoda, Gamage Upeksha; Wang, JianXin; Wu, FangXiang; Pan, Yi
2014-11-01
Identification of disease-causing genes among a large number of candidates is a fundamental challenge in human disease studies. However, it is still time-consuming and laborious to determine the real disease-causing genes by biological experiments. With the advances of the high-throughput techniques, a large number of protein-protein interactions have been produced. Therefore, to address this issue, several methods based on protein interaction network have been proposed. In this paper, we propose a shortest path-based algorithm, named SPranker, to prioritize disease-causing genes in protein interaction networks. Considering the fact that diseases with similar phenotypes are generally caused by functionally related genes, we further propose an improved algorithm SPGOranker by integrating the semantic similarity of GO annotations. SPGOranker not only considers the topological similarity between protein pairs in a protein interaction network but also takes their functional similarity into account. The proposed algorithms SPranker and SPGOranker were applied to 1598 known orphan disease-causing genes from 172 orphan diseases and compared with three state-of-the-art approaches, ICN, VS and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS, and RWR for the prioritization of orphan disease-causing genes. Importantly, for the case study of severe combined immunodeficiency, SPranker and SPGOranker predict several novel causal genes.
Systematic identification and analysis of frequent gene fusion events in metabolic pathways
DOE Office of Scientific and Technical Information (OSTI.GOV)
Henry, Christopher S.; Lerma-Ortiz, Claudia; Gerdes, Svetlana Y.
Here, gene fusions are the most powerful type of in silico-derived functional associations. However, many fusion compilations were made when <100 genomes were available, and algorithms for identifying fusions need updating to handle the current avalanche of sequenced genomes. The availability of a large fusion dataset would help probe functional associations and enable systematic analysis of where and why fusion events occur. As a result, here we present a systematic analysis of fusions in prokaryotes. We manually generated two training sets: (i) 121 fusions in the model organism Escherichia coli; (ii) 131 fusions found in B vitamin metabolism. These sets were used to develop a fusion prediction algorithm that captured the training set fusions with only 7% false negatives and 50% false positives, a substantial improvement over existing approaches. This algorithm was then applied to identify 3.8 million potential fusions across 11,473 genomes. The results of the analysis are available in a searchable database. A functional analysis identified 3,000 reactions associated with frequent fusion events and revealed areas of metabolism where fusions are particularly prevalent. In conclusion, customary definitions of fusions were shown to be ambiguous, and a stricter one was proposed. Exploring the genes participating in fusion events showed that they most commonly encode transporters, regulators, and metabolic enzymes. The major rationales for fusions between metabolic genes appear to be overcoming pathway bottlenecks, avoiding toxicity, controlling competing pathways, and facilitating expression and assembly of protein complexes. Finally, our fusion dataset provides powerful clues to decipher the biological activities of domains of unknown function.
Systematic identification and analysis of frequent gene fusion events in metabolic pathways
Henry, Christopher S.; Lerma-Ortiz, Claudia; Gerdes, Svetlana Y.; ...
2016-06-24
Here, gene fusions are the most powerful type of in silico-derived functional associations. However, many fusion compilations were made when <100 genomes were available, and algorithms for identifying fusions need updating to handle the current avalanche of sequenced genomes. The availability of a large fusion dataset would help probe functional associations and enable systematic analysis of where and why fusion events occur. As a result, here we present a systematic analysis of fusions in prokaryotes. We manually generated two training sets: (i) 121 fusions in the model organism Escherichia coli; (ii) 131 fusions found in B vitamin metabolism. These sets were used to develop a fusion prediction algorithm that captured the training set fusions with only 7% false negatives and 50% false positives, a substantial improvement over existing approaches. This algorithm was then applied to identify 3.8 million potential fusions across 11,473 genomes. The results of the analysis are available in a searchable database. A functional analysis identified 3,000 reactions associated with frequent fusion events and revealed areas of metabolism where fusions are particularly prevalent. In conclusion, customary definitions of fusions were shown to be ambiguous, and a stricter one was proposed. Exploring the genes participating in fusion events showed that they most commonly encode transporters, regulators, and metabolic enzymes. The major rationales for fusions between metabolic genes appear to be overcoming pathway bottlenecks, avoiding toxicity, controlling competing pathways, and facilitating expression and assembly of protein complexes. Finally, our fusion dataset provides powerful clues to decipher the biological activities of domains of unknown function.
NASA Astrophysics Data System (ADS)
Hu, Yufeng; Chen, Zhenhang; Fu, Yanjun; He, Qingzhong; Jiang, Lun; Zheng, Jiangge; Gao, Yina; Mei, Pinchao; Chen, Zhongzhou; Ren, Xueqin
2015-03-01
Flexibility is an intrinsic property of proteins and essential for their biological functions. However, because of structural flexibility, obtaining high-quality crystals of proteins with heterogeneous conformations remains challenging. Here, we show a novel approach to immobilize traditional precipitants onto molecularly imprinted polymers (MIPs) to facilitate protein crystallization, especially for flexible proteins. By applying this method, high-quality crystals of the flexible N-terminus of human fragile X mental retardation protein are obtained, whose absence causes the most common inherited mental retardation. A novel KH domain and an intermolecular disulfide bond are discovered, and several types of dimers are found in solution, thus providing insights into the function of this protein. Furthermore, the precipitant-immobilized MIPs (piMIPs) successfully facilitate flexible protein crystal formation for five model proteins with increased diffraction resolution. This highlights the potential of piMIPs for the crystallization of flexible proteins.
Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.
Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai
2015-12-01
The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low-frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present the Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method for protein comparison is more efficient than the commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant features for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
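As background, the RFT expands a signal over Ramanujan sums c_q(n), the sums of the n-th powers of the primitive q-th roots of unity, which are always integers. The sketch below is a direct, slow evaluation under the common finite-length definition x_q = (1 / (phi(q) N)) × sum over n = 1..N of x(n) c_q(n); it is not the fast algorithm of the paper. It takes an already numeric sequence as input, so the mapping from amino acids to numbers used by the RRM is deliberately left out. All function names are hypothetical.

```python
from math import gcd
import numpy as np

def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1 (an integer)."""
    total = sum(np.cos(2 * np.pi * k * n / q)
                for k in range(1, q + 1) if gcd(k, q) == 1)
    return int(round(total))

def totient(q):
    """Euler's phi(q): how many k in 1..q are coprime to q."""
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)

def rft_coefficients(x, q_max):
    """Direct finite-length RFT coefficients x_1 .. x_{q_max} of a numeric sequence x."""
    x = np.asarray(x, dtype=np.float64)
    n_len = len(x)
    ns = range(1, n_len + 1)                 # the sequence is indexed from n = 1
    coeffs = []
    for q in range(1, q_max + 1):
        c = np.array([ramanujan_sum(q, n) for n in ns], dtype=np.float64)
        coeffs.append(float(x @ c) / (totient(q) * n_len))
    return np.array(coeffs)
```

For example, a signal equal to c_3(n) over a whole number of periods yields a coefficient of 1 at q = 3 and 0 at q = 1 and q = 2, which shows how a period-3 component is isolated.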
Conservation of hot regions in protein-protein interaction in evolution.
Hu, Jing; Li, Jiarui; Chen, Nansheng; Zhang, Xiaolong
2016-11-01
Hot regions of protein-protein interactions are the active areas formed by the residues most important to protein binding. As research on protein interactions has advanced, many predicted hot regions can be discovered efficiently by intelligent computing methods, but verifying every prediction experimentally is impractical because of the time, cost, and complexity of the experiments. Building on research into the conservation of hot spot residues, this study proposes a method to verify the authenticity of hot regions predicted by a machine learning algorithm that combines the protein's biological features with sequence conservation. Multiple sequence alignment, a module substitution matrix, and sequence similarity are used to build a conservation scoring algorithm, and a threshold module then verifies the conservation tendency of hot regions in evolution. This work gives an effective method for verifying predicted hot regions in protein-protein interactions and provides a useful way to investigate the functional activities of protein hot regions in depth. Copyright © 2016. Published by Elsevier Inc.
Ensemble of hybrid genetic algorithm for two-dimensional phase unwrapping
NASA Astrophysics Data System (ADS)
Balakrishnan, D.; Quan, C.; Tay, C. J.
2013-06-01
Phase unwrapping is the final and trickiest step in any phase retrieval technique. Phase unwrapping by artificial intelligence methods (optimization algorithms) such as hybrid genetic algorithm, reverse simulated annealing, particle swarm optimization, and minimum cost matching has shown better results than conventional phase unwrapping methods. In this paper, an ensemble of hybrid genetic algorithms with parallel populations is proposed to solve the branch-cut phase unwrapping problem. In a single-population hybrid genetic algorithm, the selection, cross-over and mutation operators are applied to obtain a new population in every generation. The parameters and choice of operators affect the performance of the hybrid genetic algorithm. The ensemble allows different parameter sets and different choices of operators to be used simultaneously. Each population uses a different set of parameters, and the offspring of each population compete against the offspring of all other populations, which use different sets of parameters. The effectiveness of the proposed algorithm is demonstrated by phase unwrapping examples, and the advantages of the proposed method are discussed.
An automated method for finding molecular complexes in large protein interaction networks
Bader, Gary D; Hogue, Christopher WV
2003-01-01
Background Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. Results This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation. Conclusion Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from . PMID:12525261
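The two stages described in this abstract, vertex weighting by local neighborhood density and outward traversal from a dense seed, can be sketched as follows. This is a simplified stand-in and not MCODE itself: the weight used here (density of the closed neighborhood multiplied by the vertex degree) and the parameter name `cutoff` are assumptions chosen for illustration, whereas the published method defines its own weighting and parameters.

```python
def vertex_weight(graph, v):
    """Density of v's closed neighborhood times its degree (assumed weighting)."""
    nodes = graph[v] | {v}
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each edge inside the neighborhood is counted from both endpoints.
    edges = sum(len(graph[u] & nodes) for u in nodes) // 2
    density = edges / (n * (n - 1) / 2)
    return density * len(graph[v])


def grow_complex(graph, cutoff=0.2):
    """Grow one cluster outward from the highest-weight seed vertex."""
    weight = {v: vertex_weight(graph, v) for v in graph}
    seed = max(weight, key=weight.get)
    threshold = weight[seed] * (1 - cutoff)  # accept vertices close to the seed weight
    cluster, frontier = {seed}, [seed]
    while frontier:
        v = frontier.pop()
        for u in graph[v]:
            if u not in cluster and weight[u] >= threshold:
                cluster.add(u)
                frontier.append(u)
    return cluster
```

Here `graph` is a dictionary mapping each protein to the set of its interaction partners. On a four-protein clique with a two-protein tail attached, the traversal returns the clique and leaves the tail out, because the tail vertices fall below the weight threshold.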
Joshi, Anuja; Gislason-Lee, Amber J; Keeble, Claire; Sivananthan, Uduvil M
2017-01-01
Objective: The aim of this research was to quantify the reduction in radiation dose facilitated by image processing alone for percutaneous coronary intervention (PCI) patient angiograms, without reducing the perceived image quality required to confidently make a diagnosis. Methods: Incremental amounts of image noise were added to five PCI angiograms, simulating the angiogram as having been acquired at corresponding lower dose levels (10–89% dose reduction). 16 observers with relevant experience scored the image quality of these angiograms in 3 states—with no image processing and with 2 different modern image processing algorithms applied. These algorithms are used on state-of-the-art and previous generation cardiac interventional X-ray systems. Ordinal regression allowing for random effects and the delta method were used to quantify the dose reduction possible by the processing algorithms, for equivalent image quality scores. Results: Observers rated the quality of the images processed with the state-of-the-art and previous generation image processing with a 24.9% and 15.6% dose reduction, respectively, as equivalent in quality to the unenhanced images. The dose reduction facilitated by the state-of-the-art image processing relative to previous generation processing was 10.3%. Conclusion: Results demonstrate that statistically significant dose reduction can be facilitated with no loss in perceived image quality using modern image enhancement; the most recent processing algorithm was more effective in preserving image quality at lower doses. Advances in knowledge: Image enhancement was shown to maintain perceived image quality in coronary angiography at a reduced level of radiation dose using computer software to produce synthetic images from real angiograms simulating a reduction in dose. PMID:28124572
Zhang, Huaguang; Song, Ruizhuo; Wei, Qinglai; Zhang, Tieyan
2011-12-01
In this paper, a novel heuristic dynamic programming (HDP) iteration algorithm is proposed to solve the optimal tracking control problem for a class of nonlinear discrete-time systems with time delays. The novel algorithm contains state updating, control policy iteration, and performance index iteration. To get the optimal states, the states are also updated. Furthermore, the "backward iteration" is applied to state updating. Two neural networks are used to approximate the performance index function and compute the optimal control policy, which facilitates the implementation of the HDP iteration algorithm. Finally, we present two examples to demonstrate the effectiveness of the proposed HDP iteration algorithm.
ERIC Educational Resources Information Center
Jin, Iksung; Kandel, Eric R.; Hawkins, Robert D.
2011-01-01
Whereas short-term plasticity involves covalent modifications that are generally restricted to either presynaptic or postsynaptic structures, long-term plasticity involves the growth of new synapses, which by its nature involves both pre- and postsynaptic alterations. In addition, an intermediate-term stage of plasticity has been identified that…
A critical analysis of computational protein design with sparse residue interaction graphs
Georgiev, Ivelin S.
2017-01-01
Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or the full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. 
This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies. PMID:28358804
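The effect described in this abstract, where dropping weak pairwise interactions changes the minimum-energy conformation, can be reproduced on a toy problem. The sketch below is a brute-force illustration under stated assumptions: the function name `gmec`, the rule that a residue pair is dropped when its largest absolute pairwise energy is below `cutoff`, and the energy values are all hypothetical, and exhaustive enumeration stands in for the provable, ensemble-based algorithm the authors use.

```python
from itertools import product


def gmec(n_rotamers, pair_energy, cutoff=0.0):
    """Return (best conformation, its energy) over the retained residue pairs.

    pair_energy maps a residue pair (i, j) to a table {(rot_i, rot_j): energy}.
    Pairs whose strongest interaction is weaker than `cutoff` are removed,
    which yields the sparse residue interaction graph.
    """
    kept = {
        pair: table
        for pair, table in pair_energy.items()
        if max(abs(e) for e in table.values()) >= cutoff
    }

    def energy(conf):
        return sum(table[(conf[i], conf[j])] for (i, j), table in kept.items())

    best = min(product(*(range(n) for n in n_rotamers)), key=energy)
    return best, energy(best)


# Three residues with two rotamers each; the (0, 2) pair is a weak interaction.
toy_energies = {
    (0, 1): {(0, 0): -2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): -1.5},
    (1, 2): {(0, 0): -1.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): -1.8},
    (0, 2): {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.0, (1, 1): -0.4},
}
full_conf, full_energy = gmec([2, 2, 2], toy_energies)
sparse_conf, sparse_energy = gmec([2, 2, 2], toy_energies, cutoff=0.5)
```

With these made-up numbers the full and sparse searches select different conformations, because the weak (0, 2) term is large enough to tip the balance between two nearly degenerate candidates.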
Subjective comparison and evaluation of speech enhancement algorithms
Hu, Yi; Loizou, Philipos C.
2007-01-01
Making meaningful comparisons between the performance of the various speech enhancement algorithms proposed over the years has been elusive due to the lack of a common speech database, differences in the types of noise used, and differences in the testing methodology. To facilitate such comparisons, we report on the development of a noisy speech corpus suitable for evaluation of speech enhancement algorithms. This corpus is subsequently used for the subjective evaluation of 13 speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model based and Wiener-type algorithms. The subjective evaluation was performed by Dynastat, Inc. using the ITU-T P.835 methodology designed to evaluate the speech quality along three dimensions: signal distortion, noise distortion and overall quality. This paper reports the results of the subjective tests. PMID:18046463
Collegial Activity Learning between Heterogeneous Sensors.
Feuz, Kyle D; Cook, Diane J
2017-11-01
Activity recognition algorithms have matured and become more ubiquitous in recent years. However, these algorithms are typically customized for a particular sensor platform. In this paper we introduce PECO, a Personalized activity ECOsystem, that transfers learned activity information seamlessly between sensor platforms in real time so that any available sensor can continue to track activities without requiring its own extensive labeled training data. We introduce a multi-view transfer learning algorithm that facilitates this information handoff between sensor platforms and provide theoretical performance bounds for the algorithm. In addition, we empirically evaluate PECO using datasets that utilize heterogeneous sensor platforms to perform activity recognition. These results indicate that not only can activity recognition algorithms transfer important information to new sensor platforms, but any number of platforms can work together as colleagues to boost performance.
Hus, Vanessa; Lord, Catherine
2014-08-01
The recently published Autism Diagnostic Observation Schedule, 2nd edition (ADOS-2) includes revised diagnostic algorithms and standardized severity scores for modules used to assess younger children. A revised algorithm and severity scores are not yet available for Module 4, used with verbally fluent adults. The current study revises the Module 4 algorithm and calibrates raw overall and domain totals to provide metrics of autism spectrum disorder (ASD) symptom severity. Sensitivity and specificity of the revised Module 4 algorithm exceeded 80 % in the overall sample. Module 4 calibrated severity scores provide quantitative estimates of ASD symptom severity that are relatively independent of participant characteristics. These efforts increase comparability of ADOS scores across modules and should facilitate efforts to examine symptom trajectories from toddler to adulthood.
Richardson, Keith; Denny, Richard; Hughes, Chris; Skilling, John; Sikora, Jacek; Dadlez, Michał; Manteca, Angel; Jung, Hye Ryung; Jensen, Ole Nørregaard; Redeker, Virginie; Melki, Ronald; Langridge, James I.; Vissers, Johannes P.C.
2013-01-01
A probability-based quantification framework is presented for the calculation of relative peptide and protein abundance in label-free and label-dependent LC-MS proteomics data. The results are accompanied by credible intervals and regulation probabilities. The algorithm takes into account data uncertainties via Poisson statistics modified by a noise contribution that is determined automatically during an initial normalization stage. Protein quantification relies on assignments of component peptides to the acquired data. These assignments are generally of variable reliability and may not be present across all of the experiments comprising an analysis. It is also possible for a peptide to be identified to more than one protein in a given mixture. For these reasons the algorithm accepts a prior probability of peptide assignment for each intensity measurement. The model is constructed in such a way that outliers of any type can be automatically reweighted. Two discrete normalization methods can be employed. The first method is based on a user-defined subset of peptides, while the second method relies on the presence of a dominant background of endogenous peptides for which the concentration is assumed to be unaffected. Normalization is performed using the same computational and statistical procedures employed by the main quantification algorithm. The performance of the algorithm will be illustrated on example data sets, and its utility demonstrated for typical proteomics applications. The quantification algorithm supports relative protein quantification based on precursor and product ion intensities acquired by means of data-dependent methods, originating from all common isotopically-labeled approaches, as well as label-free ion intensity-based data-independent methods. PMID:22871168
Predicting protein functions from redundancies in large-scale protein interaction networks
NASA Technical Reports Server (NTRS)
Samanta, Manoj Pratim; Liang, Shoudan
2003-01-01
Interpreting data from large-scale protein interaction experiments has been a challenging task because of the widespread presence of random false positives. Here, we present a network-based statistical algorithm that overcomes this difficulty and allows us to derive functions of unannotated proteins from large-scale interaction data. Our algorithm uses the insight that if two proteins share a significantly larger number of common interaction partners than expected at random, they have close functional associations. Analysis of publicly available data from Saccharomyces cerevisiae reveals >2,800 reliable functional associations, 29% of which involve at least one unannotated protein. By further analyzing these associations, we derive tentative functions for 81 unannotated proteins with high certainty. Our method is not overly sensitive to the false positives present in the data. Even after adding 50% randomly generated interactions to the measured data set, we are able to recover almost all (approximately 89%) of the original associations.
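One way to make the "more shared partners than random" criterion concrete is a hypergeometric tail probability. The sketch below is an assumption made for illustration, since the abstract does not state the exact statistic the authors use: given N proteins in the network, one protein with n1 partners and another with n2 partners, it returns the chance that they share at least m partners if the second protein's partners were drawn at random.

```python
from math import comb


def shared_partner_pvalue(N: int, n1: int, n2: int, m: int) -> float:
    """P(shared partners >= m) under random partner assignment (hypergeometric tail)."""
    upper = min(n1, n2)
    # comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlap counts contribute nothing to the sum.
    favourable = sum(comb(n1, k) * comb(N - n1, n2 - k) for k in range(m, upper + 1))
    return favourable / comb(N, n2)
```

A small value indicates that the observed overlap is unlikely to arise by chance, which is the situation the abstract associates with a close functional relationship.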
Cyclic coordinate descent: A robotics algorithm for protein loop closure.
Canutescu, Adrian A; Dunbrack, Roland L
2003-05-01
In protein structure prediction, it is often the case that a protein segment must be adjusted to connect two fixed segments. This occurs during loop structure prediction in homology modeling as well as in ab initio structure prediction. Several algorithms for this purpose are based on the inverse Jacobian of the distance constraints with respect to dihedral angle degrees of freedom. These algorithms are sometimes unstable and fail to converge. We present an algorithm developed originally for inverse kinematics applications in robotics. In robotics, an end effector in the form of a robot hand must reach for an object in space by altering adjustable joint angles and arm lengths. In loop prediction, dihedral angles must be adjusted to move the C-terminal residue of a segment to superimpose on a fixed anchor residue in the protein structure. The algorithm, referred to as cyclic coordinate descent or CCD, involves adjusting one dihedral angle at a time to minimize the sum of the squared distances between three backbone atoms of the moving C-terminal anchor and the corresponding atoms in the fixed C-terminal anchor. The result is an equation in one variable for the proposed change in each dihedral. The algorithm proceeds iteratively through all of the adjustable dihedral angles from the N-terminal to the C-terminal end of the loop. CCD is suitable as a component of loop prediction methods that generate large numbers of trial structures. It succeeds in closing loops in a large test set 99.79% of the time, and fails occasionally only for short, highly extended loops. It is very fast, closing loops of length 8 in 0.037 sec on average.
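The single-variable update at the heart of CCD can be sketched from the description above. For a rotation by angle θ about a unit axis, the sum of squared distances between the moving atoms and their targets is a constant minus 2(a cos θ + b sin θ), so the minimizing angle is atan2(b, a). The helper names below are hypothetical, and this shows one update for one rotation axis only, not the full loop-closure procedure of the paper.

```python
from math import atan2, cos, sin


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def ccd_angle(origin, axis, moving, fixed):
    """Angle about the unit `axis` through `origin` that best superimposes moving on fixed."""
    a = b = 0.0
    for m, f in zip(moving, fixed):
        r = sub(m, origin)  # moving atom relative to the rotation axis
        t = sub(f, origin)  # target atom relative to the rotation axis
        a += dot(t, r) - dot(axis, r) * dot(axis, t)
        b += dot(t, cross(axis, r))
    return atan2(b, a)


def rotate(p, origin, axis, theta):
    """Rodrigues rotation of point p about the unit axis through origin."""
    r = sub(p, origin)
    c, s = cos(theta), sin(theta)
    ur = cross(axis, r)
    k = dot(axis, r) * (1 - c)
    return tuple(o + c * ri + s * uri + k * ui
                 for o, ri, uri, ui in zip(origin, r, ur, axis))
```

In a full loop-closure run, this update would be applied in turn to each adjustable dihedral from the N-terminal to the C-terminal end of the loop, repeating until the three anchor atoms superimpose within a tolerance.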
Predicting Nonspecific Ion Binding Using DelPhi
Petukh, Marharyta; Zhenirovskyy, Maxim; Li, Chuan; Li, Lin; Wang, Lin; Alexov, Emil
2012-01-01
Ions are an important component of the cell and affect the corresponding biological macromolecules either via direct binding or as a screening ion cloud. Although some ion binding is highly specific and frequently associated with the function of the macromolecule, other ions bind to the protein surface nonspecifically, presumably because the electrostatic attraction is strong enough to immobilize them. Here, we test such a scenario and demonstrate that experimentally identified surface-bound ions are located at a potential that facilitates binding, which indicates that the major driving force is the electrostatics. Without taking into consideration geometrical factors and structural fluctuations, we show that ions tend to be bound onto the protein surface at positions with strong potential but with polarity opposite to that of the ion. This observation is used to develop a method that uses a DelPhi-calculated potential map in conjunction with an in-house-developed clustering algorithm to predict nonspecific ion-binding sites. Although this approach distinguishes only the polarity of the ions, and not their chemical nature, it can predict nonspecific binding of positively or negatively charged ions with acceptable accuracy. One can use the predictions in the Poisson-Boltzmann approach by placing explicit ions in the predicted positions, which in turn will reduce the magnitude of the local potential and extend the limits of the Poisson-Boltzmann equation. In addition, one can use this approach to place the desired number of ions before conducting molecular-dynamics simulations to neutralize the net charge of the protein, because it was shown to perform better than standard screened Coulomb canned routines, or to predict ion-binding sites in proteins. This latter is especially true for proteins that are involved in ion transport, because such ions are loosely bound and very difficult to detect experimentally. PMID:22735539
Rational Design of Orthogonal Multipolar Interactions with Fluorine in Protein–Ligand Complexes
Pollock, Jonathan; Borkin, Dmitry; Lund, George; ...
2015-08-19
Multipolar interactions involving fluorine and the protein backbone have been frequently observed in protein–ligand complexes. Such fluorine–backbone interactions may substantially contribute to the high affinity of small molecule inhibitors. Here we found that introduction of trifluoromethyl groups into two different sites in the thienopyrimidine class of menin–MLL inhibitors considerably improved their inhibitory activity. In both cases, trifluoromethyl groups are engaged in short interactions with the backbone of menin. In order to understand the effect of fluorine, we synthesized a series of analogues by systematically changing the number of fluorine atoms, and we determined high-resolution crystal structures of the complexes with menin. Here, we found that introduction of fluorine at favorable geometry for interactions with backbone carbonyls may improve the activity of menin–MLL inhibitors as much as 5- to 10-fold. In order to facilitate the design of multipolar fluorine–backbone interactions in protein–ligand complexes, we developed a computational algorithm named FMAP, which calculates fluorophilic sites in proximity to the protein backbone. We demonstrated that FMAP could be used to rationalize improvement in the activity of known protein inhibitors upon introduction of fluorine. Furthermore, FMAP may also represent a valuable tool for designing new fluorine substitutions and support ligand optimization in drug discovery projects. Analysis of the menin–MLL inhibitor complexes revealed that the backbone in secondary structures is particularly accessible to the interactions with fluorine. Lastly, considering that secondary structure elements are frequently exposed at protein interfaces, we postulate that multipolar fluorine–backbone interactions may represent a particularly attractive approach to improve inhibitors of protein–protein interactions.
Detection of confinement and jumps in single-molecule membrane trajectories
NASA Astrophysics Data System (ADS)
Meilhac, N.; Le Guyader, L.; Salomé, L.; Destainville, N.
2006-01-01
We propose a variant of the algorithm by [R. Simson, E. D. Sheets, and K. Jacobson, Biophys. J. 69, 989 (1995)]. Their algorithm was developed to detect transient confinement zones in experimental single-particle tracking trajectories of diffusing membrane proteins or lipids. We show that our algorithm is able to detect confinement in a wider class of confining potential shapes than that of Simson et al. Furthermore, it enables the detection not only of temporary confinement but also of jumps between confinement zones. Jumps are predicted by membrane skeleton fence and picket models. In the case of experimental trajectories of μ-opioid receptors, which belong to the family of G-protein-coupled receptors involved in a signal transduction pathway, this algorithm confirms that confinement cannot be explained solely by rigid fences.
A new algorithm for reliable and general NMR resonance assignment.
Schmidt, Elena; Güntert, Peter
2012-08-01
The new FLYA automated resonance assignment algorithm determines NMR chemical shift assignments on the basis of peak lists from any combination of multidimensional through-bond or through-space NMR experiments for proteins. Backbone and side-chain assignments can be determined. All experimental data are used simultaneously, thereby exploiting optimally the redundancy present in the input peak lists and circumventing potential pitfalls of assignment strategies in which results obtained in a given step remain fixed input data for subsequent steps. Instead of prescribing a specific assignment strategy, the FLYA resonance assignment algorithm requires only experimental peak lists and the primary structure of the protein, from which the peaks expected in a given spectrum can be generated by applying a set of rules, defined in a straightforward way by specifying through-bond or through-space magnetization transfer pathways. The algorithm determines the resonance assignment by finding an optimal mapping between the set of expected peaks that are assigned by definition but have unknown positions and the set of measured peaks in the input peak lists that are initially unassigned but have a known position in the spectrum. Using peak lists obtained by purely automated peak picking from the experimental spectra of three proteins, FLYA assigned correctly 96-99% of the backbone and 90-91% of all resonances that could be assigned manually. Systematic studies quantified the impact of various factors on the assignment accuracy, namely the extent of missing real peaks and the amount of additional artifact peaks in the input peak lists, as well as the accuracy of the peak positions. Comparing the resonance assignments from FLYA with those obtained from two other existing algorithms showed that using identical experimental input data these other algorithms yielded significantly (40-142%) more erroneous assignments than FLYA. 
The FLYA resonance assignment algorithm thus has the reliability and flexibility to replace most manual and semi-automatic assignment procedures for NMR studies of proteins.
Predicting Protein Structure Using Parallel Genetic Algorithms.
1994-12-01
Molecular dynamics attempts to simulate the protein folding process. However, the time steps required for this simulation are on the order of one...harmonics. These two factors have limited molecular dynamics simulations to less than a few nanoseconds (10^-9 sec), even on today's fastest supercomputers...
Kotrri, Gynter; Fusch, Gerhard; Kwan, Celia; Choi, Dasol; Choi, Arum; Al Kafi, Nisreen; Rochow, Niels; Fusch, Christoph
2016-02-26
Commercial infrared (IR) milk analyzers are being increasingly used in research settings for the macronutrient measurement of breast milk (BM) prior to its target fortification. These devices, however, may not provide reliable measurement if not properly calibrated. In the current study, we tested a correction algorithm for a Near-IR milk analyzer (Unity SpectraStar, Brookfield, CT, USA) for fat and protein measurements, and examined the effect of pasteurization on the IR matrix and the stability of fat, protein, and lactose. Measurement values generated through Near-IR analysis were compared against those obtained through chemical reference methods to test the correction algorithm for the Near-IR milk analyzer. Macronutrient levels were compared between unpasteurized and pasteurized milk samples to determine the effect of pasteurization on macronutrient stability. The correction algorithm generated for our device was found to be valid for unpasteurized and pasteurized BM. Pasteurization had no effect on the macronutrient levels and the IR matrix of BM. These results show that fat and protein content can be accurately measured and monitored for unpasteurized and pasteurized BM. Of additional importance is the implication that donated human milk, generally low in protein content, has the potential to be target fortified. PMID:26927169
Liu, Nai-Yu; Lee, Hsiao-Hui; Chang, Zee-Fen; Tsay, Yeou-Guang
2015-09-10
It has been observed that a modified peptide and its non-modified counterpart, when analyzed with reverse phase liquid chromatography, usually share a very similar elution property [1-3]. Inasmuch as this property is common to many different types of protein modifications, we propose an informatics-based approach, featuring the generation of segmental average mass spectra ((sa)MS), that is capable of locating different types of modified peptides in two-dimensional liquid chromatography-mass spectrometric (LC-MS) data collected for regular protease digests from proteins in gels or solutions. To enable the localization of these peptides in the LC-MS map, we have implemented a set of computer programs, or the (sa)MS package, that perform the needed functions, including generating a complete set of segmental average mass spectra, compiling the peptide inventory from the Sequest/TurboSequest results, searching modified peptide candidates and annotating a tandem mass spectrum for final verification. Using ROCK2 as an example, our programs were applied to identify multiple types of modified peptides, such as phosphorylated and hexosylated ones, which particularly include those peptides that could have been ignored due to their peculiar fragmentation patterns and consequent low search scores. Hence, we demonstrate that, when complemented with peptide search algorithms, our approach and the entailed computer programs can add the sequence information needed for bolstering the confidence of data interpretation by the present analytical platforms and facilitate the mining of protein modification information out of complicated LC-MS/MS data. Copyright © 2015 Elsevier B.V. All rights reserved.
CAVER 3.0: A Tool for the Analysis of Transport Pathways in Dynamic Protein Structures
Chovancova, Eva; Pavelka, Antonin; Benes, Petr; Strnad, Ondrej; Brezovsky, Jan; Kozlikova, Barbora; Gora, Artur; Sustr, Vilem; Klvana, Martin; Medek, Petr; Biedermannova, Lada; Sochor, Jiri; Damborsky, Jiri
2012-01-01
Tunnels and channels facilitate the transport of small molecules, ions and water solvent in a large variety of proteins. Characteristics of individual transport pathways, including their geometry, physico-chemical properties and dynamics are instrumental for understanding of structure-function relationships of these proteins, for the design of new inhibitors and construction of improved biocatalysts. CAVER is a software tool widely used for the identification and characterization of transport pathways in static macromolecular structures. Herein we present a new version of CAVER enabling automatic analysis of tunnels and channels in large ensembles of protein conformations. CAVER 3.0 implements new algorithms for the calculation and clustering of pathways. A trajectory from a molecular dynamics simulation serves as the typical input, while detailed characteristics and summary statistics of the time evolution of individual pathways are provided in the outputs. To illustrate the capabilities of CAVER 3.0, the tool was applied for the analysis of molecular dynamics simulation of the microbial enzyme haloalkane dehalogenase DhaA. CAVER 3.0 safely identified and reliably estimated the importance of all previously published DhaA tunnels, including the tunnels closed in DhaA crystal structures. Obtained results clearly demonstrate that analysis of molecular dynamics simulation is essential for the estimation of pathway characteristics and elucidation of the structural basis of the tunnel gating. CAVER 3.0 paves the way for the study of important biochemical phenomena in the area of molecular transport, molecular recognition and enzymatic catalysis. The software is freely available as a multiplatform command-line application at http://www.caver.cz. PMID:23093919
Kuhl, D; Kennedy, T E; Barzilai, A; Kandel, E R
1992-12-01
Long-term memory for sensitization of the gill- and siphon-withdrawal reflexes in Aplysia californica requires RNA and protein synthesis. These long-term behavioral changes are accompanied by long-term facilitation of the synaptic connections between the gill and siphon sensory and motor neurons, which are similarly dependent on transcription and translation. In addition to showing an increase in overall protein synthesis, long-term facilitation is associated with changes in the expression of specific early, intermediate, and late proteins, and with the growth of new synaptic connections between the sensory and motor neurons of the reflex. We previously focused on early proteins and have identified four proteins as members of the immunoglobulin family of cell adhesion molecules related to NCAM and fasciclin II. We have now cloned the cDNA corresponding to one of the late proteins, and identified it as the Aplysia homolog of BiP, an ER resident protein involved in the folding and assembly of secretory and membrane proteins. Behavioral training increases the steady-state level of BiP mRNA in the sensory neurons. The increase in the synthesis of BiP protein is first detected 3 h after the onset of facilitation, when the increase in overall protein synthesis reaches its peak and the formation of new synaptic terminals becomes apparent. These findings suggest that the chaperone function of BiP might serve to fold proteins and assemble protein complexes necessary for the structural changes characteristic of long-term memory.
A gradient based algorithm to solve inverse plane bimodular problems of identification
NASA Astrophysics Data System (ADS)
Ran, Chunjiang; Yang, Haitian; Zhang, Guoqing
2018-02-01
This paper presents a gradient based algorithm to solve inverse plane bimodular problems of identifying constitutive parameters, including tensile/compressive moduli and tensile/compressive Poisson's ratios. For the forward bimodular problem, a FE tangent stiffness matrix is derived, facilitating the implementation of gradient based algorithms; for the inverse bimodular problem of identification, a two-level sensitivity analysis based strategy is proposed. Numerical verification in terms of accuracy and efficiency is provided, and the impacts of initial guess, number of measurement points, regional inhomogeneity, and noisy data on the identification are taken into account.
Geller, Ron; Pechmann, Sebastian; Acevedo, Ashley; Andino, Raul; Frydman, Judith
2018-05-03
Acquisition of mutations is central to evolution; however, the detrimental effects of most mutations on protein folding and stability limit protein evolvability. Molecular chaperones, which suppress aggregation and facilitate polypeptide folding, may alleviate the effects of destabilizing mutations thus promoting sequence diversification. To illuminate how chaperones can influence protein evolution, we examined the effect of reduced activity of the chaperone Hsp90 on poliovirus evolution. We find that Hsp90 offsets evolutionary trade-offs between protein stability and aggregation. Lower chaperone levels favor variants of reduced hydrophobicity and protein aggregation propensity but at a cost to protein stability. Notably, reducing Hsp90 activity also promotes clusters of codon-deoptimized synonymous mutations at inter-domain boundaries, likely to facilitate cotranslational domain folding. Our results reveal how a chaperone can shape the sequence landscape at both the protein and RNA levels to harmonize competing constraints posed by protein stability, aggregation propensity, and translation rate on successful protein biogenesis.
NASA Astrophysics Data System (ADS)
Chalmers, Alex
2007-10-01
A simple model is presented of a possible inspection regimen applied to each leg of a cargo container's journey between its point of origin and destination. Several candidate modalities are proposed to be used at multiple remote locations to act as a pre-screen inspection as the target approaches a perimeter and as the primary inspection modality at the portal. Information from multiple data sets is fused to optimize the costs and performance of a network of such inspection systems. A series of image processing algorithms are presented that automatically process X-ray images of containerized cargo. The goal of this processing is to locate the container in a real time stream of traffic traversing a portal without impeding the flow of commerce. Such processing may facilitate the inclusion of unmanned/unattended inspection systems in such a network. Several samples of the processing applied to data collected from deployed systems are included. Simulated data from a notional cargo inspection system with multiple sensor modalities and advanced data fusion algorithms are also included to show the potential increased detection and throughput performance of such a configuration.
Neural network fusion capabilities for efficient implementation of tracking algorithms
NASA Astrophysics Data System (ADS)
Sundareshan, Malur K.; Amoozegar, Farid
1997-03-01
The ability to efficiently fuse information of different forms to facilitate intelligent decision making is one of the major capabilities of trained multilayer neural networks that is now being recognized. While development of innovative adaptive control algorithms for nonlinear dynamical plants that attempt to exploit these capabilities seems to be more popular, a corresponding development of nonlinear estimation algorithms using these approaches, particularly for application in target surveillance and guidance operations, has not received similar attention. We describe the capabilities and functionality of neural network algorithms for data fusion and implementation of tracking filters. To discuss details and to serve as a vehicle for quantitative performance evaluations, the illustrative case of estimating the position and velocity of surveillance targets is considered. Efficient target-tracking algorithms that can utilize data from a host of sensing modalities and are capable of reliably tracking even uncooperative targets executing fast and complex maneuvers are of interest in a number of applications. The primary motivation for employing neural networks in these applications comes from the efficiency with which more features extracted from different sensor measurements can be utilized as inputs for estimating target maneuvers. A system architecture that efficiently integrates the fusion capabilities of a trained multilayer neural net with the tracking performance of a Kalman filter is described. The innovation lies in the way the fusion of multisensor data is accomplished to facilitate improved estimation without increasing the computational complexity of the dynamical state estimator itself.
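The Kalman filter named in this abstract as the tracking component can be illustrated with a minimal sketch. It assumes a constant-velocity target and scalar position measurements; the function name, the noise variances `q` and `r`, and the simplified process noise `Q = q * I` are illustrative assumptions, and the neural network fusion stage described by the authors is not modeled here.

```python
def kalman_cv_track(measurements, dt=1.0, q=1e-4, r=1.0):
    """Track position and velocity from position measurements.

    Constant-velocity model: state = (position, velocity); only position is measured.
    q and r are assumed process and measurement noise variances.
    """
    p, v = 0.0, 0.0                      # initial state guess
    P = [[100.0, 0.0], [0.0, 100.0]]     # large initial uncertainty
    for z in measurements:
        # Predict: x <- F x, P <- F P F^T + Q, with F = [[1, dt], [0, 1]], Q = q * I
        p, v = p + dt * v, v
        a, b, c, d = P[0][0], P[0][1], P[1][0], P[1][1]
        P = [[a + dt * (b + c) + dt * dt * d + q, b + dt * d],
             [c + dt * d, d + q]]
        # Update with a position measurement z (H = [1, 0])
        s = P[0][0] + r                      # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
        y = z - p                            # innovation
        p, v = p + k0 * y, v + k1 * y
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    return p, v

# Toy target moving at 1 unit per step, measured without noise for clarity
pos, vel = kalman_cv_track([float(k) for k in range(1, 21)])
```

In an architecture like the one described, fused multisensor features would presumably feed the measurement side of such a filter, leaving the state estimator itself unchanged in size.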
Yang, Deshan; Brame, Scott; El Naqa, Issam; Aditya, Apte; Wu, Yu; Goddu, S Murty; Mutic, Sasa; Deasy, Joseph O; Low, Daniel A
2011-01-01
Recent years have witnessed tremendous progress in image-guided radiotherapy technology and a growing interest in the possibilities for adapting treatment planning and delivery over the course of treatment. One obstacle faced by the research community has been the lack of a comprehensive open-source software toolkit dedicated to adaptive radiotherapy (ART). To address this need, the authors have developed a software suite called the Deformable Image Registration and Adaptive Radiotherapy Toolkit (DIRART). DIRART is an open-source toolkit developed in MATLAB. It is designed in an object-oriented style with focus on user-friendliness, features, and flexibility. It contains four classes of DIR algorithms, including the newer inverse consistency algorithms to provide consistent displacement vector field in both directions. It also contains common ART functions, an integrated graphical user interface, a variety of visualization and image-processing features, dose metric analysis functions, and interface routines. These interface routines make DIRART a powerful complement to the Computational Environment for Radiotherapy Research (CERR) and popular image-processing toolkits such as ITK. DIRART provides a set of image processing/registration algorithms and postprocessing functions to facilitate the development and testing of DIR algorithms. It also offers a good number of options for DIR results visualization, evaluation, and validation. By exchanging data with treatment planning systems via DICOM-RT files and CERR, and by bringing image registration algorithms closer to radiotherapy applications, DIRART is potentially a convenient and flexible platform that may facilitate ART and DIR research.
Prediction of protein-protein interaction network using a multi-objective optimization approach.
Chowdhury, Archana; Rakshit, Pratyusha; Konar, Amit
2016-06-01
Protein-Protein Interactions (PPIs) are very important as they coordinate almost all cellular processes. This paper attempts to formulate PPI prediction problem in a multi-objective optimization framework. The scoring functions for the trial solution deal with simultaneous maximization of functional similarity, strength of the domain interaction profiles, and the number of common neighbors of the proteins predicted to be interacting. The above optimization problem is solved using the proposed Firefly Algorithm with Nondominated Sorting. Experiments undertaken reveal that the proposed PPI prediction technique outperforms existing methods, including gene ontology-based Relative Specific Similarity, multi-domain-based Domain Cohesion Coupling method, domain-based Random Decision Forest method, Bagging with REP Tree, and evolutionary/swarm algorithm-based approaches, with respect to sensitivity, specificity, and F1 score.
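The nondominated sorting step named in this abstract ranks candidate solutions into Pareto fronts when several objectives are maximized at once. A minimal sketch of that ranking alone follows; the Firefly Algorithm itself is not reproduced, and the function names and toy score tuples are illustrative assumptions rather than the authors' scoring functions.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated_fronts(points):
    """Return a list of fronts; each front is a sorted list of indices into points."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point joins the current front if no other remaining point dominates it
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

# Hypothetical (functional similarity, domain interaction strength, common neighbors)
scores = [(0.9, 0.2, 3), (0.5, 0.8, 2), (0.4, 0.1, 1), (0.9, 0.2, 2)]
fronts = nondominated_fronts(scores)
```

Here the first front holds the two candidates that trade off against each other, and later fronts hold candidates that are beaten in every objective by an earlier one.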
Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E
2015-01-01
Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".
PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta.
Chaudhury, Sidhartha; Lyskov, Sergey; Gray, Jeffrey J
2010-03-01
PyRosetta is a stand-alone Python-based implementation of the Rosetta molecular modeling package that allows users to write custom structure prediction and design algorithms using the major Rosetta sampling and scoring functions. PyRosetta contains Python bindings to libraries that define Rosetta functions including those for accessing and manipulating protein structure, calculating energies and running Monte Carlo-based simulations. PyRosetta can be used in two ways: (i) interactively, using iPython and (ii) script-based, using Python scripting. Interactive mode contains a number of help features and is ideal for beginners while script-mode is best suited for algorithm development. PyRosetta has similar computational performance to Rosetta, can be easily scaled up for cluster applications and has been implemented for algorithms demonstrating protein docking, protein folding, loop modeling and design. PyRosetta is a stand-alone package available at http://www.pyrosetta.org under the Rosetta license which is free for academic and non-profit users. A tutorial, user's manual and sample scripts demonstrating usage are also available on the web site.
PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta
Chaudhury, Sidhartha; Lyskov, Sergey; Gray, Jeffrey J.
2010-01-01
Summary: PyRosetta is a stand-alone Python-based implementation of the Rosetta molecular modeling package that allows users to write custom structure prediction and design algorithms using the major Rosetta sampling and scoring functions. PyRosetta contains Python bindings to libraries that define Rosetta functions including those for accessing and manipulating protein structure, calculating energies and running Monte Carlo-based simulations. PyRosetta can be used in two ways: (i) interactively, using iPython and (ii) script-based, using Python scripting. Interactive mode contains a number of help features and is ideal for beginners while script-mode is best suited for algorithm development. PyRosetta has similar computational performance to Rosetta, can be easily scaled up for cluster applications and has been implemented for algorithms demonstrating protein docking, protein folding, loop modeling and design. Availability: PyRosetta is a stand-alone package available at http://www.pyrosetta.org under the Rosetta license which is free for academic and non-profit users. A tutorial, user's manual and sample scripts demonstrating usage are also available on the web site. Contact: pyrosetta@graylab.jhu.edu PMID:20061306
MSPocket: an orientation-independent algorithm for the detection of ligand binding pockets.
Zhu, Hongbo; Pisabarro, M Teresa
2011-02-01
Identification of ligand binding pockets on proteins is crucial for the characterization of protein functions. It provides valuable information for protein-ligand docking and rational engineering of small molecules that regulate protein functions. A large number of current prediction algorithms of ligand binding pockets are based on cubic grid representation of proteins and, thus, the results are often protein orientation dependent. We present the MSPocket program for detecting pockets on the solvent excluded surface of proteins. The core algorithm of the MSPocket approach does not use any cubic grid system to represent proteins and is therefore independent of protein orientations. We demonstrate that MSPocket is able to achieve an accuracy of 75% in predicting ligand binding pockets on a test dataset used for evaluating several existing methods. The accuracy is 92% if the top three predictions are considered. Comparison to one of the recently published best performing methods shows that MSPocket reaches similar performance with the additional feature of being protein orientation independent. Interestingly, some of the predictions are different, meaning that the two methods can be considered complementary and combined to achieve better prediction accuracy. MSPocket also provides a graphical user interface for interactive investigation of the predicted ligand binding pockets. In addition, we show that the overlap criterion is a better strategy for the evaluation of predicted ligand binding pockets than the single point distance criterion. The MSPocket source code can be downloaded from http://appserver.biotec.tu-dresden.de/MSPocket/. MSPocket is also available as a PyMOL plugin with a graphical user interface.
Facilitated Diffusion of Transcription Factor Proteins with Anomalous Bulk Diffusion.
Liu, Lin; Cherstvy, Andrey G; Metzler, Ralf
2017-02-16
What are the physical laws of the diffusive search of proteins for their specific binding sites on DNA in the presence of the macromolecular crowding in cells? We performed extensive computer simulations to elucidate the protein target search on DNA. The novel feature is the viscoelastic non-Brownian protein bulk diffusion recently observed experimentally. We examine the influence of the protein-DNA binding affinity and the anomalous diffusion exponent on the target search time. In all cases an optimal search time is found. The relative contribution of intermittent three-dimensional bulk diffusion and one-dimensional sliding of proteins along the DNA is quantified. Our results are discussed in the light of recent single molecule tracking experiments, aiming at a better understanding of the influence of anomalous kinetics of proteins on the facilitated diffusion mechanism.
Multiple mechanisms of serotonin 5-HT2 receptor desensitization.
Rahman, S; Neuman, R S
1993-07-20
Desensitization of serotonin 5-HT2 receptor-mediated enhancement of the N-methyl-D-aspartate (NMDA) depolarization was studied in rat cortical neurons. Serotonin and (+/-)-1-(2,5-dimethoxy-4-iodophenyl)-2-aminopropane (DOI) induced long term desensitization. Staurosporine, a nonspecific protein kinase C inhibitor, potentiated the serotonin and DOI facilitation, suggesting acute desensitization was operative. In the case of DOI, long term desensitization was prevented by staurosporine. Activators of protein kinase C abolished the serotonin facilitation, an action prevented by staurosporine. Concanavalin A potentiated the facilitation at 100 microM, but not 30 microM serotonin, suggesting these receptors undergo dose dependent internalization. Calmodulin antagonists prevent long term desensitization induced by serotonin. The depolarization induced by NMDA alone was not altered by staurosporine, protein kinase C activators, concanavalin A or calmodulin antagonists. Serotonin at 100 microM, but not 30 microM, induced heterologous desensitization of phenylephrine and carbachol induced facilitation of the NMDA depolarization. We conclude that serotonin 5-HT2 receptors both induce and undergo several forms of desensitization.
Amber Plug-In for Protein Shop
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oliva, Ricardo
2004-05-10
The Amber Plug-in for ProteinShop has two main components: an AmberEngine library to compute the protein energy models, and a module to solve the energy minimization problem using an optimization algorithm in the OPT++ library. Together, these components allow the visualization of the protein folding process in ProteinShop. AmberEngine is an object-oriented library to compute molecular energies based on the Amber model. The main class is called ProteinEnergy. Its main interface methods are (1) "init" to initialize internal variables needed to compute the energy, and (2) "eval" to evaluate the total energy given a vector of coordinates. Additional methods allow the user to evaluate the individual components of the energy model (bond, angle, dihedral, non-bonded-1-4, and non-bonded energies) and to obtain the energy of each individual atom. The AmberEngine library source code includes examples and test routines that illustrate the use of the library in stand-alone programs. The energy minimization module uses the AmberEngine library and the nonlinear optimization library OPT++. OPT++ is open source software available under the GNU Lesser General Public License. The minimization module currently makes use of the LBFGS optimization algorithm in OPT++ to perform the energy minimization. Future releases may give the user a choice of other algorithms available in OPT++.
NASA Astrophysics Data System (ADS)
Camilloni, Carlo; Broglia, Ricardo A.; Tiana, Guido
2011-01-01
The study of the mechanism which is at the basis of the phenomenon of protein folding requires the knowledge of multiple folding trajectories under biological conditions. Using a biasing molecular-dynamics algorithm based on the physics of the ratchet-and-pawl system, we carry out all-atom, explicit solvent simulations of the sequence of folding events which proteins G, CI2, and ACBP undergo in evolving from the denatured to the folded state. Starting from highly disordered conformations, the algorithm allows the proteins to reach, at the price of a modest computational effort, nativelike conformations, within a root mean square deviation (RMSD) of approximately 1 Å. A scheme is developed to extract, from the myriad of events, information concerning the sequence of native contact formation and of their eventual correlation. Such an analysis indicates that all the studied proteins fold hierarchically, through pathways which, although not deterministic, are well-defined with respect to the order of contact formation. The algorithm also allows one to study unfolding, a process which looks, to a large extent, like the reverse of the major folding pathway. This is also true in situations in which many pathways contribute to the folding process, like in the case of protein G.
Marshall, Jamie L.; Kwok, Yukwah; McMorran, Brian; Baum, Linda G.; Crosbie-Watson, Rachelle H.
2013-01-01
Three adhesion complexes span the sarcolemma and facilitate critical connections between the extracellular matrix and the actin cytoskeleton: the dystrophin- and utrophin-glycoprotein complexes and α7β1 integrin. Loss of individual protein components results in a loss of the entire protein complex and muscular dystrophy. Muscular dystrophy is a progressive, lethal wasting disease characterized by repetitive cycles of myofiber degeneration and regeneration. Protein replacement therapy offers a promising approach for the treatment of muscular dystrophy. Recently, we demonstrated that sarcospan facilitates protein-protein interactions amongst the adhesion complexes and is an important therapeutic target. Here, we review current protein replacement strategies, discuss the potential benefits of sarcospan expression, and identify important experiments that must be addressed for sarcospan to move to the clinic. PMID:23601082
Direct Calculation of Protein Fitness Landscapes through Computational Protein Design
Au, Loretta; Green, David F.
2016-01-01
Naturally selected amino-acid sequences or experimentally derived ones are often the basis for understanding how protein three-dimensional conformation and function are determined by primary structure. Such sequences for a protein family comprise only a small fraction of all possible variants, however, representing the fitness landscape with limited scope. Explicitly sampling and characterizing alternative, unexplored protein sequences would directly identify fundamental reasons for sequence robustness (or variability), and we demonstrate that computational methods offer an efficient mechanism toward this end, on a large scale. The dead-end elimination and A∗ search algorithms were used here to find all low-energy single mutant variants, and corresponding structures of a G-protein heterotrimer, to measure changes in structural stability and binding interactions to define a protein fitness landscape. We established consistency between these algorithms with known biophysical and evolutionary trends for amino-acid substitutions, and could thus recapitulate known protein side-chain interactions and predict novel ones. PMID:26745411
Yeh, Chun-Ting; Brunette, T J; Baker, David; McIntosh-Smith, Simon; Parmeggiani, Fabio
2018-02-01
Computational protein design methods have enabled the design of novel protein structures, but they are often still limited to small proteins and symmetric systems. To expand the size of designable proteins while controlling the overall structure, we developed Elfin, a genetic algorithm for the design of novel proteins with custom shapes using structural building blocks derived from experimentally verified repeat proteins. By combining building blocks with compatible interfaces, it is possible to rapidly build non-symmetric large structures (>1000 amino acids) that match three-dimensional geometric descriptions provided by the user. A run time of about 20 min on a laptop computer for a 3000 amino acid structure makes Elfin accessible to users with limited computational resources. Protein structures with controlled geometry will allow the systematic study of the effect of spatial arrangement of enzymes and signaling molecules, and provide new scaffolds for functional nanomaterials.
Predicting protein-protein interactions from protein domains using a set cover approach.
Huang, Chengbang; Morcos, Faruck; Kanaan, Simon P; Wuchty, Stefan; Chen, Danny Z; Izaguirre, Jesús A
2007-01-01
One goal of contemporary proteome research is the elucidation of cellular protein interactions. Based on currently available protein-protein interaction and domain data, we introduce a novel method, Maximum Specificity Set Cover (MSSC), for the prediction of protein-protein interactions. In our approach, we map the relationship between interactions of proteins and their corresponding domain architectures to a generalized weighted set cover problem. The application of a greedy algorithm provides sets of domain interactions which explain the presence of protein interactions to the largest degree of specificity. Utilizing domain and protein interaction data of S. cerevisiae, MSSC enables prediction of previously unknown protein interactions, links that are well supported by a high tendency of coexpression and functional homogeneity of the corresponding proteins. Focusing on concrete examples, we show that MSSC reliably predicts protein interactions in well-studied molecular systems, such as the 26S proteasome and RNA polymerase II of S. cerevisiae. We also show that the quality of the predictions is comparable to the Maximum Likelihood Estimation while MSSC is faster. This new algorithm and all data sets used are accessible through a Web portal at http://ppi.cse.nd.edu.
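The greedy strategy for weighted set cover mentioned in this abstract can be sketched generically. This is the textbook greedy heuristic, not the MSSC method itself: the function name, the toy protein and domain labels, and the cost values are all hypothetical, and the specificity-based weighting used by the authors is not reproduced.

```python
def greedy_weighted_set_cover(universe, subsets, weights):
    """Repeatedly pick the subset with the lowest weight per newly covered element."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best, best_ratio = None, None
        for name, members in subsets.items():
            gain = len(uncovered & members)
            if gain == 0:
                continue  # this subset explains nothing new
            ratio = weights[name] / gain
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            raise ValueError("some elements cannot be covered")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

# Hypothetical toy data: protein interactions explained by candidate domain pairs
interactions = {"P1-P2", "P1-P3", "P2-P4", "P3-P4"}
domain_pairs = {
    "D1-D2": {"P1-P2", "P1-P3"},
    "D2-D3": {"P2-P4"},
    "D1-D4": {"P1-P3", "P3-P4", "P2-P4"},
}
costs = {"D1-D2": 1.0, "D2-D3": 1.0, "D1-D4": 1.8}
cover = greedy_weighted_set_cover(interactions, domain_pairs, costs)
```

In this toy case two domain pairs suffice to explain all four interactions, so the third candidate is never selected.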
Algorithm guided outlining of 105 pancreatic cancer liver metastases in Ultrasound.
Hann, Alexander; Bettac, Lucas; Haenle, Mark M; Graeter, Tilmann; Berger, Andreas W; Dreyhaupt, Jens; Schmalstieg, Dieter; Zoller, Wolfram G; Egger, Jan
2017-10-06
Manual segmentation of hepatic metastases in ultrasound images acquired from patients suffering from pancreatic cancer is common practice. Semiautomatic measurements promising assistance in this process are often assessed using a small number of lesions performed by examiners who already know the algorithm. In this work, we present the application of an algorithm for the segmentation of liver metastases due to pancreatic cancer using a set of 105 different images of metastases. The algorithm and the two examiners had never assessed the images before. The examiners first performed a manual segmentation and, after five weeks, a semiautomatic segmentation using the algorithm. They were satisfied in up to 90% of the cases with the semiautomatic segmentation results. Using the algorithm was significantly faster and resulted in a median Dice similarity score of over 80%. Estimation of the inter-operator variability by using the intraclass correlation coefficient was good, at 0.8. In conclusion, the algorithm facilitates fast and accurate segmentation of liver metastases, comparable to the current gold standard of manual segmentation.
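The Dice similarity score reported in this abstract measures the overlap between two segmentations. A minimal sketch, assuming each segmentation is given as a set of pixel coordinates; the function name and the toy masks are illustrative and not taken from the study.

```python
def dice_score(mask_a, mask_b):
    """Dice similarity: 2 * |A & B| / (|A| + |B|) for two sets of pixel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))

# Toy example: a 3x3 manual outline and a semiautomatic outline shifted by one column
manual = {(row, col) for row in range(3) for col in range(3)}
semiauto = {(row, col) for row in range(3) for col in range(1, 4)}
score = dice_score(manual, semiauto)
```

A score of 1 means identical outlines and 0 means no shared pixels, so the reported median above 80% corresponds to outlines that mostly coincide.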
A Novel Latin Hypercube Algorithm via Translational Propagation
Pan, Guang; Ye, Pengcheng
2014-01-01
Metamodels have been widely used in engineering design to facilitate analysis and optimization of complex systems that involve computationally expensive simulation programs. The accuracy of metamodels is directly related to the experimental designs used. Optimal Latin hypercube designs are frequently used and have been shown to have good space-filling and projective properties. However, the high cost of constructing them limits their use. In this paper, a methodology for creating novel Latin hypercube designs via translational propagation and successive local enumeration algorithm (TPSLE) is developed without using formal optimization. The TPSLE algorithm is based on the idea that a near-optimal Latin hypercube design can be constructed from a simple initial block of a few points, generated by the SLE algorithm, that serves as a building block. The TPSLE algorithm thus offers a balanced trade-off between efficiency and sampling performance. The proposed algorithm is compared to two existing algorithms and is found to be much more efficient in computation time while retaining acceptable space-filling and projective properties. PMID:25276844
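The defining property of a Latin hypercube design, one sample in each of n equal strata along every dimension, can be shown with a basic randomized construction. This sketch is not the TPSLE algorithm and performs no space-filling optimization; the function name and parameters are illustrative assumptions.

```python
import random

def latin_hypercube(n_points, n_dims, seed=None):
    """Random Latin hypercube on the unit cube: one point per stratum in each dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One sample inside each stratum [k/n, (k+1)/n), then shuffle across points
        col = [(k + rng.random()) / n_points for k in range(n_points)]
        rng.shuffle(col)
        columns.append(col)
    # Transpose the per-dimension columns into a list of points
    return [tuple(col[i] for col in columns) for i in range(n_points)]

design = latin_hypercube(5, 2, seed=0)
```

Methods such as the one in this abstract aim to improve on a random arrangement like this by choosing which stratum combinations are paired, so that points spread more evenly through the space.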
BLOC-1 Interacts with BLOC-2 and the AP-3 Complex to Facilitate Protein Trafficking on Endosomes
Di Pietro, Santiago M.; Falcón-Pérez, Juan M.; Tenza, Danièle; Setty, Subba R.G.; Marks, Michael S.; Raposo, Graça
2006-01-01
The adaptor protein (AP)-3 complex is a component of the cellular machinery that controls protein sorting from endosomes to lysosomes and specialized related organelles such as melanosomes. Mutations in an AP-3 subunit underlie a form of Hermansky-Pudlak syndrome (HPS), a disorder characterized by abnormalities in lysosome-related organelles. HPS in humans can also be caused by mutations in genes encoding subunits of three complexes of unclear function, named biogenesis of lysosome-related organelles complex (BLOC)-1, -2, and -3. Here, we report that BLOC-1 interacts physically and functionally with AP-3 to facilitate the trafficking of a known AP-3 cargo, CD63, and of tyrosinase-related protein 1 (Tyrp1), a melanosomal membrane protein previously thought to traffic only independently of AP-3. BLOC-1 also interacts with BLOC-2 to facilitate Tyrp1 trafficking by a mechanism apparently independent of AP-3 function. Both BLOC-1 and -2 localize mainly to early endosome-associated tubules as determined by immunoelectron microscopy. These findings support the idea that BLOC-1 and -2 represent hitherto unknown components of the endosomal protein trafficking machinery. PMID:16837549
AlignNemo: a local network alignment method to integrate homology and topology.
Ciriello, Giovanni; Mina, Marco; Guzzi, Pietro H; Cannataro, Mario; Guerra, Concettina
2012-01-01
Local network alignment is an important component of the analysis of protein-protein interaction networks that may lead to the identification of evolutionarily related complexes. We present AlignNemo, a new algorithm that, given the networks of two organisms, uncovers subnetworks of proteins that relate in biological function and topology of interactions. The discovered conserved subnetworks have a general topology and need not correspond to specific interaction patterns, so that they more closely fit the models of functional complexes proposed in the literature. The algorithm is able to handle sparse interaction data with an expansion process that at each step explores the local topology of the networks beyond the proteins directly interacting with the current solution. To assess the performance of AlignNemo, we ran a series of benchmarks using statistical measures as well as biological knowledge. Based on reference datasets of protein complexes, AlignNemo shows better performance than other methods in terms of both precision and recall. We show our solutions to be biologically sound using the concept of semantic similarity applied to Gene Ontology vocabularies. The binaries of AlignNemo and supplementary details about the algorithms and the experiments are available at: sourceforge.net/p/alignnemo.
Semantic similarity analysis of protein data: assessment with biological features and issues.
Guzzi, Pietro H; Mina, Marco; Guerra, Concettina; Cannataro, Mario
2012-09-01
The integration of proteomics data with biological knowledge is a recent trend in bioinformatics. A lot of biological information is available, spread across different sources and encoded in different ontologies (e.g. Gene Ontology). Annotating existing protein data with biological information may enable the use (and the development) of algorithms that use biological ontologies as a framework to mine annotated data. Recently, many methodologies and algorithms that use ontologies to extract knowledge from data, as well as to analyse ontologies themselves, have been proposed and applied to other fields. Conversely, the use of such annotations for the analysis of protein data is a relatively novel research area that is currently becoming more and more central in research. Existing approaches span from the definition of the similarity among genes and proteins on the basis of the annotating terms, to the definition of novel algorithms that use such similarities for mining protein data on a proteome-wide scale. This work, after defining the main concepts of such analysis, presents a systematic discussion and comparison of the main approaches. Finally, remaining challenges, as well as possible future directions of research, are presented.
Leung, Kin K.; Hause, Ronald J.; Barkinge, John L.; Ciaccio, Mark F.; Chuu, Chih-Pin; Jones, Richard B.
2014-01-01
Many human diseases are associated with aberrant regulation of phosphoprotein signaling networks. Src homology 2 (SH2) domains represent the major class of protein domains in metazoans that interact with proteins phosphorylated on the amino acid residue tyrosine. Although current SH2 domain prediction algorithms perform well at predicting the sequences of phosphorylated peptides that are likely to result in the highest possible interaction affinity in the context of random peptide library screens, these algorithms do poorly at predicting the interaction potential of SH2 domains with physiologically derived protein sequences. We employed a high throughput interaction assay system to empirically determine the affinity between 93 human SH2 domains and phosphopeptides abstracted from several receptor tyrosine kinases and signaling proteins. The resulting interaction experiments revealed over 1000 novel peptide-protein interactions and provided a glimpse into the common and specific interaction potentials of c-Met, c-Kit, GAB1, and the human androgen receptor. We used these data to build a permutation-based logistic regression classifier that performed considerably better than existing algorithms for predicting the interaction potential of several SH2 domains. PMID:24728074
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zemla, A; Lang, D; Kostova, T
2010-11-29
Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory; still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitate the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that share structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.
Kuttner, Yosef Y; Engel, Stanislav
2018-02-01
A rational design of protein complexes with defined functionalities and of drugs aimed at disrupting protein-protein interactions requires fundamental understanding of the mechanisms underlying the formation of specific protein complexes. Efforts to develop efficient small-molecule or protein-based binders often exploit energetic hot spots on protein surfaces, namely, the interfacial residues that provide most of the binding free energy in the complex. The molecular basis underlying the unusually high energy contribution of the hot spots remains obscure, and its elucidation would facilitate the design of interface-targeted drugs. To study the nature of the energetic hot spots, we analyzed the backbone dynamic properties of contact surfaces in several protein complexes. We demonstrate that, in most complexes, the backbone dynamic landscapes of interacting surfaces form complementary "stability patches," in which static areas from the opposing surfaces superimpose, and that these areas are predominantly located near the geometric center of the interface. We propose that a diminished enthalpy-entropy compensation effect augments the degree to which residues positioned within the complementary stability patches contribute to complex affinity, thereby giving rise to the energetic hot spots. These findings offer new insights into the nature of energetic hot spots and the role that backbone dynamics play in facilitating intermolecular recognition. Mapping the interfacial stability patches may provide guidance for protein engineering approaches aimed at improving the stability of protein complexes and could facilitate the design of ligands that target complex interfaces. © 2017 Wiley Periodicals, Inc.
Blank-Landeshammer, Bernhard; Kollipara, Laxmikanth; Biß, Karsten; Pfenninger, Markus; Malchow, Sebastian; Shuvaev, Konstantin; Zahedi, René P; Sickmann, Albert
2017-09-01
Complex mass spectrometry based proteomics data sets are mostly analyzed by protein database searches. While this approach performs considerably well for sequenced organisms, direct inference of peptide sequences from tandem mass spectra, i.e., de novo peptide sequencing, oftentimes is the only way to obtain information when protein databases are absent. However, available algorithms suffer from drawbacks such as lack of validation and often high rates of false positive hits (FP). Here we present a simple method of combining results from commonly available de novo peptide sequencing algorithms, which in conjunction with minor tweaks in data acquisition yields a lower empirical FDR than analysis using single algorithms. Results were validated using state-of-the-art database search algorithms as well as specifically synthesized reference peptides. Thus, we could increase the number of PSMs meeting a stringent FDR of 5% more than 3-fold compared to the single best de novo sequencing algorithm alone, accounting for an average of 11 120 PSMs (combined) instead of 3476 PSMs (alone) in triplicate 2 h LC-MS runs of tryptic HeLa digestion.
Recent progress and future directions in protein-protein docking.
Ritchie, David W
2008-02-01
This article gives an overview of recent progress in protein-protein docking and it identifies several directions for future research. Recent results from the CAPRI blind docking experiments show that docking algorithms are steadily improving in both reliability and accuracy. Current docking algorithms employ a range of efficient search and scoring strategies, including e.g. fast Fourier transform correlations, geometric hashing, and Monte Carlo techniques. These approaches can often produce a relatively small list of up to a few thousand orientations, amongst which a near-native binding mode is often observed. However, despite the use of improved scoring functions which typically include models of desolvation, hydrophobicity, and electrostatics, current algorithms still have difficulty in identifying the correct solution from the list of false positives, or decoys. Nonetheless, significant progress is being made through better use of bioinformatics, biochemical, and biophysical information such as e.g. sequence conservation analysis, protein interaction databases, alanine scanning, and NMR residual dipolar coupling restraints to help identify key binding residues. Promising new approaches to incorporate models of protein flexibility during docking are being developed, including the use of molecular dynamics snapshots, rotameric and off-rotamer searches, internal coordinate mechanics, and principal component analysis based techniques. Some investigators now use explicit solvent models in their docking protocols. Many of these approaches can be computationally intensive, although new silicon chip technologies such as programmable graphics processor units are beginning to offer competitive alternatives to conventional high performance computer systems. As cryo-EM techniques improve apace, docking NMR and X-ray protein structures into low resolution EM density maps is helping to bridge the resolution gap between these complementary techniques. 
The use of symmetry and fragment assembly constraints are also helping to make possible docking-based predictions of large multimeric protein complexes. In the near future, the closer integration of docking algorithms with protein interface prediction software, structural databases, and sequence analysis techniques should help produce better predictions of protein interaction networks and more accurate structural models of the fundamental molecular interactions within the cell.
NASA Astrophysics Data System (ADS)
Mahmood, Zakaria N.; Mahmuddin, Massudi; Mahmood, Mohammed Nooraldeen
Encoding proteins by amino acid sequence in order to classify them into their respective families and subfamilies is an important research area. However, for a given protein, knowing its exact action, whether hormonal, enzymatic, transmembrane, or nuclear receptor, depends not solely on the amino acid sequence but also on the way the amino acid chain folds. This study provides a prototype system that is able to predict a protein's tertiary structure. Several methods are used to develop and evaluate the system to produce better accuracy in protein 3D structure prediction. The Bees Optimization algorithm, which is inspired by the food-foraging behavior of honey bees, is used in the searching phase. In this study, the experiment is conducted on short-sequence proteins that have been used in previous research with well-known tools. The proposed approach shows a promising result.
Wang, LiQiang; Li, CuiFeng
2014-10-01
A genetic algorithm (GA) coupled with multiple linear regression (MLR) was used to extract useful features from amino acids and g-gap dipeptides for distinguishing between thermophilic and non-thermophilic proteins. The method was trained by a benchmark dataset of 915 thermophilic and 793 non-thermophilic proteins. The method reached an overall accuracy of 95.4 % in a Jackknife test using nine amino acids, 38 0-gap dipeptides and 29 1-gap dipeptides. The accuracy as a function of protein size ranged between 85.8 and 96.9 %. The overall accuracies of three independent tests were 93, 93.4 and 91.8 %. The observed results of detecting thermophilic proteins suggest that the GA-MLR approach described herein should be a powerful method for selecting features that describe thermostabile machines and be an aid in the design of more stable proteins.
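As an illustration of the kind of sequence features mentioned above, the sketch below computes g-gap dipeptide frequencies, assuming the usual definition in which a g-gap dipeptide is a pair of residues separated by g intervening positions (so a 0-gap dipeptide is an adjacent pair). The function name and the normalization by the number of pairs are assumptions for illustration; the abstract does not specify the authors' exact feature encoding, and the GA-MLR selection step is not shown.

```python
from collections import Counter


def g_gap_dipeptide_freqs(sequence, gap):
    """Relative frequencies of residue pairs separated by `gap` residues.

    gap=0 gives ordinary adjacent dipeptides; gap=1 skips one residue.
    """
    pairs = [
        sequence[i] + sequence[i + gap + 1]
        for i in range(len(sequence) - gap - 1)
    ]
    total = len(pairs)
    if total == 0:
        return {}
    counts = Counter(pairs)
    return {pair: n / total for pair, n in counts.items()}


# Toy sequence, not taken from the benchmark dataset.
print(g_gap_dipeptide_freqs("ACAC", 0))
print(g_gap_dipeptide_freqs("ACAC", 1))
```

With 20 amino acids, each gap value yields up to 400 such features, which is why a feature selection step is needed before fitting a regression model.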
Benchmarking Commercial Conformer Ensemble Generators.
Friedrich, Nils-Ole; de Bruyn Kops, Christina; Flachsenberg, Florian; Sommer, Kai; Rarey, Matthias; Kirchmair, Johannes
2017-11-27
We assess and compare the performance of eight commercial conformer ensemble generators (ConfGen, ConfGenX, cxcalc, iCon, MOE LowModeMD, MOE Stochastic, MOE Conformation Import, and OMEGA) and one leading free algorithm, the distance geometry algorithm implemented in RDKit. The comparative study is based on a new version of the Platinum Diverse Dataset, a high-quality benchmarking dataset of 2859 protein-bound ligand conformations extracted from the PDB. Differences in the performance of commercial algorithms are much smaller than those observed for free algorithms in our previous study (J. Chem. Inf. 2017, 57, 529-539). For commercial algorithms, the median minimum root-mean-square deviations measured between protein-bound ligand conformations and ensembles of a maximum of 250 conformers are between 0.46 and 0.61 Å. Commercial conformer ensemble generators are characterized by their high robustness, with at least 99% of all input molecules successfully processed and few or even no substantial geometrical errors detectable in their output conformations. The RDKit distance geometry algorithm (with minimization enabled) appears to be a good free alternative since its performance is comparable to that of the midranked commercial algorithms. Based on a statistical analysis, we elaborate on which algorithms to use and how to parametrize them for best performance in different application scenarios.
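The headline metric above, the minimum RMSD between a protein-bound conformation and any member of a generated ensemble, can be sketched as follows. This is a simplified illustration under stated assumptions: coordinates are taken to be already superposed and listed in the same atom order, and no symmetry correction is applied. The function names are hypothetical and do not correspond to any of the benchmarked tools.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) atom positions.

    Assumes the two conformations are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformations must be non-empty and equally sized")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def min_rmsd(reference, ensemble):
    """Smallest RMSD between the reference and any conformer in the ensemble."""
    return min(rmsd(reference, conformer) for conformer in ensemble)


# Toy two-atom example with made-up coordinates.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
print(min_rmsd(reference, ensemble))
```

A benchmark would then report the median of this value over all ligands in the dataset for each generator.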
Optimizing high performance computing workflow for protein functional annotation
Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene
2014-01-01
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296
Guided filtering for solar image/video processing
NASA Astrophysics Data System (ADS)
Xu, Long; Yan, Yihua; Cheng, Jun
2017-06-01
A new image enhancement algorithm employing guided filtering is proposed in this work for the enhancement of solar images and videos so that users can easily figure out important fine structures embedded in the recorded images/movies for solar observation. The proposed algorithm can efficiently remove image noises, including Gaussian and impulse noises. Meanwhile, it can further highlight fibrous structures on/beyond the solar disk. These fibrous structures can clearly demonstrate the progress of solar flare, prominence coronal mass emission, magnetic field, and so on. The experimental results prove that the proposed algorithm gives significant enhancement of visual quality of solar images beyond original input and several classical image enhancement algorithms, thus facilitating easier determination of interesting solar burst activities from recorded images/movies.
Hus, Vanessa; Lord, Catherine
2014-01-01
The Autism Diagnostic Observation Schedule, 2nd Edition includes revised diagnostic algorithms and standardized severity scores for modules used to assess children and adolescents of varying language abilities. Comparable revisions have not yet been applied to the Module 4, used with verbally fluent adults. The current study revises the Module 4 algorithm and calibrates raw overall and domain totals to provide metrics of ASD symptom severity. Sensitivity and specificity of the revised Module 4 algorithm exceeded 80% in the overall sample. Module 4 calibrated severity scores provide quantitative estimates of ASD symptom severity that are relatively independent of participant characteristics. These efforts increase comparability of ADOS scores across modules and should facilitate efforts to increase understanding of adults with ASD. PMID:24590409
Gradient gravitational search: An efficient metaheuristic algorithm for global optimization.
Dash, Tirtharaj; Sahu, Prabhat K
2015-05-30
The adaptation of novel techniques developed in the field of computational chemistry to solve the concerned problems for large and flexible molecules is taking the center stage with regard to efficient algorithm, computational cost and accuracy. In this article, the gradient-based gravitational search (GGS) algorithm, using analytical gradients for a fast minimization to the next local minimum has been reported. Its efficiency as metaheuristic approach has also been compared with Gradient Tabu Search and others like: Gravitational Search, Cuckoo Search, and Back Tracking Search algorithms for global optimization. Moreover, the GGS approach has also been applied to computational chemistry problems for finding the minimal value potential energy of two-dimensional and three-dimensional off-lattice protein models. The simulation results reveal the relative stability and physical accuracy of protein models with efficient computational cost. © 2015 Wiley Periodicals, Inc.
Petitjean, Michel
2017-10-01
Some major protein families, such as carbonic anhydrases (CAs), have a conical cavity at the active site. No algorithm was available to compute conical cavities, so we needed to design one. The fast algorithm we designed let us show on a set of 717 CAs extracted from the PDB database that γ-CAs are characterized by active site cavity cone angles significantly larger than those of α-CAs and β-CAs: the generatrix-axis angles are greater than 60° for the γ-CAs while they are smaller than 50° for the other CAs. Free binaries of the CONICA software implementing the algorithm are available through a software repository at http://petitjeanmichel.free.fr/itoweb.petitjean.freeware.html. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Eyler, Lauren; Hubbard, Alan; Juillard, Catherine
2016-10-01
Low and middle-income countries (LMICs) and the world's poor bear a disproportionate share of the global burden of injury. Data regarding disparities in injury are vital to inform injury prevention and trauma systems strengthening interventions targeted towards vulnerable populations, but are limited in LMICs. We aim to facilitate injury disparities research by generating a standardized methodology for assessing economic status in resource-limited country trauma registries where complex metrics such as income, expenditures, and wealth index are infeasible to assess. To address this need, we developed a cluster analysis-based algorithm for generating simple population-specific metrics of economic status using nationally representative Demographic and Health Surveys (DHS) household assets data. For a limited number of variables, g, our algorithm performs weighted k-medoids clustering of the population using all combinations of g asset variables and selects the combination of variables and number of clusters that maximize average silhouette width (ASW). In simulated datasets containing both randomly distributed variables and "true" population clusters defined by correlated categorical variables, the algorithm selected the correct variable combination and appropriate cluster numbers unless variable correlation was very weak. When used with 2011 Cameroonian DHS data, our algorithm identified twenty economic clusters with ASW 0.80, indicating well-defined population clusters. This economic model for assessing health disparities will be used in the new Cameroonian six-hospital centralized trauma registry. By describing our standardized methodology and algorithm for generating economic clustering models, we aim to facilitate measurement of health disparities in other trauma registries in resource-limited countries. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
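The selection criterion named above, average silhouette width (ASW), can be sketched in a few lines. This is an unweighted illustration computed from a precomputed distance matrix; the survey weighting and the k-medoids clustering step used in the study are not reproduced, and the function name and the convention of scoring singleton clusters as 0 are assumptions.

```python
def average_silhouette_width(dist, labels):
    """Unweighted ASW from a square distance matrix and cluster labels.

    Requires at least two clusters; singleton clusters score 0 by convention.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Toy one-dimensional data: two tight, well-separated groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
good = average_silhouette_width(dist, [0, 0, 1, 1])
bad = average_silhouette_width(dist, [0, 1, 0, 1])
print(good, bad)
```

A model selection loop in this spirit would compute the ASW for every candidate variable combination and cluster count and keep the configuration with the highest value.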
Filli, Lukas; Marcon, Magda; Scholz, Bernhard; Calcagni, Maurizio; Finkenstädt, Tim; Andreisek, Gustav; Guggenberger, Roman
2014-12-01
The aim of this study was to evaluate a prototype correction algorithm to reduce metal artefacts in flat detector computed tomography (FDCT) of scaphoid fixation screws. FDCT has gained interest in imaging small anatomic structures of the appendicular skeleton. Angiographic C-arm systems with flat detectors allow fluoroscopy and FDCT imaging in a one-stop procedure emphasizing their role as an ideal intraoperative imaging tool. However, FDCT imaging can be significantly impaired by artefacts induced by fixation screws. Following ethical board approval, commercially available scaphoid fixation screws were inserted into six cadaveric specimens in order to fix artificially induced scaphoid fractures. FDCT images corrected with the algorithm were compared to uncorrected images both quantitatively and qualitatively by two independent radiologists in terms of artefacts, screw contour, fracture line visibility, bone visibility, and soft tissue definition. Normal distribution of variables was evaluated using the Kolmogorov-Smirnov test. In case of normal distribution, quantitative variables were compared using paired Student's t tests. The Wilcoxon signed-rank test was used for quantitative variables without normal distribution and all qualitative variables. A p value of < 0.05 was considered to indicate statistically significant differences. Metal artefacts were significantly reduced by the correction algorithm (p < 0.001), and the fracture line was more clearly defined (p < 0.01). The inter-observer reliability was "almost perfect" (intra-class correlation coefficient 0.85, p < 0.001). The prototype correction algorithm in FDCT for metal artefacts induced by scaphoid fixation screws may facilitate intra- and postoperative follow-up imaging. Flat detector computed tomography (FDCT) is a helpful imaging tool for scaphoid fixation. The correction algorithm significantly reduces artefacts in FDCT induced by scaphoid fixation screws. 
This may facilitate intra- and postoperative follow-up imaging.
Implications of Mycobacterium Major Facilitator Superfamily for Novel Measures against Tuberculosis.
Wang, Rui; Zhang, Zhen; Xie, Longxiang; Xie, Jianping
2015-01-01
Major facilitator superfamily (MFS) is an important secondary membrane transport protein superfamily conserved from prokaryotes to eukaryotes. The MFS proteins are widespread among bacteria and are responsible for the transfer of substrates. Pathogenic Mycobacterium MFS transporters, their distribution, function, phylogeny, and predicted crystal structures were studied to better understand the function of MFS and to discover specific inhibitors of MFS for better tuberculosis control.
Mallik, Saurav; Bhadra, Tapas; Mukherji, Ayan
2018-04-01
Association rule mining is an important technique for identifying interesting relationships between gene pairs in a biological data set. Earlier methods basically work for a single biological data set, and, in most cases, a single minimum support cutoff can be applied globally, i.e., across all genesets/itemsets. To overcome this limitation, in this paper, we propose a dynamic threshold-based FP-growth rule mining algorithm that integrates gene expression, methylation and protein-protein interaction profiles based on weighted shortest distance to find the novel associations among different pairs of genes in multi-view data sets. For this purpose, we introduce three new thresholds, namely, Distance-based Variable/Dynamic Supports (DVS), Distance-based Variable Confidences (DVC), and Distance-based Variable Lifts (DVL) for each rule by integrating co-expression, co-methylation, and protein-protein interactions existing in the multi-omics data set. We develop the proposed algorithm utilizing these three novel multiple threshold measures. In the proposed algorithm, the values of DVS, DVC, and DVL are computed for each rule separately, and subsequently it is verified whether the support, confidence, and lift of each evolved rule are greater than or equal to the corresponding individual DVS, DVC, and DVL values, respectively, or not. If all these three conditions for a rule are found to be true, the rule is treated as a resultant rule. One of the major advantages of the proposed method compared with other related state-of-the-art methods is that it considers both the quantitative and interactive significance among all pairwise genes belonging to each rule. Moreover, the proposed method generates fewer rules, takes less running time, and provides greater biological significance for the resultant top-ranking rules compared to previous methods.
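The per-rule acceptance test described above can be sketched as follows. The support, confidence, and lift formulas are the standard association-rule definitions; the distance-based computation of the per-rule thresholds is not given in the abstract, so the threshold values are passed in here as hypothetical inputs, and the function names are illustrative only.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Standard support, confidence, and lift of the rule antecedent -> consequent.

    `transactions` is a list of sets of items (e.g. genes).
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)
    n_c = sum(1 for t in transactions if c <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


def passes_rule_thresholds(metrics, thresholds):
    """True if support, confidence, and lift each meet their own per-rule cutoff."""
    return all(value >= cutoff for value, cutoff in zip(metrics, thresholds))


# Toy transactions with made-up gene labels.
transactions = [{"g1", "g2"}, {"g1", "g2", "g3"}, {"g1"}, {"g2", "g3"}]
metrics = rule_metrics(transactions, {"g1"}, {"g2"})
print(metrics)
# Hypothetical per-rule cutoffs standing in for the three dynamic thresholds.
print(passes_rule_thresholds(metrics, (0.5, 0.6, 0.8)))
```

The difference from a conventional miner is only in the last step: each rule is compared against its own triple of cutoffs rather than against one global minimum support.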
Sze, Sing-Hoi; Parrott, Jonathan J; Tarone, Aaron M
2017-12-06
While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves comparable or increased accuracy as memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable or decreased accuracy as memory-intensive algorithms that can only be used to construct small assemblies. Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.
Schmitter, Daniel; Wachowicz, Paulina; Sage, Daniel; Chasapi, Anastasia; Xenarios, Ioannis; Simanis; Unser, Michael
2013-01-01
The yeast Schizosaccharomyces pombe is frequently used as a model for studying the cell cycle. The cells are rod-shaped and divide by medial fission. The process of cell division, or cytokinesis, is controlled by a network of signaling proteins called the Septation Initiation Network (SIN); SIN proteins associate with the SPBs during nuclear division (mitosis). Some SIN proteins associate with both SPBs early in mitosis, and then display strongly asymmetric signal intensity at the SPBs in late mitosis, just before cytokinesis. This asymmetry is thought to be important for correct regulation of SIN signaling, and coordination of cytokinesis and mitosis. In order to study the dynamics of organelles or large protein complexes such as the spindle pole body (SPB), which have been labeled with a fluorescent protein tag in living cells, a number of the image analysis problems must be solved; the cell outline must be detected automatically, and the position and signal intensity associated with the structures of interest within the cell must be determined. We present a new 2D and 3D image analysis system that permits versatile and robust analysis of motile, fluorescently labeled structures in rod-shaped cells. We have designed an image analysis system that we have implemented as a user-friendly software package allowing the fast and robust image-analysis of large numbers of rod-shaped cells. We have developed new robust algorithms, which we combined with existing methodologies to facilitate fast and accurate analysis. Our software permits the detection and segmentation of rod-shaped cells in either static or dynamic (i.e. time lapse) multi-channel images. It enables tracking of two structures (for example SPBs) in two different image channels. 
For 2D or 3D static images, the locations of the structures are identified, and then intensity values are extracted together with several quantitative parameters, such as length, width, cell orientation, background fluorescence and the distance between the structures of interest. Furthermore, two kinds of kymographs of the tracked structures can be established, one representing the migration with respect to their relative position, the other representing their individual trajectories inside the cell. This software package, called "RodCellJ", allowed us to analyze a large number of S. pombe cells to understand the rules that govern SIN protein asymmetry. "RodCellJ" is freely available to the community as a package of several ImageJ plugins to simultaneously analyze the behavior of a large number of rod-shaped cells in an extensive manner. The integration of different image-processing techniques in a single package, together with the development of novel algorithms, not only speeds up the analysis relative to existing tools but also provides higher accuracy. Its utility was demonstrated on both 2D and 3D static and dynamic images to study the septation initiation network of the yeast Schizosaccharomyces pombe. More generally, it can be used in any kind of biological context where fluorescent-protein labeled structures need to be analyzed in rod-shaped cells. RodCellJ is freely available under http://bigwww.epfl.ch/algorithms.html.
Flexible methods for segmentation evaluation: results from CT-based luggage screening.
Karimi, Seemeen; Jiang, Xiaoqian; Cosman, Pamela; Martz, Harry
2014-01-01
Imaging systems used in aviation security include segmentation algorithms in an automatic threat recognition pipeline. The segmentation algorithms evolve in response to emerging threats and changing performance requirements. Analysis of segmentation algorithms' behavior, including the nature of errors and feature recovery, facilitates their development. However, evaluation methods from the literature provide limited characterization of the segmentation algorithms. Our purpose was to develop segmentation evaluation methods that measure systematic errors such as oversegmentation and undersegmentation, outliers, and overall errors. The methods must measure feature recovery and allow us to prioritize segments. We developed two complementary evaluation methods using statistical techniques and information theory. We also created a semi-automatic method to define ground truth from 3D images. We applied our methods to evaluate five segmentation algorithms developed for CT luggage screening. We validated our methods with synthetic problems and an observer evaluation. Both methods selected the same best segmentation algorithm. Human evaluation confirmed the findings. The measurement of systematic errors and prioritization helped in understanding the behavior of each segmentation algorithm. Our evaluation methods allow us to measure and explain the accuracy of segmentation algorithms.
Finding undetected protein associations in cell signaling by belief propagation.
Bailly-Bechet, M; Borgs, C; Braunstein, A; Chayes, J; Dagkessamanskaia, A; François, J-M; Zecchina, R
2011-01-11
External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing it to react to a wide spectrum of environmental changes. High-throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply this scheme to alpha factor and drug perturbation data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field.
QuickProbs 2: Towards rapid construction of high-quality alignments of large protein families
Gudyś, Adam; Deorowicz, Sebastian
2017-01-01
The ever-increasing size of sequence databases, caused by the development of high-throughput sequencing, poses one of the greatest challenges yet to multiple alignment algorithms. As we show, well-established techniques employed for increasing alignment quality, i.e., refinement and consistency, are ineffective when large protein families are investigated. We present QuickProbs 2, an algorithm for multiple sequence alignment. Based on probabilistic models, equipped with novel column-oriented refinement and selective consistency, it offers outstanding accuracy. When analysing hundreds of sequences, QuickProbs 2 is noticeably better than ClustalΩ and MAFFT, the previous leaders for processing numerous protein families. In the case of smaller sets, for which consistency-based methods are the best performing, QuickProbs 2 is also superior to the competitors. Due to the low computational requirements of selective consistency and the utilization of massively parallel architectures, the presented algorithm has execution times similar to ClustalΩ, and is orders of magnitude faster than full consistency approaches, like MSAProbs or PicXAA. All these make QuickProbs 2 an excellent tool for aligning families ranging from a few to hundreds of proteins. PMID:28139687
NASA Astrophysics Data System (ADS)
Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.
2017-07-01
Dengue virus consists of 10 different constituent proteins and is classified into 4 major serotypes (DEN 1 - DEN 4). This study was designed to cluster 30 protein sequences of dengue virus taken from the Virus Pathogen Database and Analysis Resource (VIPR) using the Regularized Markov Clustering (R-MCL) algorithm and then to analyze the result. Implemented in Python 3.4, the R-MCL algorithm produces 8 clusters, with more than one centroid in several clusters. The number of centroids shows the density level of interaction. Protein interactions that are connected in a tissue form a protein complex that serves as a specific biological process unit. The analysis of the result shows that R-MCL clustering produces clusters of the dengue virus family based on the similar roles of their constituent proteins, regardless of serotype.
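The regularize-and-inflate loop behind R-MCL can be illustrated with a minimal sketch. This is not the authors' program: the function names, the inflation value of 2.0, the fixed iteration count, and the rule that assigns each node to its strongest attractor row are assumptions made for illustration, and the pruning step used in practical implementations is omitted.

```python
import numpy as np


def column_normalize(m):
    """Scale each column so that it sums to 1 (a column-stochastic matrix)."""
    sums = m.sum(axis=0, keepdims=True)
    sums[sums == 0] = 1.0  # leave all-zero columns untouched
    return m / sums


def regularized_mcl(adj, inflation=2.0, n_iter=50):
    """Toy R-MCL: repeat 'regularize' (M @ M_G) and 'inflate' steps.

    adj is a symmetric similarity/adjacency matrix; the parameter
    defaults are illustrative assumptions, not values from the study.
    """
    adj = np.asarray(adj, dtype=float)
    # Canonical flow matrix M_G: adjacency with self-loops, column-normalized.
    m_g = column_normalize(adj + np.eye(adj.shape[0]))
    m = m_g.copy()
    for _ in range(n_iter):
        m = m @ m_g                           # regularize with the original graph
        m = column_normalize(m ** inflation)  # inflate: sharpen strong flows
    # Assign every node (column) to the row that receives most of its flow.
    attractors = m.argmax(axis=0)
    clusters = {}
    for node, attractor in enumerate(attractors):
        clusters.setdefault(int(attractor), []).append(node)
    return sorted(clusters.values())
```

Calling `regularized_mcl` on a pairwise sequence-similarity matrix returns lists of node indices, one list per cluster; larger inflation values tend to give more, smaller clusters.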
KDE Bioscience: platform for bioinformatics analysis workflows.
Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue
2006-08-01
Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards, combined with unfriendly interfaces, makes it difficult for biologists to learn how to use these tools and to translate the data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a Java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or third-party viewers are built in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.
BioWord: A sequence manipulation suite for Microsoft Word
2012-01-01
Background: The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. Results: BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. Conclusions: BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms. PMID:22676326
BioWord: a sequence manipulation suite for Microsoft Word.
Anzaldi, Laura J; Muñoz-Fernández, Daniel; Erill, Ivan
2012-06-07
The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms.
Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro
2011-09-01
Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro
2011-01-01
Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341
GPS-Lipid: a robust tool for the prediction of multiple lipid modification sites.
Xie, Yubin; Zheng, Yueyuan; Li, Hongyu; Luo, Xiaotong; He, Zhihao; Cao, Shuo; Shi, Yi; Zhao, Qi; Xue, Yu; Zuo, Zhixiang; Ren, Jian
2016-06-16
As one of the most common post-translational modifications in eukaryotic cells, lipid modification is an important mechanism for the regulation of various aspects of protein function. Over the last decades, three classes of lipid modifications have been increasingly studied. The co-regulation of these different lipid modifications is beginning to be noticed. However, due to the lack of integrated bioinformatics resources, the studies of co-regulatory mechanisms are still very limited. In this work, we developed a tool called GPS-Lipid for the prediction of four classes of lipid modifications by integrating the Particle Swarm Optimization with an aging leader and challengers (ALC-PSO) algorithm. GPS-Lipid was shown to be clearly superior to other similar tools. To facilitate the research of lipid modification, we hosted a publicly available web server at http://lipid.biocuckoo.org with not only the implementation of GPS-Lipid, but also an integrative database and visualization tool. We performed a systematic analysis of the co-regulatory mechanism between different lipid modifications with GPS-Lipid. The results demonstrated that the proximal dual-lipid modifications among palmitoylation, myristoylation and prenylation are a key mechanism for regulating various protein functions. In conclusion, GPS-Lipid is expected to serve as a useful resource for the research on lipid modifications, especially on their co-regulation.
Hierarchical folding free energy landscape of HP35 revealed by most probable path clustering.
Jain, Abhinav; Stock, Gerhard
2014-07-17
Adopting extensive molecular dynamics simulations of villin headpiece protein (HP35) by Shaw and co-workers, a detailed theoretical analysis of the folding of HP35 is presented. The approach is based on the recently proposed most probable path algorithm which identifies the metastable states of the system, combined with dynamical coring of these states in order to obtain a consistent Markov state model. The method facilitates the construction of a dendrogram associated with the folding free-energy landscape of HP35, which reveals a hierarchical funnel structure and shows that the native state is rather a kinetic trap than a network hub. The energy landscape of HP35 consists of the entropic unfolded basin U, where the prestructuring of the protein takes place, the intermediate basin I, which is connected to U via the rate-limiting U → I transition state reflecting the formation of helix-1, and the native basin N, containing a state close to the NMR structure and a native-like state that exhibits enhanced fluctuations of helix-3. The model is in line with recent experimental observations that the intermediate and native states differ mostly in their dynamics (locked vs unlocked states). Employing dihedral angle principal component analysis, subdiffusive motion on a multidimensional free-energy surface is found.
Marshall, Jamie L; Kwok, Yukwah; McMorran, Brian J; Baum, Linda G; Crosbie-Watson, Rachelle H
2013-09-01
Three adhesion complexes span the sarcolemma and facilitate critical connections between the extracellular matrix and the actin cytoskeleton: the dystrophin- and utrophin-glycoprotein complexes and α7β1 integrin. Loss of individual protein components results in a loss of the entire protein complex and muscular dystrophy. Muscular dystrophy is a progressive, lethal wasting disease characterized by repetitive cycles of myofiber degeneration and regeneration. Protein-replacement therapy offers a promising approach for the treatment of muscular dystrophy. Recently, we demonstrated that sarcospan facilitates protein-protein interactions amongst the adhesion complexes and is an important potential therapeutic target. Here, we review current protein-replacement strategies, discuss the potential benefits of sarcospan expression, and identify important experiments that must be addressed for sarcospan to move to the clinic. © 2013 FEBS.
Chandrasekaran, Srinivas Niranj; Das, Jhuma; Dokholyan, Nikolay V.; Carter, Charles W.
2016-01-01
PATH rapidly computes a path and a transition state between crystal structures by minimizing the Onsager-Machlup action. It requires input parameters whose range of values can generate different transition-state structures that cannot be uniquely compared with those generated by other methods. We outline modifications to estimate these input parameters to circumvent these difficulties and validate the PATH transition states by showing consistency between transition-states derived by different algorithms for unrelated protein systems. Although functional protein conformational change trajectories are to a degree stochastic, they nonetheless pass through a well-defined transition state whose detailed structural properties can rapidly be identified using PATH. PMID:26958584
We and others have shown that transition and maintenance of biological states is controlled by master regulator proteins, which can be inferred by interrogating tissue-specific regulatory models (interactomes) with transcriptional signatures, using the VIPER algorithm. Yet, some tissues may lack molecular profiles necessary for interactome inference (orphan tissues), or, as for single cells isolated from heterogeneous samples, their tissue context may be undetermined.
VASP-E: Specificity Annotation with a Volumetric Analysis of Electrostatic Isopotentials
Chen, Brian Y.
2014-01-01
Algorithms for comparing protein structure are frequently used for function annotation. By searching for subtle similarities among very different proteins, these algorithms can identify remote homologs with similar biological functions. In contrast, few comparison algorithms focus on specificity annotation, where the identification of subtle differences among very similar proteins can assist in finding small structural variations that create differences in binding specificity. Few specificity annotation methods consider electrostatic fields, which play a critical role in molecular recognition. To fill this gap, this paper describes VASP-E (Volumetric Analysis of Surface Properties with Electrostatics), a novel volumetric comparison tool based on the electrostatic comparison of protein-ligand and protein-protein binding sites. VASP-E exploits the central observation that three dimensional solids can be used to fully represent and compare both electrostatic isopotentials and molecular surfaces. With this integrated representation, VASP-E is able to dissect the electrostatic environments of protein-ligand and protein-protein binding interfaces, identifying individual amino acids that have an electrostatic influence on binding specificity. VASP-E was used to examine a nonredundant subset of the serine and cysteine proteases as well as the barnase-barstar and Rap1a-raf complexes. Based on amino acids established by various experimental studies to have an electrostatic influence on binding specificity, VASP-E identified electrostatically influential amino acids with 100% precision and 83.3% recall. We also show that VASP-E can accurately classify closely related ligand binding cavities into groups with different binding preferences. These results suggest that VASP-E should prove a useful tool for the characterization of specific binding and the engineering of binding preferences in proteins. PMID:25166865
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xi, T; Jones, I M; Mohrenweiser, H W
2003-11-03
Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant". Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or Possibly Damaging". Another 9-15% of the variants were classed as "Potentially Intolerant or Damaging". The results from the two algorithms are highly associated, with concordance in predicted impact observed for approximately 62% of the variants. Twenty-one to thirty-one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as "Tolerant" or "Benign". Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.
Zhang, Jian; Suo, Yan; Liu, Min; Xu, Xun
2018-06-01
Proliferative diabetic retinopathy (PDR) is one of the most common complications of diabetes and can lead to blindness. Proteomic studies have provided insight into the pathogenesis of PDR, and a series of PDR-related genes has been identified, but these are far from fully characterized because the experimental methods are expensive and time consuming. In our previous study, we successfully identified 35 candidate PDR-related genes through the shortest-path algorithm. In the current study, we developed a computational method using the random walk with restart (RWR) algorithm and the protein-protein interaction (PPI) network to identify potential PDR-related genes. After some possible genes were obtained by the RWR algorithm, a three-stage filtration strategy, which includes the permutation test, interaction test and enrichment test, was applied to exclude potential false positives caused by the structure of the PPI network, poor interaction strength, and limited similarity in gene ontology (GO) terms and biological pathways. As a result, 36 candidate genes were discovered by the method, which were different from the 35 genes reported in our previous study. A literature review showed that 21 of these 36 genes are supported by previous experiments. These findings suggest the robustness and complementary effects of both our efforts using different computational methods, thus providing an alternative method to study PDR pathogenesis. Copyright © 2017 Elsevier B.V. All rights reserved.
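The random walk with restart step mentioned above can be sketched as a short power iteration. This is a generic illustration rather than the authors' pipeline: the function name, the restart probability of 0.8, and the convergence tolerance are assumptions, and the three filtration tests are not shown.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.8, tol=1e-8, max_iter=1000):
    """Score every node of a network by its proximity to a set of seed nodes.

    adj     : symmetric adjacency (or weight) matrix of the PPI network
    seeds   : indices of known disease-related genes
    restart : probability of jumping back to the seeds (assumed value)
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0       # avoid dividing by zero for isolated nodes
    w = adj / col_sums                  # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # the walker starts uniformly on the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change below tolerance
            return p_next
        p = p_next
    return p
```

Nodes with high scores that are not seeds themselves are the candidates that would then be passed to downstream filters such as the permutation, interaction and enrichment tests.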
2014-01-01
Background: Protein-protein docking is an in silico method to predict the formation of protein complexes. Due to limited computational resources, the protein-protein docking approach has been developed under the assumption of rigid docking, in which one of the two protein partners remains rigid during the protein associations and the water contribution is ignored or implicitly represented. Despite obtaining a number of acceptable complex predictions, most initial rigid docking algorithms to date still find it difficult, or even fail, to discriminate the correct predictions from the incorrect or false positive ones. To improve the rigid docking results, re-ranking is one of the effective methods that help move the correct predictions into the top ranks, discriminating them from the incorrect ones. In this paper, we propose a new re-ranking technique using a new energy-based scoring function, namely IFACEwat, a combination of Interface Atomic Contact Energy (IFACE) and a water effect. The IFACEwat aims to further improve the discrimination of the near-native structures of the initial rigid docking algorithm ZDOCK3.0.2. Unlike other re-ranking techniques, the IFACEwat explicitly places interfacial water into the protein interfaces to account for the water-mediated contacts during the protein interactions. Results: Our results showed that the IFACEwat increased the number of near-native structures and improved their ranks as compared to the initial rigid docking ZDOCK3.0.2. In fact, the IFACEwat achieved a success rate of 83.8% for Antigen/Antibody complexes, which is 10% better than ZDOCK3.0.2. As compared to another re-ranking technique, ZRANK, the IFACEwat obtains success rates of 92.3% (8% better) and 90% (5% better) for medium and difficult cases, respectively. When compared with the latest published re-ranking method, F2Dock, the IFACEwat performed equivalently well or even better for several Antigen/Antibody complexes.
Conclusions: With the inclusion of interfacial water, the IFACEwat improves most results of the initial rigid docking, especially for Antigen/Antibody complexes. The improvement is achieved by explicitly taking into account the contribution of water during the protein interactions, which was ignored or not fully represented by the initial rigid docking and other re-ranking techniques. In addition, the IFACEwat maintains sufficient computational efficiency of the initial docking algorithm, yet improves the ranks as well as the number of the near-native structures found. As our implementation has so far targeted improving the results of ZDOCK3.0.2, particularly for the Antigen/Antibody complexes, it is expected that further implementations will make it applicable to other initial rigid docking algorithms in the near future. PMID:25521441
Mission Analysis Program for Solar Electric Propulsion (MAPSEP). Volume 3: Program manual
NASA Technical Reports Server (NTRS)
Huling, K. R.; Boain, R. J.; Wilson, T.; Hong, P. E.; Shults, G. L.
1974-01-01
The internal structure of MAPSEP is described. Topics discussed include: macrologic, variable definition, subroutines, and logical flow. Information is given to facilitate modifications to the models and algorithms of MAPSEP.
Automatic protein structure solution from weak X-ray data
NASA Astrophysics Data System (ADS)
Skubák, Pavol; Pannu, Navraj S.
2013-11-01
Determining new protein structures from X-ray diffraction data at low resolution or with a weak anomalous signal is a difficult and often an impossible task. Here we propose a multivariate algorithm that simultaneously combines the structure determination steps. In tests on over 140 real data sets from the protein data bank, we show that this combined approach can automatically build models where current algorithms fail, including an anisotropically diffracting 3.88 Å RNA polymerase II data set. The method seamlessly automates the process, is ideal for non-specialists and provides a mathematical framework for successfully combining various sources of information in image processing.
Functional assignment to JEV proteins using SVM.
Sahoo, Ganesh Chandra; Dikhit, Manas Ranjan; Das, Pradeep
2008-01-01
Identification of different protein functions facilitates a mechanistic understanding of Japanese encephalitis virus (JEV) infection and opens novel means for drug development. Support vector machines (SVM), useful for predicting the functional class of distantly related proteins, are employed to ascribe a possible functional class to Japanese encephalitis virus proteins. Our study using SVMProt and available JE virus sequences suggests that the structural and nonstructural proteins of the JEV genome possibly belong to diverse functional classes that are expected to occur in the life cycle of JE virus. Protein functions common to both structural and non-structural proteins are iron-binding, metal-binding, lipid-binding, copper-binding, transmembrane, outer membrane, and the channels/Pores - Pore-forming toxins (proteins and peptides) group of proteins. Non-structural proteins perform functions such as actin binding, zinc-binding, calcium-binding, hydrolases, Carbon-Oxygen Lyases, P-type ATPase, proteins belonging to the major facilitator family (MFS), the secreting main terminal branch (MTB) family, phosphotransfer-driven group translocators and the ATP-binding cassette (ABC) family group of proteins. Structural proteins, besides belonging to the same structural group of proteins (capsid, structural, envelope), also perform functions such as nuclear receptor, antibiotic resistance, RNA-binding, DNA-binding, magnesium-binding, isomerase (intra-molecular) and oxidoreductase, and participate in the type II (general) secretory pathway (IISP).
Functional assignment to JEV proteins using SVM
Sahoo, Ganesh Chandra; Dikhit, Manas Ranjan; Das, Pradeep
2008-01-01
Identification of different protein functions facilitates a mechanistic understanding of Japanese encephalitis virus (JEV) infection and opens novel means for drug development. Support vector machines (SVM), useful for predicting the functional class of distantly related proteins, are employed to ascribe a possible functional class to Japanese encephalitis virus proteins. Our study using SVMProt and available JE virus sequences suggests that the structural and nonstructural proteins of the JEV genome possibly belong to diverse functional classes that are expected to occur in the life cycle of JE virus. Protein functions common to both structural and non-structural proteins are iron-binding, metal-binding, lipid-binding, copper-binding, transmembrane, outer membrane, and the channels/Pores - Pore-forming toxins (proteins and peptides) group of proteins. Non-structural proteins perform functions such as actin binding, zinc-binding, calcium-binding, hydrolases, Carbon-Oxygen Lyases, P-type ATPase, proteins belonging to the major facilitator family (MFS), the secreting main terminal branch (MTB) family, phosphotransfer-driven group translocators and the ATP-binding cassette (ABC) family group of proteins. Structural proteins, besides belonging to the same structural group of proteins (capsid, structural, envelope), also perform functions such as nuclear receptor, antibiotic resistance, RNA-binding, DNA-binding, magnesium-binding, isomerase (intra-molecular) and oxidoreductase, and participate in the type II (general) secretory pathway (IISP). PMID:19052658
Space vehicle Viterbi decoder. [data converters, algorithms
NASA Technical Reports Server (NTRS)
1975-01-01
The design and fabrication of an extremely low-power, constraint-length 7, rate 1/3 Viterbi decoder brassboard capable of operating at information rates of up to 100 kb/s is presented. The brassboard is partitioned to facilitate a later transition to an LSI version requiring even less power. The effect of soft-decision thresholds, path memory lengths, and output selection algorithms on the bit error rate is evaluated. A branch synchronization algorithm is compared with a more conventional approach. The implementation of the decoder and its test set (including all-digital noise source) are described along with the results of various system tests and evaluations. Results and recommendations are presented.
TORC3: Token-ring clearing heuristic for currency circulation
NASA Astrophysics Data System (ADS)
Humes, Carlos, Jr.; Lauretto, Marcelo S.; Nakano, Fábio; Pereira, Carlos A. B.; Rafare, Guilherme F. G.; Stern, Julio Michael
2012-10-01
Clearing algorithms are at the core of modern payment systems, facilitating the settling of multilateral credit messages with (near) minimum transfers of currency. Traditional clearing procedures use batch processing based on MILP - mixed-integer linear programming algorithms. The MILP approach demands intensive computational resources; moreover, it is also vulnerable to operational risks generated by possible defaults during the inter-batch period. This paper presents TORC3 - the Token-Ring Clearing Algorithm for Currency Circulation. In contrast to the MILP approach, TORC3 is a real time heuristic procedure, demanding modest computational resources, and able to completely shield the clearing operation against the participating agents' risk of default.
Moore, J H
1995-06-01
A genetic algorithm for instrumentation control and optimization was developed using the LabVIEW graphical programming environment. The usefulness of this methodology for the optimization of a closed loop control instrument is demonstrated with minimal complexity, and the programming is presented in detail to facilitate its adaptation to other LabVIEW applications. Closed loop control instruments have a variety of applications in the biomedical sciences, including the regulation of physiological processes such as blood pressure. The program presented here should provide a useful starting point for those wishing to incorporate genetic algorithm approaches into LabVIEW-mediated optimization of closed loop control instruments.
Kazmier, Kelli; Alexander, Nathan S.; Meiler, Jens; Mchaourab, Hassane S.
2010-01-01
A hybrid protein structure determination approach combining sparse Electron Paramagnetic Resonance (EPR) distance restraints and Rosetta de novo protein folding has been previously demonstrated to yield high quality models (Alexander et al., 2008). However, widespread application of this methodology to proteins of unknown structures is hindered by the lack of a general strategy to place spin label pairs in the primary sequence. In this work, we report the development of an algorithm that optimally selects spin labeling positions for the purpose of distance measurements by EPR. For the α-helical subdomain of T4 lysozyme (T4L), simulated restraints that maximize sequence separation between the two spin labels while simultaneously ensuring pairwise connectivity of secondary structure elements yielded vastly improved models by Rosetta folding. 50% of all these models have the correct fold compared to only 21% and 8% correctly folded models when randomly placed restraints or no restraints are used, respectively. Moreover, the improvements in model quality require a limited number of optimized restraints, the number of which is determined by the pairwise connectivities of T4L α-helices. The predicted improvement in Rosetta model quality was verified by experimental determination of distances between spin labels pairs selected by the algorithm. Overall, our results reinforce the rationale for the combined use of sparse EPR distance restraints and de novo folding. By alleviating the experimental bottleneck associated with restraint selection, this algorithm sets the stage for extending computational structure determination to larger, traditionally elusive protein topologies of critical structural and biochemical importance. PMID:21074624
Parente, Daniel J; Ray, J Christian J; Swint-Kruse, Liskin
2015-12-01
As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: for a given sequence alignment, alternative algorithms usually identify different top pairwise scores. We reconciled results from five commonly used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; "central" positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints, detectable by divergent algorithms, that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions. © 2015 Wiley Periodicals, Inc.
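The idea of treating pairwise coevolution scores as a weighted network and ranking positions by eigenvector centrality can be illustrated with a short sketch. This is not the authors' code: the function name `eigenvector_centrality`, the toy 4-position score matrix, and the use of a shifted power iteration (iterating with W + I so the iteration also converges on bipartite networks) are assumptions made for illustration only.

```python
def eigenvector_centrality(weights, iterations=500, tol=1e-12):
    """Eigenvector centrality of a symmetric, non-negative weight matrix.

    `weights` is a square list of lists; weights[i][j] is the (hypothetical)
    coevolution score between positions i and j. Power iteration is applied
    to (W + I), which has the same eigenvectors as W.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # One step of power iteration on (W + I).
        new = [scores[i] + sum(weights[i][j] * scores[j] for j in range(n))
               for i in range(n)]
        norm = sum(v * v for v in new) ** 0.5
        new = [v / norm for v in new]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


# Toy network (made-up scores): position 0 has three moderate links,
# positions 2 and 3 additionally share one strong link.
toy_scores = [
    [0.0, 0.5, 0.5, 0.5],
    [0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.9],
    [0.5, 0.0, 0.9, 0.0],
]
centrality = eigenvector_centrality(toy_scores)
ranking = sorted(range(len(centrality)), key=lambda i: -centrality[i])
print(ranking, [round(c, 3) for c in centrality])
```

Because centrality depends on a position's whole neighbourhood rather than on any single pair, a ranking of this kind can differ from a ranking by top pairwise score.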
A hybrid MD-kMC algorithm for folding proteins in explicit solvent.
Peter, Emanuel Karl; Shea, Joan-Emma
2014-04-14
We present a novel hybrid MD-kMC algorithm that is capable of efficiently folding proteins in explicit solvent. We apply this algorithm to the folding of a small protein, Trp-Cage. Different kMC move sets that capture different possible rate limiting steps are implemented. The first uses secondary structure formation as a relevant rate event (a combination of dihedral rotations and hydrogen-bonding formation and breakage). The second uses tertiary structure formation events through formation of contacts via translational moves. Both methods fold the protein, but via different mechanisms and with different folding kinetics. The first method leads to folding via a structured helical state, with kinetics fit by a single exponential. The second method leads to folding via a collapsed loop, with kinetics poorly fit by single or double exponentials. In both cases, folding times are faster than experimentally reported values. The secondary and tertiary move sets are integrated in a third MD-kMC implementation, which now leads to folding of the protein via both pathways, with single and double-exponential fits to the rates, and to folding rates in good agreement with experimental values. The competition between secondary and tertiary structure leads to a longer search for the helix-rich intermediate in the case of the first pathway, and to the emergence of a kinetically trapped long-lived molten-globule collapsed state in the case of the second pathway. The algorithm presented not only captures experimentally observed folding intermediates and kinetics, but yields insights into the relative roles of local and global interactions in determining folding mechanisms and rates.
Efficient conformational space exploration in ab initio protein folding simulation.
Ullah, Ahammed; Ahmed, Nasif; Pappu, Subrata Dey; Shatabda, Swakkhar; Ullah, A Z M Dayem; Rahman, M Sohel
2015-08-01
Ab initio protein folding simulation largely depends on knowledge-based energy functions that are derived from known protein structures using statistical methods. These knowledge-based energy functions provide us with a good approximation of real protein energetics. However, these energy functions are not very informative for search algorithms and fail to distinguish the types of amino acid interactions that contribute largely to the energy function from those that do not. As a result, search algorithms frequently get trapped into the local minima. On the other hand, the hydrophobic-polar (HP) model considers hydrophobic interactions only. The simplified nature of HP energy function makes it limited only to a low-resolution model. In this paper, we present a strategy to derive a non-uniform scaled version of the real 20×20 pairwise energy function. The non-uniform scaling helps tackle the difficulty faced by a real energy function, whereas the integration of 20×20 pairwise information overcomes the limitations faced by the HP energy function. Here, we have applied a derived energy function with a genetic algorithm on discrete lattices. On a standard set of benchmark protein sequences, our approach significantly outperforms the state-of-the-art methods for similar models. Our approach has been able to explore regions of the conformational space which all the previous methods have failed to explore. Effectiveness of the derived energy function is presented by showing qualitative differences and similarities of the sampled structures to the native structures. Number of objective function evaluation in a single run of the algorithm is used as a comparison metric to demonstrate efficiency.
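The contrast between the HP model and a full pairwise energy table can be made concrete with a small lattice contact-energy sketch. It is an illustration under stated assumptions, not the energy function derived in the paper: the 2D square lattice, the function name `contact_energy`, and the `HP_TABLE` lookup (only H-H contacts score -1) are hypothetical. A scaled 20×20 table would be passed in the same way, keyed by amino-acid pairs.

```python
# Hypothetical HP-model table: only hydrophobic-hydrophobic contacts count.
HP_TABLE = {("H", "H"): -1.0}


def contact_energy(sequence, coords, table=HP_TABLE):
    """Sum pairwise contact energies for a self-avoiding walk on a 2D square lattice.

    sequence: string of residue types, e.g. "HPPH".
    coords:   list of (x, y) integer lattice points, one per residue.
    table:    maps (type_a, type_b) to an energy; missing pairs contribute 0.
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0.0
    for i in range(len(sequence)):
        # Skip j = i + 1: chain neighbours are bonded, not contacts.
        for j in range(i + 2, len(sequence)):
            dx = abs(coords[i][0] - coords[j][0])
            dy = abs(coords[i][1] - coords[j][1])
            if dx + dy == 1:  # adjacent lattice sites
                pair = (sequence[i], sequence[j])
                energy += table.get(pair, table.get(pair[::-1], 0.0))
    return energy


folded = [(0, 0), (1, 0), (1, 1), (0, 1)]    # ends of the chain touch
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]  # no non-bonded contacts
print(contact_energy("HPPH", folded), contact_energy("HPPH", extended))
```

A search procedure such as a genetic algorithm would call a function like this as its objective, which is why the number of objective-function evaluations is a natural efficiency metric.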
Qin, J; Choi, K S; Ho, Simon S M; Heng, P A
2008-01-01
A force prediction algorithm is proposed to facilitate virtual-reality (VR) based collaborative surgical simulation by reducing the effect of network latencies. State regeneration is used to correct the estimated prediction. This algorithm is incorporated into an adaptive transmission protocol in which auxiliary features such as view synchronization and coupling control are provided to ensure system consistency. We implemented this protocol using a multi-threaded technique on a cluster-based network architecture.
Khan, Naveed; McClean, Sally; Zhang, Shuai; Nugent, Chris
2016-01-01
In recent years, smart phones with inbuilt sensors have become popular devices to facilitate activity recognition. The sensors capture a large amount of data, containing meaningful events, in a short period of time. The change points in this data are used to specify transitions to distinct events and can be used in various scenarios such as identifying change in a patient’s vital signs in the medical domain or requesting activity labels for generating real-world labeled activity datasets. Our work focuses on change-point detection to identify a transition from one activity to another. Within this paper, we extend our previous work on multivariate exponentially weighted moving average (MEWMA) algorithm by using a genetic algorithm (GA) to identify the optimal set of parameters for online change-point detection. The proposed technique finds the maximum accuracy and F_measure by optimizing the different parameters of the MEWMA, which subsequently identifies the exact location of the change point from an existing activity to a new one. Optimal parameter selection facilitates an algorithm to detect accurate change points and minimize false alarms. Results have been evaluated based on two real datasets of accelerometer data collected from a set of different activities from two users, with a high degree of accuracy from 99.4% to 99.8% and F_measure of up to 66.7%. PMID:27792177
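The MEWMA statistic whose parameters are being tuned can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation: it assumes a known in-control mean and independent sensor axes (a diagonal covariance), and the names `mewma_alarms`, `lam` (smoothing weight) and `threshold` (control limit) are hypothetical. In the paper's setting, values such as `lam` and `threshold` are the kind of parameters a genetic algorithm would search over.

```python
def mewma_alarms(samples, mean, variances, lam=0.2, threshold=12.0):
    """Return the sample indices at which the MEWMA statistic exceeds `threshold`.

    samples:   list of equal-length numeric vectors (e.g. accelerometer x, y, z).
    mean:      assumed in-control mean vector.
    variances: assumed per-axis in-control variances (diagonal covariance).
    """
    z = [0.0] * len(mean)
    alarms = []
    for t, x in enumerate(samples, start=1):
        # Exponentially weighted moving average of the centred observations.
        z = [lam * (xi - mi) + (1.0 - lam) * zi
             for xi, mi, zi in zip(x, mean, z)]
        # Exact variance factor of z at time t for an EWMA started at zero.
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        t2 = sum(zi * zi / (scale * vi) for zi, vi in zip(z, variances))
        if t2 > threshold:
            alarms.append(t - 1)
    return alarms


# Synthetic, noise-free example: the signal shifts at index 20.
signal = [[0.0, 0.0, 0.0]] * 20 + [[3.0, 3.0, 3.0]] * 5
print(mewma_alarms(signal, mean=[0.0, 0.0, 0.0], variances=[1.0, 1.0, 1.0]))
```

A smaller `lam` smooths more and reacts later, while a lower `threshold` reacts sooner at the cost of more false alarms, which is the trade-off that makes parameter optimization worthwhile.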
Multi-source Geospatial Data Analysis with Google Earth Engine
NASA Astrophysics Data System (ADS)
Erickson, T.
2014-12-01
The Google Earth Engine platform is a cloud computing environment for data analysis that combines a public data catalog with a large-scale computational facility optimized for parallel processing of geospatial data. The data catalog is a multi-petabyte archive of georeferenced datasets that include images from Earth observing satellite and airborne sensors (examples: USGS Landsat, NASA MODIS, USDA NAIP), weather and climate datasets, and digital elevation models. Earth Engine supports both a just-in-time computation model that enables real-time preview and debugging during algorithm development for open-ended data exploration, and a batch computation mode for applying algorithms over large spatial and temporal extents. The platform automatically handles many traditionally onerous data management tasks, such as data format conversion, reprojection, and resampling, which facilitates writing algorithms that combine data from multiple sensors and/or models. Although the primary use of Earth Engine, to date, has been the analysis of large Earth observing satellite datasets, the computational platform is generally applicable to a wide variety of use cases that require large-scale geospatial data analyses. This presentation will focus on how Earth Engine facilitates the analysis of geospatial data streams that originate from multiple separate sources (and often communities) and how it enables collaboration during algorithm development and data exploration. The talk will highlight current projects/analyses that are enabled by this functionality. https://earthengine.google.org
Yao, Hai-Ping; Zhu, Zhi-Xiang; Ji, Ming; Chen, Xiao-Guang; Xu, Bai-Ling
2014-04-01
Poly(ADP-ribose) polymerase-1 (PARP-1) has emerged as a promising anticancer drug target due to its key role in the DNA repair process. It can polymerize ADP-ribose units on its substrate proteins, which are involved in the regulation of DNA repair. In this work, a novel series of para-substituted 1-benzyl-quinazoline-2,4(1H,3H)-diones was designed and synthesized, and the inhibitory activities against PARP-1 of compounds 7a-7e, 8a-8f, 9a-9c and 10a-10c were evaluated. Of all the tested compounds, nine displayed inhibitory activities with IC50 values ranging from 4.6 to 39.2 μmol·L⁻¹. To predict the binding modes of the potent molecules, molecular docking was performed using the CDOCKER algorithm; the predicted binding modes should facilitate the further development of more potent PARP-1 inhibitors with a quinazolinedione scaffold.
JGromacs: a Java package for analyzing protein simulations.
Münz, Márton; Biggin, Philip C
2012-01-23
In this paper, we introduce JGromacs, a Java API (Application Programming Interface) that facilitates the development of cross-platform data analysis applications for Molecular Dynamics (MD) simulations. The API supports parsing and writing file formats applied by GROMACS (GROningen MAchine for Chemical Simulations), one of the most widely used MD simulation packages. JGromacs builds on the strengths of object-oriented programming in Java by providing a multilevel object-oriented representation of simulation data to integrate and interconvert sequence, structure, and dynamics information. The easy-to-learn, easy-to-use, and easy-to-extend framework is intended to simplify and accelerate the implementation and development of complex data analysis algorithms. Furthermore, a basic analysis toolkit is included in the package. The programmer is also provided with simple tools (e.g., XML-based configuration) to create applications with a user interface resembling the command-line interface of GROMACS applications. JGromacs and detailed documentation is freely available from http://sbcb.bioch.ox.ac.uk/jgromacs under a GPLv3 license.
Huang, Ying; Chen, Shi-Yi; Deng, Feilong
2016-01-01
In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly facilitated our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. The two main aspects of ab initio gene prediction are the computed values that describe sequence features and the algorithm used to train the discriminant function; different combinations of these are employed in various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.
JGromacs: A Java Package for Analyzing Protein Simulations
2011-01-01
In this paper, we introduce JGromacs, a Java API (Application Programming Interface) that facilitates the development of cross-platform data analysis applications for Molecular Dynamics (MD) simulations. The API supports parsing and writing file formats applied by GROMACS (GROningen MAchine for Chemical Simulations), one of the most widely used MD simulation packages. JGromacs builds on the strengths of object-oriented programming in Java by providing a multilevel object-oriented representation of simulation data to integrate and interconvert sequence, structure, and dynamics information. The easy-to-learn, easy-to-use, and easy-to-extend framework is intended to simplify and accelerate the implementation and development of complex data analysis algorithms. Furthermore, a basic analysis toolkit is included in the package. The programmer is also provided with simple tools (e.g., XML-based configuration) to create applications with a user interface resembling the command-line interface of GROMACS applications. Availability: JGromacs and detailed documentation is freely available from http://sbcb.bioch.ox.ac.uk/jgromacs under a GPLv3 license. PMID:22191855
Topology of membrane proteins-predictions, limitations and variations.
Tsirigos, Konstantinos D; Govindarajan, Sudha; Bassot, Claudio; Västermark, Åke; Lamb, John; Shu, Nanjiang; Elofsson, Arne
2017-10-26
Transmembrane proteins perform a variety of important biological functions necessary for the survival and growth of the cells. Membrane proteins are built up by transmembrane segments that span the lipid bilayer. The segments can either be in the form of hydrophobic alpha-helices or beta-sheets which create a barrel. A fundamental aspect of the structure of transmembrane proteins is the membrane topology, that is, the number of transmembrane segments, their position in the protein sequence and their orientation in the membrane. Along these lines, many predictive algorithms for the prediction of the topology of alpha-helical and beta-barrel transmembrane proteins exist. The newest algorithms obtain an accuracy close to 80% both for alpha-helical and beta-barrel transmembrane proteins. However, lately it has been shown that the simplified picture presented when describing a protein family by its topology is limited. To demonstrate this, we highlight examples where the topology is either not conserved in a protein superfamily or where the structure cannot be described solely by the topology of a protein. The prediction of these non-standard features from sequence alone was not successful until the recent revolutionary progress in 3D-structure prediction of proteins. Copyright © 2017 Elsevier Ltd. All rights reserved.
Image-algebraic design of multispectral target recognition algorithms
NASA Astrophysics Data System (ADS)
Schmalz, Mark S.; Ritter, Gerhard X.
1994-06-01
In this paper, we discuss methods for multispectral ATR (Automated Target Recognition) of small targets that are sensed under suboptimal conditions, such as haze, smoke, and low light levels. In particular, we discuss our ongoing development of algorithms and software that effect intelligent object recognition by selecting ATR filter parameters according to ambient conditions. Our algorithms are expressed in terms of IA (image algebra), a concise, rigorous notation that unifies linear and nonlinear mathematics in the image processing domain. IA has been implemented on a variety of parallel computers, with preprocessors available for the Ada and FORTRAN languages. An image algebra C++ class library has recently been made available. Thus, our algorithms are both feasible implementationally and portable to numerous machines. Analyses emphasize the aspects of image algebra that aid the design of multispectral vision algorithms, such as parameterized templates that facilitate the flexible specification of ATR filters.
Gu, Yuhua; Kumar, Virendra; Hall, Lawrence O; Goldgof, Dmitry B; Li, Ching-Yen; Korn, René; Bendtsen, Claus; Velazquez, Emmanuel Rios; Dekker, Andre; Aerts, Hugo; Lambin, Philippe; Li, Xiuli; Tian, Jie; Gatenby, Robert A; Gillies, Robert J
2012-01-01
A single click ensemble segmentation (SCES) approach based on an existing “Click&Grow” algorithm is presented. The SCES approach requires only one operator selected seed point as compared with multiple operator inputs, which are typically needed. This facilitates processing large numbers of cases. Evaluation on a set of 129 CT lung tumor images using a similarity index (SI) was done. The average SI is above 93% using 20 different start seeds, showing stability. The average SI for 2 different readers was 79.53%. We then compared the SCES algorithm with the two readers, the level set algorithm and the skeleton graph cut algorithm obtaining an average SI of 78.29%, 77.72%, 63.77% and 63.76% respectively. We can conclude that the newly developed automatic lung lesion segmentation algorithm is stable, accurate and automated. PMID:23459617
Crandall, Jacob W; Oudah, Mayada; Tennom; Ishowo-Oloko, Fatimah; Abdallah, Sherief; Bonnefon, Jean-François; Cebrian, Manuel; Shariff, Azim; Goodrich, Michael A; Rahwan, Iyad
2018-01-16
Since Alan Turing envisioned artificial intelligence, technical progress has often been measured by the ability to defeat humans in zero-sum encounters (e.g., Chess, Poker, or Go). Less attention has been given to scenarios in which human-machine cooperation is beneficial but non-trivial, such as scenarios in which human and machine preferences are neither fully aligned nor fully in conflict. Cooperation does not require sheer computational power, but instead is facilitated by intuition, cultural norms, emotions, signals, and pre-evolved dispositions. Here, we develop an algorithm that combines a state-of-the-art reinforcement-learning algorithm with mechanisms for signaling. We show that this algorithm can cooperate with people and other algorithms at levels that rival human cooperation in a variety of two-player repeated stochastic games. These results indicate that general human-machine cooperation is achievable using a non-trivial, but ultimately simple, set of algorithmic mechanisms.
Using support vector machine to predict beta- and gamma-turns in proteins.
Hu, Xiuzhen; Li, Qianzhong
2008-09-01
By using the composite vector with increment of diversity, position conservation scoring function, and predictive secondary structures to express the information of sequence, a support vector machine (SVM) algorithm for predicting beta- and gamma-turns in the proteins is proposed. The 426 and 320 nonhomologous protein chains described by Guruprasad and Rajkumar (Guruprasad and Rajkumar J. Biosci 2000, 25,143) are used for training and testing the predictive model of the beta- and gamma-turns, respectively. The overall prediction accuracy and the Matthews correlation coefficient in 7-fold cross-validation are 79.8% and 0.47, respectively, for the beta-turns. The overall prediction accuracy in 5-fold cross-validation is 61.0% for the gamma-turns. These results are significantly higher than the other algorithms in the prediction of beta- and gamma-turns using the same datasets. In addition, the 547 and 823 nonhomologous protein chains described by Fuchs and Alix (Fuchs and Alix Proteins: Struct Funct Bioinform 2005, 59, 828) are used for training and testing the predictive model of the beta- and gamma-turns, and better results are obtained. This algorithm may be helpful to improve the performance of protein turns' prediction. To ensure the ability of the SVM method to correctly classify beta-turn and non-beta-turn (gamma-turn and non-gamma-turn), the receiver operating characteristic threshold independent measure curves are provided. (c) 2008 Wiley Periodicals, Inc.
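The two figures reported together above, overall accuracy and the Matthews correlation coefficient (MCC), are both computed from the same confusion matrix, and a short sketch shows why they can differ. The counts below are invented for illustration and are not taken from the study; the function names are hypothetical.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts: turn (positive) vs non-turn (negative) residues.
counts = dict(tp=40, tn=40, fp=10, fn=10)
print(accuracy(**counts), matthews_corrcoef(**counts))
```

Unlike accuracy, MCC stays near zero for a classifier that simply predicts the majority class, which makes it the more informative number when turn and non-turn residues are imbalanced.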
Pascual-García, Alberto; Abia, David; Ortiz, Angel R; Bastolla, Ugo
2009-03-01
Structural classifications of proteins assume the existence of the fold, which is an intrinsic equivalence class of protein domains. Here, we test in which conditions such an equivalence class is compatible with objective similarity measures. We base our analysis on the transitive property of the equivalence relationship, requiring that similarity of A with B and B with C implies that A and C are also similar. Divergent gene evolution leads us to expect that the transitive property should approximately hold. However, if protein domains are a combination of recurrent short polypeptide fragments, as proposed by several authors, then similarity of partial fragments may violate the transitive property, favouring the continuous view of the protein structure space. We propose a measure to quantify the violations of the transitive property when a clustering algorithm joins elements into clusters, and we find out that such violations present a well defined and detectable cross-over point, from an approximately transitive regime at high structure similarity to a regime with large transitivity violations and large differences in length at low similarity. We argue that protein structure space is discrete and hierarchic classification is justified up to this cross-over point, whereas at lower similarities the structure space is continuous and it should be represented as a network. We have tested the qualitative behaviour of this measure, varying all the choices involved in the automatic classification procedure, i.e., domain decomposition, alignment algorithm, similarity score, and clustering algorithm, and we have found out that this behaviour is quite robust. The final classification depends on the chosen algorithms. We used the values of the clustering coefficient and the transitivity violations to select the optimal choices among those that we tested. Interestingly, this criterion also favours the agreement between automatic and expert classifications. 
As a domain set, we have selected a consensus set of 2,890 domains decomposed very similarly in SCOP and CATH. As an alignment algorithm, we used a global version of MAMMOTH developed in our group, which is both rapid and accurate. As a similarity measure, we used the size-normalized contact overlap, and as a clustering algorithm, we used average linkage. The resulting automatic classification at the cross-over point was more consistent than expert ones with respect to the structure similarity measure, with 86% of the clusters corresponding to subsets of either SCOP or CATH superfamilies and fewer than 5% containing domains in distinct folds according to both SCOP and CATH. Almost 15% of SCOP superfamilies and 10% of CATH superfamilies were split, consistent with the notion of fold change in protein evolution. These results were qualitatively robust for all choices that we tested, although we did not try to use alignment algorithms developed by other groups. Folds defined in SCOP and CATH would be completely joined in the regime of large transitivity violations where clustering is more arbitrary. Consistently, the agreement between SCOP and CATH at fold level was lower than their agreement with the automatic classification obtained using as a clustering algorithm, respectively, average linkage (for SCOP) or single linkage (for CATH). The networks representing significant evolutionary and structural relationships between clusters beyond the cross-over point may allow us to perform evolutionary, structural, or functional analyses beyond the limits of classification schemes. These networks and the underlying clusters are available at http://ub.cbm.uam.es/research/ProtNet.php.
Metadata mapping and reuse in caBIG.
Kunz, Isaac; Lin, Ming-Chin; Frey, Lewis
2009-02-05
This paper proposes that interoperability across biomedical databases can be improved by utilizing a repository of Common Data Elements (CDEs), UML model class-attributes and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG). The goal is to demonstrate the deployment of open source tools that can be used to effectively map models and enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs to enable interoperability of their systems when developing within the caBIG framework or other frameworks that use metadata repositories. The Dice (di-grams) and Dynamic algorithms are compared, and both algorithms have similar performance matching UML model class-attributes to CDE class object-property pairs. With the algorithms used, the baselines for automatically finding the matches are reasonable for the data models examined. This suggests that automatic mapping of UML models and CDEs is feasible within the caBIG framework and potentially any framework that uses a metadata repository. This work opens up the possibility of using mapping algorithms to reduce the cost and time required to map local data models to a reference data model such as those used within caBIG. This effort contributes to facilitating the development of interoperable systems within caBIG as well as other metadata frameworks. Such efforts are critical to address the need to develop systems to handle enormous amounts of diverse data that can be leveraged from new biomedical methodologies.
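A Dice-style lexical match of the kind mentioned above can be sketched as follows. This is a generic character-bigram Dice coefficient, not the caBIG tooling itself: the function names, the lower-casing step, and the example attribute and CDE names are all assumptions made for illustration.

```python
from collections import Counter


def bigrams(text):
    """Lower-cased character bigrams of a name, kept with multiplicity."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_similarity(a, b):
    """Dice coefficient 2*|A ∩ B| / (|A| + |B|) over character bigrams."""
    ca, cb = bigrams(a), bigrams(b)
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        # Names shorter than two characters have no bigrams.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2.0 * sum((ca & cb).values()) / total


def best_match(attribute, candidates):
    """Return the (candidate, score) pair with the highest Dice similarity."""
    return max(((c, dice_similarity(attribute, c)) for c in candidates),
               key=lambda pair: pair[1])


# Hypothetical UML attribute name matched against hypothetical CDE names.
print(best_match("patientAge", ["Patient Age", "Specimen Type"]))
```

A score threshold below which no match is proposed would be a natural addition, so that low-similarity pairs are left for manual curation.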
Yang, Deshan; Brame, Scott; El Naqa, Issam; Aditya, Apte; Wu, Yu; Murty Goddu, S.; Mutic, Sasa; Deasy, Joseph O.; Low, Daniel A.
2011-01-01
Purpose: Recent years have witnessed tremendous progress in image-guided radiotherapy technology and a growing interest in the possibilities for adapting treatment planning and delivery over the course of treatment. One obstacle faced by the research community has been the lack of a comprehensive open-source software toolkit dedicated to adaptive radiotherapy (ART). To address this need, the authors have developed a software suite called the Deformable Image Registration and Adaptive Radiotherapy Toolkit (DIRART). Methods: DIRART is an open-source toolkit developed in MATLAB. It is designed in an object-oriented style with a focus on user-friendliness, features, and flexibility. It contains four classes of DIR algorithms, including the newer inverse consistency algorithms to provide consistent displacement vector fields in both directions. It also contains common ART functions, an integrated graphical user interface, a variety of visualization and image-processing features, dose metric analysis functions, and interface routines. These interface routines make DIRART a powerful complement to the Computational Environment for Radiotherapy Research (CERR) and popular image-processing toolkits such as ITK. Results: DIRART provides a set of image processing/registration algorithms and postprocessing functions to facilitate the development and testing of DIR algorithms. It also offers a good number of options for visualizing, evaluating, and validating DIR results. Conclusions: By exchanging data with treatment planning systems via DICOM-RT files and CERR, and by bringing image registration algorithms closer to radiotherapy applications, DIRART is potentially a convenient and flexible platform that may facilitate ART and DIR research. PMID:21361176
Xiao, Chuan-Le; Chen, Xiao-Zhou; Du, Yang-Li; Sun, Xuesong; Zhang, Gong; He, Qing-Yu
2013-01-04
Mass spectrometry has become one of the most important technologies in proteomic analysis. Tandem mass spectrometry (LC-MS/MS) is a major tool for the analysis of peptide mixtures from protein samples. The key step of MS data processing is the identification of peptides from experimental spectra by searching public sequence databases. Although a number of algorithms to identify peptides from MS/MS data have been already proposed, e.g. Sequest, OMSSA, X!Tandem, Mascot, etc., they are mainly based on statistical models considering only peak-matches between experimental and theoretical spectra, but not peak intensity information. Moreover, different algorithms gave different results from the same MS data, implying their probable incompleteness and questionable reproducibility. We developed a novel peptide identification algorithm, ProVerB, based on a binomial probability distribution model of protein tandem mass spectrometry combined with a new scoring function, making full use of peak intensity information and, thus, enhancing the ability of identification. Compared with Mascot, Sequest, and SQID, ProVerB identified significantly more peptides from LC-MS/MS data sets than the current algorithms at 1% False Discovery Rate (FDR) and provided more confident peptide identifications. ProVerB is also compatible with various platforms and experimental data sets, showing its robustness and versatility. The open-source program ProVerB is available at http://bioinformatics.jnu.edu.cn/software/proverb/ .
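The binomial idea behind this kind of scoring can be illustrated with a short sketch. It is not ProVerB's scoring function, which the abstract says also uses peak intensities; it only computes the chance of seeing at least a given number of peak matches at random, and turns that into a score. The function names, the `-10*log10` scaling, and the example parameter values are assumptions.

```python
import math


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): probability of k or more random matches."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def match_score(n_theoretical, n_matched, p_random):
    """Score a peptide-spectrum match as -10*log10 of the random-match probability.

    n_theoretical: number of theoretical fragment peaks for the candidate peptide
    n_matched:     how many of them were found in the experimental spectrum
    p_random:      assumed probability that a single peak matches by chance
    """
    tail = binomial_tail(n_theoretical, n_matched, p_random)
    return -10.0 * math.log10(max(tail, 1e-300))  # floor avoids log10(0)


# Illustrative values only: more matched peaks give a higher score.
weak = match_score(20, 5, 0.05)
strong = match_score(20, 10, 0.05)
```

An intensity-aware variant would weight matched peaks by their abundance rather than counting them equally, which is the direction the abstract describes.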
D'Onofrio, David J; Abel, David L; Johnson, Donald E
2012-03-14
The fields of molecular biology and computer science have cooperated over recent years to create a synergy between the cybernetic and biosemiotic relationship found in cellular genomics to that of information and language found in computational systems. Biological information frequently manifests its "meaning" through instruction or actual production of formal bio-function. Such information is called prescriptive information (PI). PI programs organize and execute a prescribed set of choices. Closer examination of this term in cellular systems has led to a dichotomy in its definition suggesting both prescribed data and prescribed algorithms are constituents of PI. This paper looks at this dichotomy as expressed in both the genetic code and in the central dogma of protein synthesis. An example of a genetic algorithm is modeled after the ribosome, and an examination of the protein synthesis process is used to differentiate PI data from PI algorithms.
Yanagisawa, Keisuke; Komine, Shunta; Kubota, Rikuto; Ohue, Masahito; Akiyama, Yutaka
2018-06-01
The need to accelerate large-scale protein-ligand docking in virtual screening against a huge compound database led researchers to propose a strategy that entails memorizing the evaluation result of the partial structure of a compound and reusing it to evaluate other compounds. However, the previous method required frequent disk accesses, resulting in insufficient acceleration. Thus, more efficient memory usage can be expected to lead to further acceleration, and optimal memory usage could be achieved by solving the minimum cost flow problem. In this research, we propose a fast algorithm for the minimum cost flow problem utilizing the characteristics of the graph generated for this problem as constraints. The proposed algorithm, which optimized memory usage, was approximately seven times faster than existing minimum cost flow algorithms. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
Budak, Gungor; Srivastava, Rajneesh; Janga, Sarath Chandra
2017-06-01
RNA-binding proteins (RBPs) control the regulation of gene expression in eukaryotic genomes at post-transcriptional level by binding to their cognate RNAs. Although several variants of CLIP (crosslinking and immunoprecipitation) protocols are currently available to study the global protein-RNA interaction landscape at single-nucleotide resolution in a cell, currently there are very few tools that can facilitate understanding and dissecting the functional associations of RBPs from the resulting binding maps. Here, we present Seten, a web-based and command line tool, which can identify and compare processes, phenotypes, and diseases associated with RBPs from condition-specific CLIP-seq profiles. Seten uses BED files resulting from most peak calling algorithms, which include scores reflecting the extent of binding of an RBP on the target transcript, to provide both traditional functional enrichment as well as gene set enrichment results for a number of gene set collections including BioCarta, KEGG, Reactome, Gene Ontology (GO), Human Phenotype Ontology (HPO), and MalaCards Disease Ontology for several organisms including fruit fly, human, mouse, rat, worm, and yeast. It also provides an option to dynamically compare the associated gene sets across data sets as bubble charts, to facilitate comparative analysis. Benchmarking of Seten using eCLIP data for IGF2BP1, SRSF7, and PTBP1 against their corresponding CRISPR RNA-seq in K562 cells as well as randomized negative controls, demonstrated that its gene set enrichment method outperforms functional enrichment, with scores significantly contributing to the discovery of true annotations. Comparative performance analysis using these CRISPR control data sets revealed significantly higher precision and comparable recall to that observed using ChIP-Enrich. 
Seten's web interface currently provides precomputed results for about 200 CLIP-seq data sets and both command line as well as web interfaces can be used to analyze CLIP-seq data sets. We highlight several examples to show the utility of Seten for rapid profiling of various CLIP-seq data sets. Seten is available on http://www.iupui.edu/∼sysbio/seten/. © 2017 Budak et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Nanou, Evanthia; Sullivan, Jane M; Scheuer, Todd; Catterall, William A
2016-01-26
Short-term synaptic plasticity is induced by calcium (Ca(2+)) accumulating in presynaptic nerve terminals during repetitive action potentials. Regulation of voltage-gated CaV2.1 Ca(2+) channels by Ca(2+) sensor proteins induces facilitation of Ca(2+) currents and synaptic facilitation in cultured neurons expressing exogenous CaV2.1 channels. However, it is unknown whether this mechanism contributes to facilitation in native synapses. We introduced the IM-AA mutation into the IQ-like motif (IM) of the Ca(2+) sensor binding site. This mutation does not alter voltage dependence or kinetics of CaV2.1 currents, or frequency or amplitude of spontaneous miniature excitatory postsynaptic currents (mEPSCs); however, synaptic facilitation is completely blocked in excitatory glutamatergic synapses in hippocampal autaptic cultures. In acutely prepared hippocampal slices, frequency and amplitude of mEPSCs and amplitudes of evoked EPSCs are unaltered. In contrast, short-term synaptic facilitation in response to paired stimuli is reduced by ∼ 50%. In the presence of EGTA-AM to prevent global increases in free Ca(2+), the IM-AA mutation completely blocks short-term synaptic facilitation, indicating that synaptic facilitation by brief, local increases in Ca(2+) is dependent upon regulation of CaV2.1 channels by Ca(2+) sensor proteins. In response to trains of action potentials, synaptic facilitation is reduced in IM-AA synapses in initial stimuli, consistent with results of paired-pulse experiments; however, synaptic depression is also delayed, resulting in sustained increases in amplitudes of later EPSCs during trains of 10 stimuli at 10-20 Hz. Evidently, regulation of CaV2.1 channels by CaS proteins is required for normal short-term plasticity and normal encoding of information in native hippocampal synapses.
Madsen, James A.; Xu, Hua; Robinson, Michelle R.; Horton, Andrew P.; Shaw, Jared B.; Giles, David K.; Kaoud, Tamer S.; Dalby, Kevin N.; Trent, M. Stephen; Brodbelt, Jennifer S.
2013-01-01
The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were developed based on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS1 and MS2 data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halo and HeLa tryptic digests, respectively, corresponding to 655 and 645 peptides that were unique when compared with electron transfer dissociation (ETD), higher energy collision-induced dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, with 49 and 50 unique proteins identified in contrast to the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely open to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm was also assessed for low resolution, low mass accuracy data on a linear ion trap. 
Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of peptide cations, and when these methods were combined into a single search, an increase of up to 13% sequence coverage was observed for the kinases. The ability to sequence peptide anions and cations in alternating scans in the same chromatographic run was also demonstrated. Because ETD has a significant bias toward identifying highly basic peptides, negative UVPD was used to improve the identification of the more acidic peptides in conjunction with positive ETD for the more basic species. In this case, tryptic peptides from the cytosolic section of HeLa cells were analyzed by polarity switching nanoLC-MS/MS utilizing ETD for cation sequencing and UVPD for anion sequencing. Relative to searching using ETD alone, positive/negative polarity switching significantly improved sequence coverages across identified proteins, resulting in a 33% increase in unique peptide identifications and more than twice the number of peptide spectral matches. PMID:23695934
FPGA accelerator for protein secondary structure prediction based on the GOR algorithm
2011-01-01
Background: Proteins are important molecules that perform a wide range of functions in biological systems. Protein folding has recently attracted much more attention, since the function of a protein can generally be derived from its molecular structure. The GOR algorithm is one of the most successful computational methods and has been widely used as an efficient analysis tool to predict secondary structure from protein sequence. However, the execution time is still intolerable given the steep growth of protein databases. Recently, FPGA chips have emerged as a promising application accelerator for bioinformatics algorithms by exploiting fine-grained custom design. Results: In this paper, we propose a complete fine-grained parallel hardware implementation on FPGA to accelerate the GOR-IV package for 2D protein structure prediction. To improve computing efficiency, we partition the parameter table into small segments and access them in parallel. We aggressively exploit data reuse schemes to minimize the need for loading data from external memory. The whole computation structure is carefully pipelined to overlap the sequence loading, computing and back-writing operations as much as possible. We implemented a complete GOR desktop system based on an FPGA chip XC5VLX330. Conclusions: The experimental results show a speedup factor of more than 430x over the original GOR-IV version and a 110x speedup over the optimized version with multi-thread SIMD implementation running on a PC platform with an AMD Phenom 9650 Quad CPU for 2D protein structure prediction. Moreover, the power consumption is only about 30% of that of current general-purpose CPUs. PMID:21342582
Parameter optimization on the convergence surface of path simulations
NASA Astrophysics Data System (ADS)
Chandrasekaran, Srinivas Niranj
Computational treatments of protein conformational changes tend to focus on the trajectories themselves, despite the fact that it is the transition state structures that contain information about the barriers that impose multi-state behavior. PATH is an algorithm that computes a transition pathway between two protein crystal structures, along with the transition state structure, by minimizing the Onsager-Machlup action functional. It is rapid but depends on several unknown input parameters whose range of different values can potentially generate different transition-state structures. Transition-state structures arising from different input parameters cannot be uniquely compared with those generated by other methods. I outline modifications that I have made to the PATH algorithm that estimate these input parameters in a manner that circumvents these difficulties, and describe two complementary tests that validate the transition-state structures found by the PATH algorithm. First, I show that although the PATH algorithm and two other approaches to computing transition pathways produce different low-energy structures connecting the initial and final ground-states with the transition state, all three methods agree closely on the configurations of their transition states. Second, I show that the PATH transition states are close to the saddle points of free-energy surfaces connecting initial and final states generated by replica-exchange Discrete Molecular Dynamics simulations. I show that aromatic side-chain rearrangements create similar potential energy barriers in the transition-state structures identified by PATH for a signaling protein, a contractile protein, and an enzyme. Finally, I observed, but cannot account for, the fact that trajectories obtained for all-atom and Calpha-only simulations identify transition state structures in which the Calpha atoms are in essentially the same positions. 
The consistency between transition-state structures derived by different algorithms for unrelated protein systems argues that although functionally important protein conformational change trajectories are to a degree stochastic, they nonetheless pass through a well-defined transition state whose detailed structural properties can rapidly be identified using PATH. In the end, I outline the strategies that could enhance the efficiency and applicability of PATH.
A surface hopping algorithm for nonadiabatic minimum energy path calculations.
Schapiro, Igor; Roca-Sanjuán, Daniel; Lindh, Roland; Olivucci, Massimo
2015-02-15
The article introduces a robust algorithm for the computation of minimum energy paths that pass through regions of near or exact degeneracy of adiabatic states. The method facilitates studies of excited state reactivity involving weakly avoided crossings and conical intersections. Based on an analysis of the change in the multiconfigurational wave function, the algorithm decides whether the optimization should continue following the same electronic state or switch to a different state. This algorithm helps to overcome convergence difficulties near degeneracies. The implementation in the MOLCAS quantum chemistry package is discussed. To demonstrate the utility of the proposed procedure, four example applications are provided: thymine, asulam, 1,2-dioxetane, and a three-double-bond model of the 11-cis-retinal protonated Schiff base. © 2015 Wiley Periodicals, Inc.
Assessment of the information content of patterns: an algorithm
NASA Astrophysics Data System (ADS)
Daemi, M. Farhang; Beurle, R. L.
1991-12-01
A preliminary investigation confirmed the possibility of assessing the translational and rotational information content of simple artificial images. The calculation is tedious, and for more realistic patterns it is essential to implement the method on a computer. This paper describes an algorithm developed for this purpose which confirms the results of the preliminary investigation. Use of the algorithm facilitates much more comprehensive analysis of the combined effect of continuous rotation and fine translation, and paves the way for analysis of more realistic patterns. Owing to the volume of calculation involved in these algorithms, extensive computing facilities were necessary. The major part of the work was carried out using an ICL 3900 series mainframe computer as well as other powerful workstations such as a RISC architecture MIPS machine.
BiP clustering facilitates protein folding in the endoplasmic reticulum.
Griesemer, Marc; Young, Carissa; Robinson, Anne S; Petzold, Linda
2014-07-01
The chaperone BiP participates in several regulatory processes within the endoplasmic reticulum (ER): translocation, protein folding, and ER-associated degradation. To facilitate protein folding, a cooperative mechanism known as entropic pulling has been proposed to demonstrate the molecular-level understanding of how multiple BiP molecules bind to nascent and unfolded proteins. Recently, experimental evidence revealed the spatial heterogeneity of BiP within the nuclear and peripheral ER of S. cerevisiae (commonly referred to as 'clusters'). Here, we developed a model to evaluate the potential advantages of accounting for multiple BiP molecules binding to peptides, while proposing that BiP's spatial heterogeneity may enhance protein folding and maturation. Scenarios were simulated to gauge the effectiveness of binding multiple chaperone molecules to peptides. Using two metrics, folding efficiency and chaperone cost, we determined that the single binding site model achieves a higher efficiency than models characterized by multiple binding sites, in the absence of cooperativity. Due to entropic pulling, however, multiple chaperones perform in concert to facilitate the resolubilization and ultimate yield of folded proteins. As a result of cooperativity, multiple binding site models used fewer BiP molecules and maintained a higher folding efficiency than the single binding site model. These in silico investigations reveal that clusters of BiP molecules bound to unfolded proteins may enhance folding efficiency through cooperative action via entropic pulling.
On optima: the case of myoglobin-facilitated oxygen diffusion.
Wittenberg, Jonathan B
2007-08-15
The process of myoglobin/leghemoglobin-facilitated oxygen diffusion is adapted to function in different environments in diverse organisms. We enquire how the functional parameters of the process are optimized in particular organisms. The ligand-binding properties of the proteins, myoglobin and plant symbiotic hemoglobins, we discover, suggest that they have been adapted under genetic selection pressure for optimal performance. Since carrier-mediated oxygen transport has probably evolved independently many times, adaptation of diverse proteins for a common functionality exemplifies the process of convergent evolution. The progenitor proteins may be built on the myoglobin scaffold or may be very different.
Platt, R D; Griggs, R A
1993-08-01
In four experiments with 760 subjects, the present study examined Cosmides' Darwinian algorithm theory of reasoning: specifically, its explanation of facilitation on the Wason selection task. The first experiment replicated Cosmides' finding of facilitation for social contract versions of the selection task, using both her multiple-problem format and a single-problem format. Experiment 2 examined performance on Cosmides' three main social contract problems while manipulating the perspective of the subject and the presence and absence of cost-benefit information. The presence of cost-benefit information improved performance in two of the three problems while the perspective manipulation had no effect. In Experiment 3, the cost-benefit effect was replicated; and performance on one of the three problems was enhanced by the presence of explicit negatives on the NOT-P and NOT-Q cards. Experiment 4 examined the role of the deontic term "must" in the facilitation observed for two of the social contract problems. The presence of "must" led to a significant improvement in performance. The results of these experiments are strongly supportive of social contract theory in that cost-benefit information is necessary for substantial facilitation to be observed in Cosmides' problems. These findings also suggest the presence of other cues that can help guide subjects to a deontic social contract interpretation when the social contract nature of the problem is not clear.
PSOVina: The hybrid particle swarm optimization algorithm for protein-ligand docking.
Ng, Marcus C K; Fong, Simon; Siu, Shirley W I
2015-06-01
Protein-ligand docking is an essential step in the modern drug discovery process. The challenge here is to accurately predict and efficiently optimize the position and orientation of ligands in the binding pocket of a target protein. In this paper, we present a new method called PSOVina which combines the particle swarm optimization (PSO) algorithm with the efficient Broyden-Fletcher-Goldfarb-Shanno (BFGS) local search method adopted in AutoDock Vina to tackle the conformational search problem in docking. Using a diverse data set of 201 protein-ligand complexes from the PDBbind database and a full set of ligands and decoys for four representative targets from the directory of useful decoys (DUD) virtual screening data set, we assessed the docking performance of PSOVina in comparison to the original Vina program. Our results showed that PSOVina achieves a remarkable execution time reduction of 51-60% without compromising the prediction accuracies in the docking and virtual screening experiments. This improvement in time efficiency makes PSOVina a better choice of docking tool in large-scale protein-ligand docking applications. Our work lays the foundation for the future development of swarm-based algorithms in molecular docking programs. PSOVina is freely available to non-commercial users at http://cbbio.cis.umac.mo .
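For readers unfamiliar with particle swarm optimization, the global-search half of such a hybrid can be sketched in a few lines. This is a generic textbook PSO on a toy objective, not PSOVina's implementation: the BFGS local refinement is omitted, and the function name, the inertia and acceleration coefficients, and the sphere test function are assumptions chosen for illustration.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box using a basic global-best particle swarm.

    w is the inertia weight; c1 and c2 weight the pull toward each particle's
    own best position and the swarm's best position (assumed typical values).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and keep it inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective standing in for a docking score; minimum 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

In a docking setting the position vector would encode ligand translation, orientation, and torsions, and each candidate would typically be polished by a local optimizer before its score is recorded.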
Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei
2016-01-01
Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851
Fan, Ming; Zheng, Bin; Li, Lihua
2015-10-01
Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although considerable effort has been made, predicting protein structural class solely from protein sequences remains a challenging problem. Feature extraction and classification of proteins are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. For protein feature extraction, we proposed a scheme that calculates the word frequency and word position from sequences of amino acids, reduced amino acids, and secondary structure. For an accurate classification of the structural class of a protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of a Multi-Agent system into the Ada-Boost algorithm. Extensive experiments were conducted to test and compare the proposed method using four low-homology benchmark datasets. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better than those of existing methods. The source code and dataset are available on request.
Layers: A molecular surface peeling algorithm and its applications to analyze protein structures
Karampudi, Naga Bhushana Rao; Bahadur, Ranjit Prasad
2015-01-01
We present an algorithm ‘Layers’ to peel the atoms of proteins as layers. Using Layers we show an efficient way to transform protein structures into a 2D pattern, named residue transition pattern (RTP), which is independent of molecular orientations. RTP explains the folding patterns of proteins, and hence identification of similarity between proteins is simpler and more reliable using RTP than with the standard sequence or structure based methods. Moreover, Layers generates a fine-tunable coarse model for the molecular surface by using non-random sampling. The coarse model can be used for shape comparison, protein recognition and ligand design. Additionally, Layers can be used to develop biased initial configurations of molecules for protein folding simulations. We have developed a random forest classifier to predict the RTP of a given polypeptide sequence. Layers is a standalone application; however, it can be merged with other applications to reduce the computational load when working with large datasets of protein structures. Layers is available freely at http://www.csb.iitkgp.ernet.in/applications/mol_layers/main. PMID:26553411
Playing biology's name game: identifying protein names in scientific text.
Hanisch, Daniel; Fluck, Juliane; Mevissen, Heinz-Theodor; Zimmer, Ralf
2003-01-01
A growing body of work is devoted to the extraction of protein or gene interaction information from the scientific literature. Yet, the basis for most extraction algorithms, i.e. the specific and sensitive recognition of protein and gene names and their numerous synonyms, has not been adequately addressed. Here we describe the construction of a comprehensive general purpose name dictionary and an accompanying automatic curation procedure based on a simple token model of protein names. We designed an efficient search algorithm to analyze all abstracts in MEDLINE in a reasonable amount of time on standard computers. The parameters of our method are optimized using machine learning techniques. Used in conjunction, these ingredients lead to good search performance. A supplementary web page is available at http://cartan.gmd.de/ProMiner/.
2013-01-01
Background: The Saccharomyces cerevisiae 14-spanner Drug:H+ Antiporter family 2 (DHA2) are transporters of the Major Facilitator Superfamily (MFS) involved in multidrug resistance (MDR). Although poorly characterized, DHA2 family members were found to participate in the export of structurally and functionally unrelated compounds or in the uptake of amino acids into the vacuole or the cell. In S. cerevisiae, the four ARN/SIT family members encode siderophore transporters and the two GEX family members encode glutathione extrusion pumps. The evolutionary history of DHA2, ARN and GEX genes, encoding 14-spanner MFS transporters, is reconstructed in this study. Results: The translated ORFs of 31 strains from 25 hemiascomycetous species, including 10 pathogenic Candida species, were compared using a local sequence similarity algorithm. The constraining and traversing of a network representing the pairwise similarity data gathered 355 full size proteins and retrieved ARN and GEX family members together with DHA2 transporters, suggesting the existence of a close phylogenetic relationship among these 14-spanner major facilitators. Gene neighbourhood analysis was combined with tree construction methodologies to reconstruct their evolutionary history, and 7 DHA2 gene lineages, 5 ARN gene lineages, and 1 GEX gene lineage were identified. The S. cerevisiae DHA2 proteins Sge1, Azr1, Vba3 and Vba5 co-clustered in a large phylogenetic branch, the ATR1 and YMR279C genes were proposed to be paralogs formed during the Whole Genome Duplication (WGD), whereas the closely related ORF YOR378W resides in its own lineage. Homologs of S. cerevisiae DHA2 vacuolar proteins Vba1, Vba2 and Vba4 occur widely in the Hemiascomycetes. Arn1/Arn2 homologs were only found in species belonging to the Saccharomyces complex and are more abundant in the pre-WGD species. Arn4 homologs were only found in sub-telomeric regions of species belonging to the Saccharomyces sensu stricto group (SSSG). 
Arn3 type siderophore transporters are abundant in the Hemiascomycetes and form an ancient gene lineage extending to the filamentous fungi. Conclusions: The evolutionary history of DHA2, ARN and GEX genes was reconstructed and a common evolutionary root shared by the encoded proteins is hypothesized. A new protein family, denominated DAG, is proposed to span these three phylogenetic subfamilies of 14-spanner MFS transporters. PMID:24345006
Feng, Yong-E
2016-06-01
The malaria parasite secretes various proteins in infected red blood cells for its growth and survival. Identification of these secretory proteins is therefore important for developing vaccines or drugs against malaria. In this study, a modified method of quadratic discriminant analysis is presented for predicting the secretory proteins. First, the 20 amino acids are divided into five types according to their physical and chemical characteristics. Then, we used the compositions of the five types of amino acids as inputs to the modified quadratic discriminant algorithm. Finally, the best prediction performance is obtained by using the 20 amino acid compositions: a sensitivity of 96%, a specificity of 92%, and a Matthews correlation coefficient of 0.88 in a fivefold cross-validation test. The results are also compared with those of existing prediction methods. The comparison shows that our method performs well in the prediction of secretory proteins.
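The three performance figures quoted above are all functions of a binary confusion matrix. The sketch below shows how they are computed; the function name and the example counts are hypothetical (the abstract does not report the underlying counts), chosen only so that a balanced test set of 100 positives and 100 negatives reproduces percentages of the same magnitude.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, Matthews correlation coefficient).

    tp/fn: positives predicted correctly / missed
    tn/fp: negatives predicted correctly / wrongly flagged as positive
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts, not taken from the paper.
sens, spec, mcc = classification_metrics(tp=96, fn=4, tn=92, fp=8)
```

Unlike sensitivity and specificity, the Matthews coefficient depends on the class balance, so the same two percentages can correspond to different coefficient values on an imbalanced data set.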
Second-order Poisson Nernst-Planck solver for ion channel transport
Zheng, Qiong; Chen, Duan; Wei, Guo-Wei
2010-01-01
The Poisson Nernst-Planck (PNP) theory is a simplified continuum model for a wide variety of chemical, physical and biological applications. Its ability to provide quantitative explanation and increasingly qualitative predictions of experimental measurements has earned it much recognition in the research community. Numerous computational algorithms have been constructed for the solution of the PNP equations. However, in the realistic ion-channel context, no second-order convergent PNP algorithm has ever been reported in the literature, due to many numerical obstacles, including discontinuous coefficients, singular charges, geometric singularities, and nonlinear couplings. The present work introduces a number of numerical algorithms to overcome the abovementioned numerical challenges and constructs the first second-order convergent PNP solver in the ion-channel context. First, a Dirichlet to Neumann mapping (DNM) algorithm is designed to alleviate the charge singularity due to the protein structure. Additionally, the matched interface and boundary (MIB) method is reformulated for solving the PNP equations. The MIB method systematically enforces the interface jump conditions and achieves second-order accuracy in the presence of complex geometry and geometric singularities of molecular surfaces. Moreover, two iterative schemes are utilized to deal with the coupled nonlinear equations. Furthermore, extensive and rigorous numerical validations are carried out over a number of geometries, including a sphere, two proteins and an ion channel, to examine the numerical accuracy and convergence order of the present numerical algorithms. Finally, application is considered to a real transmembrane protein, the Gramicidin A channel protein. The performance of the proposed numerical techniques is tested against a number of factors, including mesh sizes, diffusion coefficient profiles, iterative schemes, ion concentrations, and applied voltages. 
Numerical predictions are compared with experimental measurements. PMID:21552336
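For orientation, the coupled system referred to as the PNP equations is commonly written in the form below. This is the standard textbook form, not a transcription from the paper, and the symbols are assumed notation: \phi is the electrostatic potential, \epsilon the position-dependent permittivity, \rho_f the fixed charge density, and c_i, q_i and D_i the concentration, charge and diffusion coefficient of ion species i. Unit conventions and prefactors may differ from those used by the authors.

```latex
% Poisson equation for the electrostatic potential \phi
-\nabla \cdot \bigl( \epsilon(\mathbf{r}) \nabla \phi \bigr) = \rho_f(\mathbf{r}) + \sum_{i} q_i c_i

% Nernst-Planck equation for the concentration c_i of each ion species i
\frac{\partial c_i}{\partial t} = \nabla \cdot \Bigl[ D_i \Bigl( \nabla c_i + \frac{q_i}{k_B T}\, c_i \nabla \phi \Bigr) \Bigr]
```

In this form the obstacles listed in the abstract are easy to locate: \epsilon jumps across the molecular surface (discontinuous coefficients), \rho_f is typically a sum of point charges at atomic positions (singular charges), and the term c_i \nabla \phi couples the two equations nonlinearly.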
Validating clustering of molecular dynamics simulations using polymer models.
Phillips, Joshua L; Colvin, Michael E; Newsam, Shawn
2011-11-14
Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. 
To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers.
Validating clustering of molecular dynamics simulations using polymer models
2011-01-01
Background Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. Results We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. Conclusions We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. 
To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers. PMID:22082218
Signal Partitioning Algorithm for Highly Efficient Gaussian Mixture Modeling in Mass Spectrometry
Polanski, Andrzej; Marczyk, Michal; Pietrowska, Monika; Widlak, Piotr; Polanska, Joanna
2015-01-01
Mixture modeling of mass spectra is an approach with many potential applications including peak detection and quantification, smoothing, de-noising, feature extraction and spectral signal compression. However, existing algorithms do not allow for automated analyses of whole spectra. Therefore, despite the potential advantages of mixture modeling of mass spectra of peptide/protein mixtures and some preliminary results presented in several papers, the mixture modeling approach has so far not been developed to a stage enabling systematic comparisons with existing software packages for proteomic mass spectra analyses. In this paper we present an efficient algorithm for Gaussian mixture modeling of proteomic mass spectra of different types (e.g., MALDI-ToF profiling, MALDI-IMS). The main idea is automated partitioning of the protein mass spectral signal into fragments. The obtained fragments are separately decomposed into Gaussian mixture models. The parameters of the mixture models of fragments are then aggregated to form the mixture model of the whole spectrum. We compare the elaborated algorithm to existing algorithms for peak detection and we demonstrate improvements of peak detection efficiency obtained by using Gaussian mixture modeling. We also show applications of the elaborated algorithm to real proteomic datasets of low and high resolution. PMID:26230717
Identification of structural domains in proteins by a graph heuristic.
Wernisch, L; Hunting, M; Wodak, S J
1999-05-15
A novel automatic procedure for identifying domains from protein atomic coordinates is presented. The procedure, termed STRUDL (STRUctural Domain Limits), does not take into account information on secondary structures and handles any number of domains made up of contiguous or non-contiguous chain segments. The core algorithm uses the Kernighan-Lin graph heuristic to partition the protein into residue sets which display minimum interactions between them. These interactions are deduced from the weighted Voronoi diagram. The generated partitions are accepted or rejected on the basis of optimized criteria, representing basic expected physical properties of structural domains. The graph heuristic approach is shown to be very effective; it closely approximates the exact solution provided by a branch and bound algorithm for a number of test proteins. In addition, the overall performance of STRUDL is assessed on a set of 787 representative proteins from the Protein Data Bank by comparison to domain definitions in the CATH protein classification. The domains assigned by STRUDL agree with the CATH assignments in at least 81% of the tested proteins. This result is comparable to that obtained previously using PUU (Holm and Sander, Proteins 1994;9:256-268), the only other available algorithm designed to identify domains with any number of non-contiguous chain segments. A detailed discussion of the structures for which our assignments differ from those in CATH brings to light some clear inconsistencies between the concept of structural domains based on minimizing inter-domain interactions and that of delimiting structural motifs that represent acceptable folding topologies or architectures. Considering both concepts as complementary and combining them in a layered approach might be the way forward.
Inductive reasoning and implicit memory: evidence from intact and impaired memory systems.
Girelli, Luisa; Semenza, Carlo; Delazer, Margarete
2004-01-01
In this study, we modified a classic problem solving task, number series completion, in order to explore the contribution of implicit memory to inductive reasoning. Participants were required to complete number series sharing the same underlying algorithm (e.g., +2), differing in both constituent elements (e.g., 2468 versus 57911) and correct answers (e.g., 10 versus 13). In Experiment 1, reliable priming effects emerged, whether primes and targets were separated by four or ten fillers. Experiment 2 provided direct evidence that the observed facilitation arises at central stages of problem solving, namely the identification of the algorithm and its subsequent extrapolation. The observation of analogous priming effects in a severely amnesic patient strongly supports the hypothesis that the facilitation in number series completion was largely determined by implicit memory processes. These findings demonstrate that the influence of implicit processes extends to higher-level cognitive domains such as inductive reasoning.
Direct observation of TALE protein dynamics reveals a two-state search mechanism
Cuculis, Luke; Abil, Zhanar; Zhao, Huimin; Schroeder, Charles M.
2015-01-01
Transcription activator-like effector (TALE) proteins are a class of programmable DNA-binding proteins for which the fundamental mechanisms governing the search process are not fully understood. Here we use single-molecule techniques to directly observe TALE search dynamics along DNA templates. We find that TALE proteins are capable of rapid diffusion along DNA using a combination of sliding and hopping behaviour, which suggests that the TALE search process is governed in part by facilitated diffusion. We also observe that TALE proteins exhibit two distinct modes of action during the search process—a search state and a recognition state—facilitated by different subdomains in monomeric TALE proteins. Using TALE truncation mutants, we further demonstrate that the N-terminal region of TALEs is required for the initial non-specific binding and subsequent rapid search along DNA, whereas the central repeat domain is required for transitioning into the site-specific recognition state. PMID:26027871
Direct observation of TALE protein dynamics reveals a two-state search mechanism.
Cuculis, Luke; Abil, Zhanar; Zhao, Huimin; Schroeder, Charles M
2015-06-01
Transcription activator-like effector (TALE) proteins are a class of programmable DNA-binding proteins for which the fundamental mechanisms governing the search process are not fully understood. Here we use single-molecule techniques to directly observe TALE search dynamics along DNA templates. We find that TALE proteins are capable of rapid diffusion along DNA using a combination of sliding and hopping behaviour, which suggests that the TALE search process is governed in part by facilitated diffusion. We also observe that TALE proteins exhibit two distinct modes of action during the search process-a search state and a recognition state-facilitated by different subdomains in monomeric TALE proteins. Using TALE truncation mutants, we further demonstrate that the N-terminal region of TALEs is required for the initial non-specific binding and subsequent rapid search along DNA, whereas the central repeat domain is required for transitioning into the site-specific recognition state.
NASA Technical Reports Server (NTRS)
Ramachandran, Ganesh K.; Akopian, David; Heckler, Gregory W.; Winternitz, Luke B.
2011-01-01
Location technologies have many applications in wireless communications, military and space missions, etc. The US Global Positioning System (GPS) and other existing and emerging Global Navigation Satellite Systems (GNSS) are expected to provide accurate location information to enable such applications. While GNSS systems perform very well in strong signal conditions, their operation in many urban, indoor, and space applications is not robust or even impossible due to weak signals and strong distortions. The search for less costly, faster and more sensitive receivers is still in progress. As the research community addresses more and more complicated phenomena, there is a demand for flexible multimode reference receivers, associated SDKs, and development platforms which may accelerate and facilitate the research. One such concept is the software GPS/GNSS receiver (GPS SDR), which permits easier access to algorithmic libraries and the possibility to integrate more advanced algorithms without hardware and essential software updates. The GNU-SDR and GPS-SDR open source receiver platforms are popular examples. This paper evaluates the performance of recently proposed block-correlator techniques for acquisition and tracking of GPS signals using the open source GPS-SDR platform.
Sridharan, Gautham Vivek; D'Alessandro, Matthew; Bale, Shyam Sundhar; Bhagat, Vicky; Gagnon, Hugo; Asara, John M; Uygun, Korkut; Yarmush, Martin L; Saeidi, Nima
2017-09-01
Morbidly obese patients often elect for Roux-en-Y gastric bypass (RYGB), a form of bariatric surgery that triggers a remarkable 30% reduction in excess body weight and reversal of insulin resistance for those who are type II diabetic. A more complete understanding of the underlying molecular mechanisms that drive the complex metabolic reprogramming post-RYGB could lead to innovative non-invasive therapeutics that mimic the beneficial effects of the surgery, namely weight loss, achievement of glycemic control, or reversal of non-alcoholic steatohepatitis (NASH). To facilitate these discoveries, we hereby demonstrate the first multi-omic interrogation of a rodent RYGB model to reveal tissue-specific pathway modules implicated in the control of body weight regulation and energy homeostasis. In this study, we focus on and evaluate liver metabolism three months following RYGB in rats using both SWATH proteomics, a burgeoning label free approach using high resolution mass spectrometry to quantify protein levels in biological samples, as well as MRM metabolomics. The SWATH analysis enabled the quantification of 1378 proteins in liver tissue extracts, of which we report the significant down-regulation of Thrsp and Acot13 in RYGB as putative targets of lipid metabolism for weight loss. Furthermore, we develop a computational graph-based metabolic network module detection algorithm for the discovery of non-canonical pathways, or sub-networks, enriched with significantly elevated or depleted metabolites and proteins in RYGB-treated rat livers. The analysis revealed a network connection between the depleted protein Baat and the depleted metabolite taurine, corroborating the clinical observation that taurine-conjugated bile acid levels are perturbed post-RYGB.
Sparbier, Katrin; Asperger, Arndt; Resemann, Anja; Kessler, Irina; Koch, Sonja; Wenzel, Thomas; Stein, Günter; Vorwerg, Lars; Suckau, Detlev; Kostrzewa, Markus
2007-01-01
Comprehensive proteomic analyses require efficient and selective pre-fractionation to facilitate analysis of post-translationally modified peptides and proteins, and automated analysis workflows enabling the detection, identification, and structural characterization of the corresponding peptide modifications. Human serum contains a high number of glycoproteins, spanning several orders of magnitude in concentration. Consequently, isolation and subsequent identification of low-abundant glycoproteins from serum is a challenging task. Selective capturing of glycopeptides and -proteins was attained by means of magnetic particles specifically functionalized with lectins or boronic acids that bind to various structural motifs. Human serum was incubated with differentially functionalized magnetic micro-particles (lectins or boronic acids), and isolated proteins were digested with trypsin. Subsequently, the resulting complex mixture of peptides and glycopeptides was subjected to LC-MALDI analysis and database searching. In parallel, a second magnetic bead capturing was performed on the peptide level to separate and analyze by LC-MALDI intact glycopeptides, both peptide sequence and glycan structure. Detection of glycopeptides was achieved by means of a software algorithm that allows extraction and characterization of potential glycopeptide candidates from large LC-MALDI-MS/MS data sets, based on N-glycopeptide-specific fragmentation patterns and characteristic fragment mass peaks, respectively. By means of fast and simple glycospecific capturing applied in conjunction with extensive LC-MALDI-MS/MS analysis and novel data analysis tools, a high number of low-abundant proteins were identified, comprising known or predicted glycosylation sites. According to the specific binding preferences of the different types of beads, complementary results were obtained from the experiments using either magnetic ConA-, LCA-, WGA-, and boronic acid beads, respectively. PMID:17916798
Amino acid signature enables proteins to recognize modified tRNA.
Spears, Jessica L; Xiao, Xingqing; Hall, Carol K; Agris, Paul F
2014-02-25
Human tRNA(Lys3)UUU is the primer for HIV replication. The HIV-1 nucleocapsid protein, NCp7, facilitates htRNA(Lys3)UUU recruitment from the host cell by binding to and remodeling the tRNA structure. Human tRNA(Lys3)UUU is post-transcriptionally modified, but until recently, the importance of those modifications in tRNA recognition by NCp7 was unknown. Modifications such as the 5-methoxycarbonylmethyl-2-thiouridine at anticodon wobble position-34 and 2-methylthio-N(6)-threonylcarbamoyladenosine, adjacent to the anticodon at position-37, are important to the recognition of htRNA(Lys3)UUU by NCp7. Several short peptides selected from phage display libraries were found to also preferentially recognize these modifications. Evolutionary algorithms (Monte Carlo and self-consistent mean field) and assisted model building with energy refinement were used to optimize the peptide sequence in silico, while fluorescence assays were developed and conducted to verify the in silico results and elucidate a 15-amino acid signature sequence (R-W-Q/N-H-X2-F-Pho-X-G/A-W-R-X2-G, where X can be most amino acids, and Pho is hydrophobic) that recognized the tRNA's fully modified anticodon stem and loop domain, hASL(Lys3)UUU. Peptides of this sequence specifically recognized and bound modified htRNA(Lys3)UUU with an affinity 10-fold higher than that of the starting sequence. Thus, this approach provides an effective means of predicting sequences of RNA binding peptides that have better binding properties. Such peptides can be used in cell and molecular biology as well as biochemistry to explore RNA binding proteins and to inhibit those protein functions.
Discovering semantic features in the literature: a foundation for building functional associations
Chagoyen, Monica; Carmona-Saez, Pedro; Shatkay, Hagit; Carazo, Jose M; Pascual-Montano, Alberto
2006-01-01
Background Experimental techniques such as DNA microarray, serial analysis of gene expression (SAGE) and mass spectrometry proteomics, among others, are generating large amounts of data related to genes and proteins at different levels. As in any other experimental approach, it is necessary to analyze these data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research. Results We present a method to create literature profiles for large sets of genes or proteins based on common semantic features extracted from a corpus of relevant documents. These profiles can be used to establish pair-wise similarities among genes, utilized in gene/protein classification or can be even combined with experimental measurements. Semantic features can be used by researchers to facilitate the understanding of the commonalities indicated by experimental results. Our approach is based on non-negative matrix factorization (NMF), a machine-learning algorithm for data analysis, capable of identifying local patterns that characterize a subset of the data. The literature is thus used to establish putative relationships among subsets of genes or proteins and to provide coherent justification for this clustering into subsets. We demonstrate the utility of the method by applying it to two independent and vastly different sets of genes. Conclusion The presented method can create literature profiles from documents relevant to sets of genes. 
The representation of genes as additive linear combinations of semantic features allows for the exploration of functional associations as well as for clustering, suggesting a valuable methodology for the validation and interpretation of high-throughput experimental data. PMID:16438716
Evaluation of Laser Based Alignment Algorithms Under Additive Random and Diffraction Noise
DOE Office of Scientific and Technical Information (OSTI.GOV)
McClay, W A; Awwal, A; Wilhelmsen, K
2004-09-30
The purpose of the automatic alignment algorithm at the National Ignition Facility (NIF) is to determine the position of a laser beam based on the position of beam features from video images. The position information obtained is used to command motors and attenuators to adjust the beam lines to the desired position, which facilitates the alignment of all 192 beams. One of the goals of the algorithm development effort is to ascertain the performance, reliability, and uncertainty of the position measurement. This paper describes a method of evaluating the performance of algorithms using Monte Carlo simulation. In particular we show the application of this technique to the LM1_LM3 algorithm, which determines the position of a series of two beam light sources. The performance of the algorithm was evaluated for an ensemble of over 900 simulated images with varying image intensities and noise counts, as well as varying diffraction noise amplitude and frequency. The performance of the algorithm on the image data set had a tolerance well beneath the 0.5-pixel system requirement.
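The Monte Carlo evaluation described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the estimator is a plain intensity-weighted centroid on a one-dimensional Gaussian spot (a stand-in, not the LM1_LM3 algorithm), the noise is additive Gaussian only (no diffraction term), and the names `make_profile`, `centroid` and `monte_carlo_rms` are hypothetical.

```python
import math
import random


def make_profile(center, n=41, amplitude=100.0, sigma=3.0):
    """Noise-free 1-D Gaussian spot sampled on n pixels (illustrative model)."""
    return [amplitude * math.exp(-0.5 * ((i - center) / sigma) ** 2)
            for i in range(n)]


def centroid(profile):
    """Intensity-weighted centroid, used here as a stand-in position estimator."""
    total = sum(profile)
    return sum(i * v for i, v in enumerate(profile)) / total


def monte_carlo_rms(trials=200, noise_sd=1.0, seed=0):
    """RMS position error of the estimator over an ensemble of noisy profiles."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    squared_error = 0.0
    for _ in range(trials):
        true_center = rng.uniform(15.0, 25.0)
        noisy = [v + rng.gauss(0.0, noise_sd) for v in make_profile(true_center)]
        squared_error += (centroid(noisy) - true_center) ** 2
    return math.sqrt(squared_error / trials)


rms_error = monte_carlo_rms()
print(f"RMS position error: {rms_error:.3f} pixels")
```

Comparing `rms_error` with a tolerance such as the 0.5-pixel requirement quoted above is the kind of check such an ensemble supports; varying `noise_sd` or the spot amplitude shows how the uncertainty scales with image quality.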
Gao, Yingbin; Kong, Xiangyu; Zhang, Huihui; Hou, Li'an
2017-05-01
Minor components (MCs) play an important role in signal processing and data analysis, so developing MC extraction algorithms is valuable. Based on the concepts of weighted subspace and optimum theory, a weighted information criterion is proposed for searching the optimum solution of a linear neural network. This information criterion exhibits a unique global minimum attained if and only if the state matrix is composed of the desired MCs of an autocorrelation matrix of an input signal. Using the gradient ascent method and the recursive least squares (RLS) method, two algorithms are developed for multiple MC extraction. The global convergence of the proposed algorithms is also analyzed by the Lyapunov method. The proposed algorithms can extract multiple MCs in parallel and have an advantage in dealing with high-dimensional matrices. Since the weighted matrix does not require an accurate value, it facilitates the system design of the proposed algorithms for practical applications. The speed and computation advantages of the proposed algorithms are verified through simulations. Copyright © 2017 Elsevier Ltd. All rights reserved.
Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis.
Wei, Qinglai; Lewis, Frank L; Sun, Qiuye; Yan, Pengfei; Song, Ruizhuo
2017-05-01
In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed Q-learning algorithm, the iterative Q function is updated for all the state and control spaces, instead of updating for a single state and a single control as in the traditional Q-learning algorithm. A new convergence criterion is established to guarantee that the iterative Q function converges to the optimum, where the convergence criterion of the learning rates for traditional Q-learning algorithms is simplified. During the convergence analysis, the upper and lower bounds of the iterative Q function are analyzed to obtain the convergence criterion, instead of analyzing the iterative Q function itself. For convenience of analysis, the convergence properties for the undiscounted case of the deterministic Q-learning algorithm are first developed. Then, considering the discount factor, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative Q function and compute the iterative control law, respectively, for facilitating the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
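The idea of updating the Q function over the whole state and control space in each iteration can be sketched on a toy problem. This is a tabular illustration under stated assumptions, not the paper's method: the learning rate is fixed at 1, no neural network approximation is used, and the five-state chain, the functions `step` and `cost`, and the helper names `q_sweep` and `solve` are all hypothetical.

```python
def q_sweep(q, states, controls, step, cost, gamma=1.0):
    """One synchronous sweep: update Q for every (state, control) pair."""
    return {
        (x, u): cost(x, u) + gamma * min(q[(step(x, u), v)] for v in controls)
        for x in states
        for u in controls
    }


def solve(states, controls, step, cost, gamma=1.0, tol=1e-9, max_iter=1000):
    """Repeat full sweeps until the Q table stops changing."""
    q = {(x, u): 0.0 for x in states for u in controls}
    for _ in range(max_iter):
        new_q = q_sweep(q, states, controls, step, cost, gamma)
        if max(abs(new_q[k] - q[k]) for k in q) < tol:
            return new_q
        q = new_q
    return q


STATES = list(range(5))   # toy chain: state 0 is an absorbing goal
CONTROLS = (-1, 1)        # move left or right


def step(x, u):
    """Deterministic dynamics, clipped to the chain; the goal is absorbing."""
    return 0 if x == 0 else min(max(x + u, 0), 4)


def cost(x, u):
    """Unit cost per move, zero once the goal has been reached."""
    return 0.0 if x == 0 else 1.0


q = solve(STATES, CONTROLS, step, cost)
policy = {x: min(CONTROLS, key=lambda u: q[(x, u)]) for x in STATES}
print(policy)
```

In this undiscounted example the converged table gives the number of moves to the goal, so `q[(3, -1)]` equals 3. A traditional Q-learning loop would instead update one `(x, u)` entry per observed transition.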
Proton-coupled sugar transport in the prototypical major facilitator superfamily protein XylE
Wisedchaisri, Goragot; Park, Min-Sun; Iadanza, Matthew G.; Zheng, Hongjin; Gonen, Tamir
2014-01-01
The major facilitator superfamily (MFS) is the largest collection of structurally related membrane proteins that transport a wide array of substrates. The proton-coupled sugar transporter XylE is the first member of the MFS that has been structurally characterized in multiple transporting conformations, including both the outward and inward-facing states. Here we report the crystal structure of XylE in a new inward-facing open conformation, allowing us to visualize the rocker-switch movement of the N-domain against the C-domain during the transport cycle. Using molecular dynamics simulation, and functional transport assays, we describe the movement of XylE that facilitates sugar translocation across a lipid membrane and identify the likely candidate proton-coupling residues as the conserved Asp27 and Arg133. This study addresses the structural basis for proton-coupled substrate transport and release mechanism for the sugar porter family of proteins. PMID:25088546
BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.
Kaur, Harpreet; Raghava, G P S
2002-03-01
beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structures in proteins and comprise, on average, 25% of the residues. In the past, numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows prediction of different types of beta-TURNS, e.g., type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/
Classification of ligand molecules in PDB with fast heuristic graph match algorithm COMPLIG.
Saito, Mihoko; Takemura, Naomi; Shirai, Tsuyoshi
2012-12-14
A fast heuristic graph-matching algorithm, COMPLIG, was devised to classify the small-molecule ligands in the Protein Data Bank (PDB), which are currently not properly classified on structure basis. By concurrently classifying proteins and ligands, we determined the most appropriate parameter for categorizing ligands to be more than 60% identity of atoms and bonds between molecules, and we classified 11,585 types of ligands into 1946 clusters. Although the large clusters were composed of nucleotides or amino acids, a significant presence of drug compounds was also observed. Application of the system to classify the natural ligand status of human proteins in the current database suggested that, at most, 37% of the experimental structures of human proteins were in complex with natural ligands. However, protein homology- and/or ligand similarity-based modeling was implied to provide models of natural interactions for an additional 28% of the total, which might be used to increase the knowledge of intrinsic protein-metabolite interactions. Copyright © 2012 Elsevier Ltd. All rights reserved.
Sim, Jaehyun; Sim, Jun; Park, Eunsung; Lee, Julian
2015-06-01
Many proteins undergo large-scale motions where relatively rigid domains move against each other. The identification of rigid domains, as well as the hinge residues important for their relative movements, is important for various applications including flexible docking simulations. In this work, we develop a method for protein rigid domain identification based on an exhaustive enumeration of maximal rigid domains, the rigid domains not fully contained within other domains. The computation is performed by mapping the problem to that of finding maximal cliques in a graph. A minimal set of rigid domains are then selected, which cover most of the protein with minimal overlap. In contrast to the results of existing methods that partition a protein into non-overlapping domains using approximate algorithms, the rigid domains obtained from exact enumeration naturally contain overlapping regions, which correspond to the hinges of the inter-domain bending motion. The performance of the algorithm is demonstrated on several proteins. © 2015 Wiley Periodicals, Inc.
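The mapping to maximal-clique enumeration mentioned above can be sketched with the basic Bron-Kerbosch recursion. This is an illustration under assumptions, not the authors' implementation: the abstract does not say which enumeration algorithm was used or how graph edges are defined. Here nodes stand for residues, an edge is assumed to mean that two residues keep their mutual distance between conformations, and the five-residue graph is hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion.

    adj maps each node to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical rigidity graph: residues 0-2 move together, as do residues 2-4.
adjacency = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3, 4},
    3: {2, 4},
    4: {2, 3},
}

domains = maximal_cliques(adjacency)
hinge = frozenset.intersection(*domains)
print(sorted(sorted(d) for d in domains), sorted(hinge))
```

The two cliques overlap in residue 2, which is how the overlapping regions described in the abstract would show up as hinge candidates in this toy graph.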
Reinforce: An Ensemble Approach for Inferring PPI Network from AP-MS Data.
Tian, Bo; Duan, Qiong; Zhao, Can; Teng, Ben; He, Zengyou
2017-05-17
Affinity Purification-Mass Spectrometry (AP-MS) is one of the most important technologies for constructing protein-protein interaction (PPI) networks. In this paper, we propose an ensemble method, Reinforce, for inferring PPI networks from AP-MS data sets. Reinforce is based on rank aggregation and false discovery rate control. Under the null hypothesis that the interaction scores from different scoring methods are randomly generated, Reinforce follows three steps to integrate multiple ranking results from different algorithms or different data sets. The experimental results show that Reinforce can obtain more stable and accurate inference results than existing algorithms. The source code of Reinforce and the data sets used in the experiments are available at: https://sourceforge.net/projects/reinforce/.
A Prize-Collecting Steiner Tree Approach for Transduction Network Inference
NASA Astrophysics Data System (ADS)
Bailly-Bechet, Marc; Braunstein, Alfredo; Zecchina, Riccardo
Inside the cell, information from the environment is mainly propagated via signaling pathways which form a transduction network. Here we propose a new algorithm to infer transduction networks from heterogeneous data, using both the protein interaction network and expression datasets. We formulate the inference problem as an optimization task, and develop a message-passing, probabilistic and distributed formalism to solve it. We apply our algorithm to the pheromone response in the baker’s yeast S. cerevisiae. We are able to find the backbone of the known structure of the MAPK cascade of pheromone response, validating our algorithm. More importantly, we make biological predictions about some proteins whose role could be at the interface between pheromone response and other cellular functions.
Fiji: an open-source platform for biological-image analysis.
Schindelin, Johannes; Arganda-Carreras, Ignacio; Frise, Erwin; Kaynig, Verena; Longair, Mark; Pietzsch, Tobias; Preibisch, Stephan; Rueden, Curtis; Saalfeld, Stephan; Schmid, Benjamin; Tinevez, Jean-Yves; White, Daniel James; Hartenstein, Volker; Eliceiri, Kevin; Tomancak, Pavel; Cardona, Albert
2012-06-28
Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis. Fiji uses modern software engineering practices to combine powerful software libraries with a broad range of scripting languages to enable rapid prototyping of image-processing algorithms. Fiji facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system. We propose Fiji as a platform for productive collaboration between computer science and biology research communities.
Safe Maneuvering Envelope Estimation Based on a Physical Approach
NASA Technical Reports Server (NTRS)
Lombaerts, Thomas J. J.; Schuet, Stefan R.; Wheeler, Kevin R.; Acosta, Diana; Kaneshige, John T.
2013-01-01
This paper discusses a computationally efficient algorithm for estimating the safe maneuvering envelope of damaged aircraft. The algorithm performs a robust reachability analysis through an optimal control formulation while making use of time scale separation and taking into account uncertainties in the aerodynamic derivatives. This approach differs from others since it is physically inspired. This more transparent approach allows interpreting data in each step, and it is assumed that these physical models based upon flight dynamics theory will therefore facilitate certification for future real life applications.
Mathematical filtering minimizes metallic halation of titanium implants in MicroCT images.
Ha, Jee; Osher, Stanley J; Nishimura, Ichiro
2013-01-01
Microcomputed tomography (MicroCT) images containing a titanium implant suffer from x-ray scattering artifacts, and the implant surface is critically affected by metallic halation. To reduce the metallic halation artifact, a nonlinear Total Variation denoising algorithm such as the Split Bregman algorithm was applied to the digital data set of MicroCT images. This study demonstrated that the use of a mathematical filter could successfully reduce metallic halation, facilitating the osseointegration evaluation at the bone implant interface in the reconstructed images.
Simulation of a navigator algorithm for a low-cost GPS receiver
NASA Technical Reports Server (NTRS)
Hodge, W. F.
1980-01-01
The analytical structure of an existing navigator algorithm for a low cost global positioning system receiver is described in detail to facilitate its implementation on in-house digital computers and real-time simulators. The material presented includes a simulation of GPS pseudorange measurements, based on a two-body representation of the NAVSTAR spacecraft orbits, and a four component model of the receiver bias errors. A simpler test for loss of pseudorange measurements due to spacecraft shielding is also noted.
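As a small illustration of what a simulated pseudorange measurement involves, the sketch below adds the range equivalent of a receiver clock bias to the geometric satellite-receiver distance. The report's four-component bias model and two-body orbit propagation are not described in the abstract, so a single clock-bias term and fixed positions are assumed here; the function name and the numbers are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def pseudorange(sat_pos, rcv_pos, clock_bias_s):
    """Geometric range plus the range-equivalent of the receiver clock bias (metres)."""
    return math.dist(sat_pos, rcv_pos) + C * clock_bias_s

# Hypothetical geometry: receiver on the x-axis, satellite 20,200 km directly above it.
receiver = (6_371_000.0, 0.0, 0.0)
satellite = (26_571_000.0, 0.0, 0.0)
rho = pseudorange(satellite, receiver, clock_bias_s=1e-6)  # 1 microsecond clock bias
```

A one-microsecond clock error already shifts the measurement by roughly 300 m, which is why the receiver bias has to be modeled alongside the geometry.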
Symmetry dependence of holograms for optical trapping
NASA Astrophysics Data System (ADS)
Curtis, Jennifer E.; Schmitz, Christian H. J.; Spatz, Joachim P.
2005-08-01
No iterative algorithm is necessary to calculate holograms for most holographic optical trapping patterns. Instead, holograms may be produced by a simple extension of the prisms-and-lenses method. This formulaic approach yields the same diffraction efficiency as iterative algorithms for any asymmetric or symmetric but nonperiodic pattern of points while requiring less calculation time. A slight spatial disordering of periodic patterns significantly reduces intensity variations between the different traps without extra calculation costs. Eliminating laborious hologram calculations should greatly facilitate interactive holographic trapping.
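The abstract gives no formula, so the following sketch uses the commonly cited superposition form of the prisms-and-lenses approach as an assumption: each trap contributes a linear (prism) phase for its lateral position and a quadratic (lens) phase for its axial position, and the hologram phase is the argument of the summed fields. The function name, the sign conventions, and the unit choices are hypothetical.

```python
import cmath
import math

def hologram_phase(x, y, traps, wavelength, focal_length):
    """Phase at hologram pixel (x, y): argument of the summed prism-plus-lens fields."""
    k = 2 * math.pi / (wavelength * focal_length)
    total = 0j
    for tx, ty, tz in traps:
        prism = k * (x * tx + y * ty)  # linear phase: lateral trap displacement
        lens = math.pi * tz * (x * x + y * y) / (wavelength * focal_length ** 2)  # axial shift
        total += cmath.exp(1j * (prism + lens))
    return cmath.phase(total) % (2 * math.pi)

# Arbitrary units: one trap displaced along x, sampled at a single pixel.
phi = hologram_phase(0.25, 0.0, [(1.0, 0.0, 0.0)], wavelength=1.0, focal_length=1.0)
```

Because every pixel is evaluated independently in closed form, no iteration is required, which is consistent with the shorter calculation time claimed above.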
Intelligent Visual Input: A Graphical Method for Rapid Entry of Patient-Specific Data
Bergeron, Bryan P.; Greenes, Robert A.
1987-01-01
Intelligent Visual Input (IVI) provides a rapid, graphical method of data entry for both expert system interaction and medical record keeping purposes. Key components of IVI include: a high-resolution graphic display; an interface supportive of rapid selection, i.e., one utilizing a mouse or light pen; algorithm simplification modules; and intelligent graphic algorithm expansion modules. A prototype IVI system, designed to facilitate entry of physical exam findings, is used to illustrate the potential advantages of this approach.
NASA Technical Reports Server (NTRS)
Kitzis, J. L.; Kitzis, S. N.
1979-01-01
An evaluation of the versions of the SEASAT-A SMMR antenna pattern correction (APC) algorithm is presented. Two efforts are focused upon in the APC evaluation: the intercomparison of the interim, box, cross, and nominal APC modes; and the development of software to facilitate the creation of matched spacecraft and surface truth data sets which are located together in time and space. The problems discovered in earlier versions of the APC, now corrected, are discussed.
A computer program for the localization of small areas in roentgenological images
NASA Technical Reports Server (NTRS)
Keller, R. A.; Baily, N. A.
1976-01-01
A method and associated algorithm are presented which allow a simple and accurate determination to be made of the location of small symmetric areas presented in roentgenological images. The method utilizes an operator to visually spot object positions but eliminates the need for critical positioning accuracy on the operator's part. The rapidity of measurement allows results to be evaluated on-line. Parameters associated with the algorithm have been analyzed, and methods to facilitate an optimum choice for any particular experimental setup are presented.
Poulin, Patrick; Burczynski, Frank J; Haddad, Sami
2016-02-01
A critical component in the development of physiologically based pharmacokinetic-pharmacodynamic (PBPK/PD) models for estimating target organ dosimetry in pharmacology and toxicology studies is the understanding of the uptake kinetics and accumulation of drugs and chemicals at the cellular level. Therefore, predicting free drug concentrations in intracellular fluid will contribute to our understanding of concentrations at the site of action in cells in PBPK/PD research. Some investigators believe that uptake of drugs in cells is solely driven by the unbound fraction; conversely, others argue that the protein-bound fraction contributes a significant portion of the total amount delivered to cells. Accordingly, the current literature suggests the existence of a so-called albumin-mediated uptake mechanism(s) for the protein-bound fraction (i.e., extracellular protein-facilitated uptake mechanisms) at least in hepatocytes and cardiac myocytes; however, such mechanism(s) and cells from other organs deserve further exploration. Therefore, the main objective of this present study was to discuss further the implication of potential protein-facilitated uptake mechanism(s) on drug distribution in cells under in vivo conditions. The interplay between the protein-facilitated uptake mechanism(s) and the effects of a pH gradient, metabolism, transport, and permeation limitation potentially occurring in cells was also discussed, as this should violate the basic assumption of similar free drug concentration in cells and plasma. This was done because the published equations used to calculate drug concentrations in cells in a PBPK/PD model did not consider potential protein-facilitated uptake mechanism(s).
Consequently, we corrected some published equations for calculating the free drug concentrations in cells compared with plasma in PBPK/PD modeling studies, and we proposed a refined strategy for potentially performing more accurate quantitative in vitro-to-in vivo extrapolations (IVIVEs) of toxicity (efficacy) at the cellular level from data generated in cell assays. Overall, this present study may help to optimize the human dose prediction in preclinical and clinical studies, while prescribing drugs with narrow therapeutic windows that are highly bound to extracellular proteins and/or highly ionized at the physiological pH. This may facilitate building a more accurate safety (efficacy) profile for such drugs. Copyright © 2016 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
AUTOBA: automation of backbone assignment from HN(C)N suite of experiments.
Borkar, Aditi; Kumar, Dinesh; Hosur, Ramakrishna V
2011-07-01
Development of efficient strategies and automation represent important milestones of progress in rapid structure determination efforts in proteomics research. In this context, we present here an efficient algorithm named AUTOBA (Automatic Backbone Assignment) designed to automate the assignment protocol based on the HN(C)N suite of experiments. Depending upon the spectral dispersion, the user can record 2D or 3D versions of the experiments for assignment. The algorithm uses as inputs: (i) the protein primary sequence and (ii) peak-lists from the user-defined HN(C)N suite of experiments. In the end, one gets H(N), (15)N, C(α) and C' assignments (in common BMRB format) for the individual residues along the polypeptide chain. The success of the algorithm has been demonstrated, not only with experimental spectra recorded on two small globular proteins: ubiquitin (76 aa) and M-crystallin (85 aa), but also with simulated spectra of 27 other proteins using assignment data from the BMRB.
Scholl, Zackary N.; Marszalek, Piotr E.
2013-01-01
The benefits of single molecule force spectroscopy (SMFS) clearly outweigh the challenges which include small sample sizes, tedious data collection and introduction of human bias during the subjective data selection. These difficulties can be partially eliminated through automation of the experimental data collection process for atomic force microscopy (AFM). Automation can be accomplished using an algorithm that triages usable force-extension recordings quickly with positive and negative selection. We implemented an algorithm based on the windowed fast Fourier transform of force-extension traces that identifies peaks using force-extension regimes to correctly identify usable recordings from proteins composed of repeated domains. This algorithm excels as a real-time diagnostic because it involves <30 ms computational time, has high sensitivity and specificity, and efficiently detects weak unfolding events. We used the statistics provided by the automated procedure to clearly demonstrate the properties of molecular adhesion and how these properties change with differences in the cantilever tip and protein functional groups and protein age. PMID:24001740
Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning
NASA Astrophysics Data System (ADS)
Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao
2017-04-01
Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.
Small Scaffolds, Big Potential: Developing Miniature Proteins as Therapeutic Agents.
Holub, Justin M
2017-09-01
Preclinical Research Miniature proteins are a class of oligopeptide characterized by their short sequence lengths and ability to adopt well-folded, three-dimensional structures. Because of their biomimetic nature and synthetic tractability, miniature proteins have been used to study a range of biochemical processes including fast protein folding, signal transduction, catalysis and molecular transport. Recently, miniature proteins have been gaining traction as potential therapeutic agents because their small size and ability to fold into defined tertiary structures facilitate their development as protein-based drugs. This research overview discusses emerging developments involving the use of miniature proteins as scaffolds to design novel therapeutics for the treatment and study of human disease. Specifically, this review will explore strategies to: (i) stabilize miniature protein tertiary structure; (ii) optimize biomolecular recognition by grafting functional epitopes onto miniature protein scaffolds; and (iii) enhance cytosolic delivery of miniature proteins through the use of cationic motifs that facilitate endosomal escape. These objectives are discussed not only to address challenges in developing effective miniature protein-based drugs, but also to highlight the tremendous potential miniature proteins hold for combating and understanding human disease. Drug Dev Res 78 : 268-282, 2017. © 2017 Wiley Periodicals, Inc.
Nopaline-type Ti plasmid of Agrobacterium encodes a VirF-like functional F-box protein.
Lacroix, Benoît; Citovsky, Vitaly
2015-11-20
During Agrobacterium-mediated genetic transformation of plants, several bacterial virulence (Vir) proteins are translocated into the host cell to facilitate infection. One of the most important of such translocated factors is VirF, an F-box protein produced by octopine strains of Agrobacterium, which presumably facilitates proteasomal uncoating of the invading T-DNA from its associated proteins. The presence of VirF also is thought to be involved in differences in host specificity between octopine and nopaline strains of Agrobacterium, with the current dogma being that no functional VirF is encoded by nopaline strains. Here, we show that a protein with homology to octopine VirF is encoded by the Ti plasmid of the nopaline C58 strain of Agrobacterium. This protein, C58VirF, possesses the hallmarks of functional F-box proteins: it contains an active F-box domain and specifically interacts, via its F-box domain, with SKP1-like (ASK) protein components of the plant ubiquitin/proteasome system. Thus, our data suggest that nopaline strains of Agrobacterium have evolved to encode a functional F-box protein VirF.
Efficient search, mapping, and optimization of multi-protein genetic systems in diverse bacteria
Farasat, Iman; Kushwaha, Manish; Collens, Jason; Easterbrook, Michael; Guido, Matthew; Salis, Howard M
2014-01-01
Developing predictive models of multi-protein genetic systems to understand and optimize their behavior remains a combinatorial challenge, particularly when measurement throughput is limited. We developed a computational approach to build predictive models and identify optimal sequences and expression levels, while circumventing combinatorial explosion. Maximally informative genetic system variants were first designed by the RBS Library Calculator, an algorithm to design sequences for efficiently searching a multi-protein expression space across a > 10,000-fold range with tailored search parameters and well-predicted translation rates. We validated the algorithm's predictions by characterizing 646 genetic system variants, encoded in plasmids and genomes, expressed in six gram-positive and gram-negative bacterial hosts. We then combined the search algorithm with system-level kinetic modeling, requiring the construction and characterization of 73 variants to build a sequence-expression-activity map (SEAMAP) for a biosynthesis pathway. Using model predictions, we designed and characterized 47 additional pathway variants to navigate its activity space, find optimal expression regions with desired activity response curves, and relieve rate-limiting steps in metabolism. Creating sequence-expression-activity maps accelerates the optimization of many protein systems and allows previous measurements to quantitatively inform future designs. PMID:24952589
Expediting topology data gathering for the TOPDB database.
Dobson, László; Langó, Tamás; Reményi, István; Tusnády, Gábor E
2015-01-01
The Topology Data Bank of Transmembrane Proteins (TOPDB, http://topdb.enzim.ttk.mta.hu) contains experimentally determined topology data of transmembrane proteins. Recently, we have updated TOPDB from several sources and utilized a newly developed topology prediction algorithm to determine the most reliable topology using the results of experiments as constraints. In addition to collecting the experimentally determined topology data published in the last couple of years, we gathered topographies defined by the TMDET algorithm using 3D structures from the PDBTM. Results of global topology analysis of various organisms as well as topology data generated by high throughput techniques, like the sequential positions of N- or O-glycosylations were incorporated into the TOPDB database. Moreover, a new algorithm was developed to integrate scattered topology data from various publicly available databases and a new method was introduced to measure the reliability of predicted topologies. We show that reliability values highly correlate with the per protein topology accuracy of the utilized prediction method. Altogether, more than 52,000 new topology data and more than 2600 new transmembrane proteins have been collected since the last public release of the TOPDB database. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Mechanistic design data from ODOT instrumented pavement sites : phase II report.
DOT National Transportation Integrated Search
2017-03-01
This investigation examined data obtained from three previously-instrumented pavement test sites in Oregon. Data processing algorithms and templates were developed for each test site that facilitated full processing of all the data to build databases...
Mechanistic design data from ODOT instrumented pavement sites : phase 1 report.
DOT National Transportation Integrated Search
2017-03-01
This investigation examined data obtained from three previously-instrumented pavement test sites in Oregon. Data processing algorithms and templates were developed for each test site that facilitated full processing of all the data to build databases...
Ye, Kai; Kosters, Walter A; Ijzerman, Adriaan P
2007-03-15
Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In contrast, two algorithms, PRATT2 (http://www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/) are used to directly identify frequent patterns from unaligned biological sequences without an attempt to align them. Here we propose a new algorithm with more efficiency and more functionality than both PRATT2 and TEIRESIAS, and discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets. In this study, we designed and implemented six algorithms to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets. The source code for pattern growth algorithms and their pseudo-code are available at http://www.liacs.nl/home/kosters/pg/.
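The pattern growth idea can be sketched for the simplest case: contiguous patterns that occur in at least a minimum number of unaligned sequences. Each frequent pattern is extended by one residue at a time, and only the positions where the shorter pattern already matched are re-examined. This is an illustrative simplification, not the authors' six algorithms; the abstract does not define pattern types I to III, and the function name and toy sequences are assumptions.

```python
def grow_patterns(seqs, min_support):
    """Return {pattern: support} for contiguous patterns found in >= min_support sequences."""
    found = {}

    def extend(pattern, occurrences):
        # occurrences maps a sequence index to the positions just past each match of pattern.
        candidates = {}
        for i, ends in occurrences.items():
            for end in ends:
                if end < len(seqs[i]):
                    residue = seqs[i][end]
                    candidates.setdefault(residue, {}).setdefault(i, []).append(end + 1)
        for residue, occ in candidates.items():
            if len(occ) >= min_support:  # support = number of distinct sequences
                longer = pattern + residue
                found[longer] = len(occ)
                extend(longer, occ)      # grow only patterns that are still frequent

    # The empty pattern "matches" before every position of every sequence.
    extend("", {i: list(range(len(s))) for i, s in enumerate(seqs)})
    return found

toy = ["ACDEF", "XCDEY", "CDQ"]       # hypothetical unaligned sequences
in_all = grow_patterns(toy, 3)
in_two = grow_patterns(toy, 2)
```

Because infrequent patterns are never extended, the search space is pruned early, which is the usual source of efficiency in pattern growth methods.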
Glick, Meir; Rayan, Anwar; Goldblum, Amiram
2002-01-01
The problem of global optimization is pivotal in a variety of scientific fields. Here, we present a robust stochastic search method that is able to find the global minimum for a given cost function, as well as, in most cases, any number of best solutions for very large combinatorial “explosive” systems. The algorithm iteratively eliminates variable values that contribute consistently to the highest end of a cost function's spectrum of values for the full system. Values that have not been eliminated are retained for a full, exhaustive search, allowing the creation of an ordered population of best solutions, which includes the global minimum. We demonstrate the ability of the algorithm to explore the conformational space of side chains in eight proteins, with 54 to 263 residues, to reproduce a population of their low energy conformations. The 1,000 lowest energy solutions are identical in the stochastic (with two different seed numbers) and full, exhaustive searches for six of eight proteins. The others retain the lowest 141 and 213 (of 1,000) conformations, depending on the seed number, and the maximal difference between stochastic and exhaustive is only about 0.15 Kcal/mol. The energy gap between the lowest and highest of the 1,000 low-energy conformers in eight proteins is between 0.55 and 3.64 Kcal/mol. This algorithm offers real opportunities for solving problems of high complexity in structural biology and in other fields of science and technology. PMID:11792838
MINE: Module Identification in Networks
2011-01-01
Background: Graphical models of network associations are useful for both visualizing and integrating multiple types of association data. Identifying modules, or groups of functionally related gene products, is an important challenge in analyzing biological networks. However, existing tools to identify modules are insufficient when applied to dense networks of experimentally derived interaction data. To address this problem, we have developed an agglomerative clustering method that is able to identify highly modular sets of gene products within highly interconnected molecular interaction networks. Results: MINE outperforms MCODE, CFinder, NEMO, SPICi, and MCL in identifying non-exclusive, high modularity clusters when applied to the C. elegans protein-protein interaction network. The algorithm generally achieves superior geometric accuracy and modularity for annotated functional categories. In comparison with the most closely related algorithm, MCODE, the top clusters identified by MINE are consistently of higher density and MINE is less likely to designate overlapping modules as a single unit. MINE offers a high level of granularity with a small number of adjustable parameters, enabling users to fine-tune cluster results for input networks with differing topological properties. Conclusions: MINE was created in response to the challenge of discovering high quality modules of gene products within highly interconnected biological networks. The algorithm allows a high degree of flexibility and user-customisation of results with few adjustable parameters. MINE outperforms several popular clustering algorithms in identifying modules with high modularity and obtains good overall recall and precision of functional annotations in protein-protein interaction networks from both S. cerevisiae and C. elegans. PMID:21605434
Fast and anisotropic flexibility-rigidity index for protein flexibility and fluctuation analysis
NASA Astrophysics Data System (ADS)
Opron, Kristopher; Xia, Kelin; Wei, Guo-Wei
2014-06-01
Protein structural fluctuation, typically measured by Debye-Waller factors, or B-factors, is a manifestation of protein flexibility, which strongly correlates to protein function. The flexibility-rigidity index (FRI) is a newly proposed method for the construction of atomic rigidity functions required in the theory of continuum elasticity with atomic rigidity, which is a new multiscale formalism for describing excessively large biomolecular systems. The FRI method analyzes protein rigidity and flexibility and is capable of predicting protein B-factors without resorting to matrix diagonalization. A fundamental assumption used in the FRI is that protein structures are uniquely determined by various internal and external interactions, while the protein functions, such as stability and flexibility, are solely determined by the structure. As such, one can predict protein flexibility without resorting to the protein interaction Hamiltonian. Consequently, bypassing the matrix diagonalization, the original FRI has a computational complexity of O(N^2). This work introduces a fast FRI (fFRI) algorithm for the flexibility analysis of large macromolecules. The proposed fFRI further reduces the computational complexity to O(N). Additionally, we propose anisotropic FRI (aFRI) algorithms for the analysis of protein collective dynamics. The aFRI algorithms permit adaptive Hessian matrices, from a completely global 3N × 3N matrix to completely local 3 × 3 matrices. These 3 × 3 matrices, despite being calculated locally, also contain non-local correlation information. Eigenvectors obtained from the proposed aFRI algorithms are able to demonstrate collective motions. Moreover, we investigate the performance of FRI by employing four families of radial basis correlation functions. Both parameter optimized and parameter-free FRI methods are explored. 
Furthermore, we compare the accuracy and efficiency of FRI with some established approaches to flexibility analysis, namely, normal mode analysis and Gaussian network model (GNM). The accuracy of the FRI method is tested using four sets of proteins, three sets of relatively small-, medium-, and large-sized structures and an extended set of 365 proteins. A fifth set of proteins is used to compare the efficiency of the FRI, fFRI, aFRI, and GNM methods. Intensive validation and comparison indicate that the FRI, particularly the fFRI, is orders of magnitude more efficient and about 10% more accurate overall than some of the most popular methods in the field. The proposed fFRI is able to predict B-factors for α-carbons of the HIV virus capsid (313 236 residues) in less than 30 seconds on a single processor using only one core. Finally, we demonstrate the application of FRI and aFRI to protein domain analysis.
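A minimal sketch of the basic O(N^2) idea, under stated assumptions: each atom's rigidity is a sum of a distance-decaying correlation kernel over all other atoms, and its flexibility is the reciprocal of that sum, so no matrix diagonalization is needed. The exponential kernel, the unit weights, and the parameters `eta` and `kappa` are illustrative choices; the abstract mentions four kernel families without listing them, and the linear-scaling fFRI variant is not reproduced here.

```python
import math

def flexibility_index(coords, eta=3.0, kappa=1.0):
    """Per-atom flexibility as the reciprocal of a summed exponential correlation kernel."""
    flex = []
    for i, ri in enumerate(coords):
        rigidity = sum(
            math.exp(-((math.dist(ri, rj) / eta) ** kappa))
            for j, rj in enumerate(coords)
            if j != i
        )
        flex.append(1.0 / rigidity)
    return flex

# Three hypothetical alpha-carbons in a line, 3.8 angstroms apart.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
flex = flexibility_index(chain)
```

The middle atom has more close neighbours and therefore a lower flexibility value than the two ends, which matches the intuition that densely packed regions fluctuate less.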
Falkner, Jayson; Andrews, Philip
2005-05-15
Comparing tandem mass spectra (MSMS) against a known dataset of protein sequences is a common method for identifying unknown proteins; however, the processing of MSMS by current software often limits certain applications, including comprehensive coverage of post-translational modifications, non-specific searches and real-time searches to allow result-dependent instrument control. This problem deserves attention as new mass spectrometers provide the ability for higher throughput and as known protein datasets rapidly grow in size. New software algorithms need to be devised in order to address the performance issues of conventional MSMS protein dataset-based protein identification. This paper describes a novel algorithm based on converting a collection of monoisotopic, centroided spectra to a new data structure, named 'peptide finite state machine' (PFSM), which may be used to rapidly search a known dataset of protein sequences, regardless of the number of spectra searched or the number of potential modifications examined. The algorithm is verified using a set of commercially available tryptic digest protein standards analyzed using an ABI 4700 MALDI TOFTOF mass spectrometer, and a free, open source PFSM implementation. It is illustrated that a PFSM can accurately search large collections of spectra against large datasets of protein sequences (e.g. NCBI nr) using a regular desktop PC; however, this paper only details the method for identifying peptide and subsequently protein candidates from a dataset of known protein sequences. The concept of using a PFSM as a peptide pre-screening technique for MSMS-based search engines is validated by using PFSM with Mascot and XTandem. Complete source code, documentation and examples for the reference PFSM implementation are freely available at the Proteome Commons, http://www.proteomecommons.org and source code may be used both commercially and non-commercially as long as the original authors are credited for their work.
An algorithm for converting a virtual-bond chain into a complete polypeptide backbone chain
NASA Technical Reports Server (NTRS)
Luo, N.; Shibata, M.; Rein, R.
1991-01-01
A systematic analysis is presented of the algorithm for converting a virtual-bond chain, defined by the coordinates of the alpha-carbons of a given protein, into a complete polypeptide backbone. An alternative algorithm, based upon the same set of geometric parameters used in the Purisima-Scheraga algorithm but with a different "linkage map" of the algorithmic procedures, is proposed. The global virtual-bond chain geometric constraints are more easily separable from the local peptide geometric and energetic constraints derived from, for example, the Ramachandran criterion, within the framework of this approach.
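As a rough illustration of the input geometry such a conversion starts from, the sketch below computes the virtual-bond angle and virtual-bond dihedral of consecutive alpha-carbon positions. It is not the Purisima-Scheraga procedure or the alternative proposed in the abstract; the function names and the plain-tuple coordinate format are assumptions made for this example.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def virtual_bond_angle(p0, p1, p2):
    """Angle (degrees) at p1 between the virtual bonds p1->p0 and p1->p2."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    cos_theta = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def virtual_bond_dihedral(p0, p1, p2, p3):
    """Signed dihedral (degrees) defined by four consecutive C-alpha positions."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # Standard atan2 form: atan2(|b2| * b1.(b2 x b3), (b1 x b2).(b2 x b3))
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For a chain of N alpha-carbons this yields N-2 angles and N-3 dihedrals, which together with the virtual-bond lengths describe the chain in internal coordinates.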
Narayanan, Shrikanth
2009-01-01
We describe a method for unsupervised region segmentation of an image using its spatial frequency domain representation. The algorithm was designed to process large sequences of real-time magnetic resonance (MR) images containing the 2-D midsagittal view of a human vocal tract airway. The segmentation algorithm uses an anatomically informed object model, whose fit to the observed image data is hierarchically optimized using a gradient descent procedure. The goal of the algorithm is to automatically extract the time-varying vocal tract outline and the position of the articulators to facilitate the study of the shaping of the vocal tract during speech production. PMID:19244005
Huang, Sheng Yu; Chen, Sung Fang; Chen, Chun Hao; Huang, Hsuan Wei; Wu, Wen Guey; Sung, Wang Chou
2014-09-02
Snake venom consists of toxin proteins with multiple disulfide linkages to generate unique structures and biological functions. Determination of these cysteine connections usually requires the purification of each protein followed by structural analysis. In this study, a method combining dimethyl labeling, LC-MS/MS, and the RADAR algorithm was developed to identify the disulfide bonds in crude snake venom. Without any protein separation, the disulfide linkages of several cytotoxins and PLA2 could be solved, including more than 20 disulfide bonds. The results show that this method is capable of analyzing protein mixtures. In addition, the approach was also used to compare native cytotoxin 3 (CTX III) and its scrambled isomer, another category of protein mixture, for unknown disulfide bonds. Two disulfide-linked peptides were observed in the native CTX III, and 10 in its scrambled form, X-CTX III. This is the first study that reports a platform for the global cysteine connection analysis on a protein mixture. The proposed method is simple and automatic, offering an efficient tool for structural and functional studies of venom proteins.
Hao, Xiaohu; Zhang, Guijun; Zhou, Xiaogen
2018-04-01
Computing protein conformations, which is essential for associating structural and functional information with gene sequences, is challenging due to the high dimensionality and rugged energy surface of the protein conformational space. Consequently, the dimension of the protein conformational space should be reduced to a proper level, and an effective exploration algorithm is needed. In this paper, a plug-in method for guiding exploration in conformational feature space with Lipschitz underestimation (LUE) for ab initio protein structure prediction is proposed. The conformational space is first converted into ultrafast shape recognition (USR) feature space. Based on the USR feature space, the conformational space can be further converted into an underestimation space according to Lipschitz estimation theory for guiding exploration. As a consequence of the use of the underestimation model, the tight lower-bound estimate can be used for exploration guidance, invalid sampling areas can be eliminated in advance, and the number of energy function evaluations can be reduced. The proposed method provides a novel technique to solve the exploration problem of protein conformational space. LUE is applied to the differential evolution (DE) algorithm and to the Metropolis Monte Carlo (MMC) algorithm available in Rosetta; when LUE is applied to DE and MMC, candidate conformations are screened by the underestimation method prior to energy calculation and selection. Further, LUE is compared with DE and MMC by testing on 15 small-to-medium structurally diverse proteins. Test results show that near-native protein structures with higher accuracy can be obtained more rapidly and efficiently with the use of LUE. Copyright © 2018 Elsevier Ltd. All rights reserved.
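The USR feature space mentioned above can be illustrated with a minimal sketch of the generic ultrafast shape recognition descriptor: twelve numbers summarizing the distance distributions from four reference points. This is not the LUE implementation. The function names are hypothetical, and the moment normalization (mean, standard deviation, cube root of the third central moment) is one convention among several used in practice.

```python
import math

def _moments(dists):
    """Mean, standard deviation, and signed cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    return (mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third))

def usr_descriptor(points):
    """12-element USR descriptor for a list of (x, y, z) tuples."""
    n = len(points)
    ctd = tuple(sum(p[k] for p in points) / n for k in range(3))  # centroid
    d_ctd = [math.dist(p, ctd) for p in points]
    cst = points[min(range(n), key=d_ctd.__getitem__)]  # closest to centroid
    fct = points[max(range(n), key=d_ctd.__getitem__)]  # farthest from centroid
    d_fct = [math.dist(p, fct) for p in points]
    ftf = points[max(range(n), key=d_fct.__getitem__)]  # farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(p, ref) for p in points]))
    return descriptor

def usr_similarity(a, b):
    """Similarity in (0, 1]: inverse of one plus the mean absolute difference."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))
```

Because the descriptor depends only on interatomic distances, it is unchanged by translation and rotation, which is what makes it usable as a low-dimensional stand-in for the full coordinate space.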
The BEACH-containing protein WDR81 coordinates p62 and LC3C to promote aggrephagy.
Liu, Xuezhao; Li, Yang; Wang, Xin; Xing, Ruxiao; Liu, Kai; Gan, Qiwen; Tang, Changyong; Gao, Zhiyang; Jian, Youli; Luo, Shouqing; Guo, Weixiang; Yang, Chonglin
2017-05-01
Autophagy-dependent clearance of ubiquitinated and aggregated proteins is critical to protein quality control, but the underlying mechanisms are not well understood. Here, we report the essential role of the BEACH (beige and Chediak-Higashi) and WD40 repeat-containing protein WDR81 in eliminating ubiquitinated proteins through autophagy. WDR81 associates with ubiquitin (Ub)-positive protein foci, and its loss causes accumulation of Ub proteins and the autophagy cargo receptor p62. WDR81 interacts with p62, facilitating recognition of Ub proteins by p62. Furthermore, WDR81 interacts with LC3C through canonical LC3-interacting regions in the BEACH domain, promoting LC3C recruitment to ubiquitinated proteins. Inactivation of LC3C or defective autophagy results in accumulation of Ub protein aggregates enriched for WDR81. In mice, WDR81 inactivation causes accumulation of p62 bodies in cortical and striatal neurons in the brain. These data suggest that WDR81 coordinates p62 and LC3C to facilitate autophagic removal of Ub proteins, and provide important insights into CAMRQ2 syndrome, a WDR81-related developmental disorder. © 2017 Liu et al.
The BEACH-containing protein WDR81 coordinates p62 and LC3C to promote aggrephagy
Xing, Ruxiao; Tang, Changyong; Gao, Zhiyang
2017-01-01
Autophagy-dependent clearance of ubiquitinated and aggregated proteins is critical to protein quality control, but the underlying mechanisms are not well understood. Here, we report the essential role of the BEACH (beige and Chediak–Higashi) and WD40 repeat-containing protein WDR81 in eliminating ubiquitinated proteins through autophagy. WDR81 associates with ubiquitin (Ub)-positive protein foci, and its loss causes accumulation of Ub proteins and the autophagy cargo receptor p62. WDR81 interacts with p62, facilitating recognition of Ub proteins by p62. Furthermore, WDR81 interacts with LC3C through canonical LC3-interacting regions in the BEACH domain, promoting LC3C recruitment to ubiquitinated proteins. Inactivation of LC3C or defective autophagy results in accumulation of Ub protein aggregates enriched for WDR81. In mice, WDR81 inactivation causes accumulation of p62 bodies in cortical and striatal neurons in the brain. These data suggest that WDR81 coordinates p62 and LC3C to facilitate autophagic removal of Ub proteins, and provide important insights into CAMRQ2 syndrome, a WDR81-related developmental disorder. PMID:28404643
Cao, Buwen; Deng, Shuguang; Qin, Hua; Ding, Pingjian; Chen, Shaopeng; Li, Guanghui
2018-06-15
High-throughput technology has generated large-scale protein interaction data, which is crucial in our understanding of biological organisms. Many complex identification algorithms have been developed to determine protein complexes. However, these methods are only suitable for dense protein interaction networks, because their capabilities decrease rapidly when applied to sparse protein-protein interaction (PPI) networks. In this study, based on penalized matrix decomposition (PMD), a novel method of penalized matrix decomposition for the identification of protein complexes (i.e., PMDpc) was developed to detect protein complexes in the human protein interaction network. This method mainly consists of three steps. First, the adjacency matrix of the protein interaction network is normalized. Second, the normalized matrix is decomposed into three factor matrices. The PMDpc method can detect protein complexes in sparse PPI networks by imposing appropriate constraints on factor matrices. Finally, the results of our method are compared with those of other methods in the human PPI network. Experimental results show that our method can not only outperform classical algorithms, such as CFinder, ClusterONE, RRW, HC-PIN, and PCE-FR, but can also achieve an ideal overall performance in terms of a composite score consisting of F-measure, accuracy (ACC), and the maximum matching ratio (MMR).
Transmission-blocking antibodies against mosquito C-type lectins for dengue prevention.
Liu, Yang; Zhang, Fuchun; Liu, Jianying; Xiao, Xiaoping; Zhang, Siyin; Qin, Chengfeng; Xiang, Ye; Wang, Penghua; Cheng, Gong
2014-02-01
C-type lectins are a family of proteins with carbohydrate-binding activity. Several C-type lectins in mammals or arthropods are employed as receptors or attachment factors to facilitate flavivirus invasion. We previously identified a C-type lectin in Aedes aegypti, designated as mosquito galactose specific C-type lectin-1 (mosGCTL-1), facilitating the attachment of West Nile virus (WNV) on the cell membrane. Here, we first identified that 9 A. aegypti mosGCTL genes were key susceptibility factors facilitating DENV-2 infection, of which mosGCTL-3 exhibited the most significant effect. We found that mosGCTL-3 was induced in mosquito tissues with DENV-2 infection, and that the protein interacted with DENV-2 surface envelope (E) protein and virions in vitro and in vivo. In addition, the other identified mosGCTLs interacted with the DENV-2 E protein, indicating that DENV may employ multiple mosGCTLs as ligands to promote the infection of vectors. The vectorial susceptibility factors that facilitate pathogen invasion may potentially be explored as a target to disrupt the acquisition of microbes from the vertebrate host. Indeed, membrane blood feeding of antisera against mosGCTLs dramatically reduced mosquito infective ratio. Hence, the immunization against mosGCTLs is a feasible approach for preventing dengue infection. Our study provides a future avenue for developing a transmission-blocking vaccine that interrupts the life cycle of dengue virus and reduces disease burden.
Invariant patterns in crystal lattices: Implications for protein folding algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
HART,WILLIAM E.; ISTRAIL,SORIN
2000-06-01
Crystal lattices are infinite periodic graphs that occur naturally in a variety of geometries and which are of fundamental importance in polymer science. Discrete models of protein folding use crystal lattices to define the space of protein conformations. Because various crystal lattices provide discretizations of the same physical phenomenon, it is reasonable to expect that there will exist invariants across lattices related to fundamental properties of the protein folding process. This paper considers whether performance-guaranteed approximability is such an invariant for HP lattice models. The authors define a master approximation algorithm that has provable performance guarantees provided that a specific sublattice exists within a given lattice. They describe a broad class of crystal lattices that are approximable, which further suggests that approximability is a general property of HP lattice models.
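For readers unfamiliar with HP lattice models, the sketch below scores a conformation on the 2-D square lattice, the simplest of the lattices such models use. It shows only the objective that folding algorithms try to optimize, not the authors' master approximation algorithm. The function name and the coordinate-list input format are assumptions made for this example.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on the 2-D square lattice.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    coords:   list of (x, y) integer lattice sites, one per residue.
    Each pair of H residues on neighboring sites that is not consecutive
    in the chain contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("one lattice site is required per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        if abs(x0 - x1) + abs(y0 - y1) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")

    index_at = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy
```

Replacing the two neighbor offsets with the neighbor set of another lattice (triangular, cubic, and so on) adapts the same scoring to that lattice, which is the sense in which different lattices discretize the same problem.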
Wang, Nanyi; Wang, Lirong; Xie, Xiang-Qun
2017-11-27
Molecular docking is widely applied to computer-aided drug design and has become relatively mature in recent decades. Application of docking in modeling varies from single lead compound optimization to large-scale virtual screening. The performance of molecular docking is highly dependent on the protein structures selected. It is especially challenging for large-scale target prediction research when multiple structures are available for a single target. Therefore, we have established ProSelection, a docking preferred-protein selection algorithm, in order to generate the proper structure subset(s). By the ProSelection algorithm, protein structures of "weak selectors" are filtered out whereas structures of "strong selectors" are kept. Specifically, a structure that has a good statistical performance of distinguishing active ligands from inactive ligands is defined as a strong selector. In this study, 249 protein structures of 14 autophagy-related targets are investigated. Surflex-dock was used as the docking engine to distinguish active and inactive compounds against these protein structures. Both the t test and the Mann-Whitney U test were used to distinguish the strong from the weak selectors based on the normality of the docking score distribution. The suggested docking score threshold for active ligands (SDA) was generated for each strong selector structure according to the receiver operating characteristic (ROC) curve. The performance of ProSelection was further validated by predicting the potential off-targets of 43 U.S. Food and Drug Administration approved small molecule antineoplastic drugs. Overall, ProSelection will accelerate the computational work in protein structure selection and could be a useful tool for molecular docking, target prediction, and protein-chemical database establishment research.
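The abstract does not say how the SDA threshold is read off the ROC curve. As one plausible illustration, the sketch below picks the score cutoff that maximizes Youden's J statistic (true-positive rate minus false-positive rate). Both the criterion and the assumption that a higher docking score indicates a more likely active ligand are choices made for this example, not details confirmed by the source. The function name is hypothetical.

```python
def roc_threshold(active_scores, inactive_scores):
    """Return (threshold, tpr, fpr) maximizing Youden's J = TPR - FPR.

    Assumes a ligand is predicted active when its score >= threshold.
    Ties in J are resolved in favor of the lowest threshold.
    """
    if not active_scores or not inactive_scores:
        raise ValueError("both score lists must be non-empty")
    best = None
    for t in sorted(set(active_scores) | set(inactive_scores)):
        tpr = sum(s >= t for s in active_scores) / len(active_scores)
        fpr = sum(s >= t for s in inactive_scores) / len(inactive_scores)
        j = tpr - fpr
        if best is None or j > best[0]:
            best = (j, t, tpr, fpr)
    return best[1], best[2], best[3]
```

A structure whose best J stays close to zero separates actives from inactives no better than chance, which matches the intuition behind a "weak selector", although the abstract itself defines selectors through the t test and the Mann-Whitney U test rather than through J.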
Wang, ShaoPeng; Zhang, Yu-Hang; Huang, GuoHua; Chen, Lei; Cai, Yu-Dong
2017-01-01
Myristoylation is an important hydrophobic post-translational modification that is covalently bound to the amino group of Gly residues on the N-terminus of proteins. The many diverse functions of myristoylation on proteins, such as membrane targeting, signal pathway regulation and apoptosis, are largely due to the lipid modification, whereas abnormal or irregular myristoylation on proteins can lead to several pathological changes in the cell. To better understand the function of myristoylated sites and to correctly identify them in protein sequences, this study conducted a novel computational investigation on identifying myristoylation sites in protein sequences. A training dataset with 196 positive and 84 negative peptide segments was obtained. Four types of features derived from the peptide segments following the myristoylation sites were used to specify myristoylated and non-myristoylated sites. Then, feature selection methods including maximum relevance and minimum redundancy (mRMR), incremental feature selection (IFS), and a machine learning algorithm (extreme learning machine method) were adopted to extract optimal features for the algorithm to identify myristoylation sites in protein sequences, thereby building an optimal prediction model. As a result, 41 key features were extracted and used to build an optimal prediction model. The effectiveness of the optimal prediction model was further validated by its performance on a test dataset. Furthermore, detailed analyses were also performed on the extracted 41 features to gain insight into the mechanism of myristoylation modification. This study provided a new computational method for identifying myristoylation sites in protein sequences. We believe that it can be a useful tool to predict myristoylation sites from protein sequences. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Árbol, Javier Rodríguez; Perakakis, Pandelis; Garrido, Alba; Mata, José Luis; Fernández-Santaella, M Carmen; Vila, Jaime
2017-03-01
The preejection period (PEP) is an index of left ventricle contractility widely used in psychophysiological research. Its computation requires detecting the moment when the aortic valve opens, which coincides with the B point in the first derivative of impedance cardiogram (ICG). Although this operation has been traditionally made via visual inspection, several algorithms based on derivative calculations have been developed to enable an automatic performance of the task. However, despite their popularity, data about their empirical validation are not always available. The present study analyzes the performance in the estimation of the aortic valve opening of three popular algorithms, by comparing their performance with the visual detection of the B point made by two independent scorers. Algorithm 1 is based on the first derivative of the ICG, Algorithm 2 on the second derivative, and Algorithm 3 on the third derivative. Algorithm 3 showed the highest accuracy rate (78.77%), followed by Algorithm 1 (24.57%) and Algorithm 2 (13.82%). In the automatic computation of PEP, Algorithm 2 resulted in significantly more missed cycles (48.57%) than Algorithm 1 (6.3%) and Algorithm 3 (3.5%). Algorithm 2 also estimated a significantly lower average PEP (70 ms), compared with the values obtained by Algorithm 1 (119 ms) and Algorithm 3 (113 ms). Our findings indicate that the algorithm based on the third derivative of the ICG performs significantly better. Nevertheless, a visual inspection of the signal proves indispensable, and this article provides a novel visual guide to facilitate the manual detection of the B point. © 2016 Society for Psychophysiological Research.
Dynamic programming re-ranking for PPI interactor and pair extraction in full-text articles
2011-01-01
Background: Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be facilitated by employing text-mining systems to identify genes which play the interactor role in PPIs and to map these genes to unique database identifiers (interactor normalization task or INT) and then to return a list of interaction pairs for each article (interaction pair task or IPT). These two tasks are evaluated in terms of the area under curve of the interpolated precision/recall (AUC iP/R) score because the order of identifiers in the output list is important for ease of curation. Results: Our INT system developed for the BioCreAtIvE II.5 INT challenge achieved a promising AUC iP/R of 43.5% by using a support vector machine (SVM)-based ranking procedure. Using our new re-ranking algorithm, we have been able to improve system performance (AUC iP/R) by 1.84%. Our experimental results also show that with the re-ranked INT results, our unsupervised IPT system can achieve a competitive AUC iP/R of 23.86%, which outperforms the best BC II.5 INT system by 1.64%. Compared to using only SVM ranked INT results, using re-ranked INT results boosts AUC iP/R by 7.84%. Statistical significance t-test results show that our INT/IPT system with re-ranking outperforms that without re-ranking by a statistically significant difference. Conclusions: In this paper, we present a new re-ranking algorithm that considers co-occurrence among identifiers in an article to improve INT and IPT ranking results. Combining the re-ranked INT results with an unsupervised approach to find associations among interactors, the proposed method can boost the IPT performance. We also implement score computation using dynamic programming, which is faster and more efficient than traditional approaches. PMID:21342534
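To make the evaluation metric concrete, the sketch below computes an area under the interpolated precision/recall curve for one ranked list of identifiers. It follows a common formulation in which the interpolated precision at a recall level is the highest precision reached at that recall or any higher one. The official BioCreative scoring script may differ in details, and the function name and inputs are assumptions made for this example.

```python
def auc_ipr(ranked_ids, gold_ids):
    """Area under the interpolated precision/recall curve for a ranked list.

    ranked_ids: predicted identifiers, best first, without duplicates.
    gold_ids:   the set of correct identifiers for the article.
    """
    gold = set(gold_ids)
    if not gold:
        raise ValueError("gold standard must not be empty")

    # One (recall, precision) point at every rank where a correct id appears.
    points = []
    hits = 0
    for rank, ident in enumerate(ranked_ids, start=1):
        if ident in gold:
            hits += 1
            points.append((hits / len(gold), hits / rank))

    # Interpolate: precision at recall r is the best precision at recall >= r.
    best_later = 0.0
    interpolated = []
    for recall, precision in reversed(points):
        best_later = max(best_later, precision)
        interpolated.append((recall, best_later))
    interpolated.reverse()

    # Sum rectangle areas between successive recall levels.
    area = 0.0
    previous_recall = 0.0
    for recall, precision in interpolated:
        area += precision * (recall - previous_recall)
        previous_recall = recall
    return area
```

Because each correct identifier is weighted by the precision at its rank, moving a correct identifier up the list raises the score, which is why a re-ranking step can improve AUC iP/R without changing the set of predictions.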