Computer applications making rapid advances in high throughput microbial proteomics (HTMP).
Anandkumar, Balakrishna; Haga, Steve W; Wu, Hui-Fen
2014-02-01
The last few decades have seen the rise of widely available proteomics tools. From new data acquisition techniques, such as MALDI-MS and 2DE, to new database-searching software, these products have paved the way for high throughput microbial proteomics (HTMP). These tools are enabling researchers to gain new insights into microbial metabolism and are opening up new areas of study, such as protein-protein interaction (interactomics) discovery. Computer software is a key part of these emerging fields. This review considers: 1) software tools for identifying the proteome, such as MASCOT or PDQuest, 2) online databases of proteomes, such as SWISS-PROT, Proteome Web, or the Proteomics Facility of the Pathogen Functional Genomics Resource Center, and 3) software tools for applying proteomic data, such as PSI-BLAST or VESPA. These tools allow for research in network biology, protein identification, functional annotation, target identification/validation, protein expression, protein structural analysis, metabolic pathway engineering and drug discovery.
Automated image alignment for 2D gel electrophoresis in a high-throughput proteomics pipeline.
Dowsey, Andrew W; Dunn, Michael J; Yang, Guang-Zhong
2008-04-01
The quest for high-throughput proteomics has revealed a number of challenges in recent years. Whilst substantial improvements in automated protein separation with liquid chromatography and mass spectrometry (LC/MS), aka 'shotgun' proteomics, have been achieved, large-scale open initiatives such as the Human Proteome Organization (HUPO) Brain Proteome Project have shown that maximal proteome coverage is only possible when LC/MS is complemented by 2D gel electrophoresis (2-DE) studies. Moreover, both separation methods require automated alignment and differential analysis to relieve the bioinformatics bottleneck and so make high-throughput protein biomarker discovery a reality. The purpose of this article is to describe a fully automatic image alignment framework for the integration of 2-DE into a high-throughput differential expression proteomics pipeline. The proposed method is based on robust automated image normalization (RAIN) to circumvent the drawbacks of traditional approaches. These use symbolic representation at the very early stages of the analysis, which introduces persistent errors due to inaccuracies in modelling and alignment. In RAIN, a third-order volume-invariant B-spline model is incorporated into a multi-resolution schema to correct for geometric and expression inhomogeneity at multiple scales. The normalized images can then be compared directly in the image domain for quantitative differential analysis. Through evaluation against an existing state-of-the-art method on real and synthetically warped 2D gels, the proposed analysis framework demonstrates substantial improvements in matching accuracy and differential sensitivity. High-throughput analysis is established through an accelerated GPGPU (general purpose computation on graphics cards) implementation. Supplementary material, software and images used in the validation are available at http://www.proteomegrid.org/rain/.
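To picture the general idea of spline-based geometric correction described above (this is not the published RAIN algorithm, just a hedged sketch), the following Python snippet warps a 2-D gel image with a smooth displacement field defined on a coarse control grid and upsampled by cubic spline interpolation; the grid size and image are arbitrary stand-ins.

```python
# Hedged sketch (not the published RAIN method): warp a 2-D gel image with a smooth
# displacement field defined on a coarse control grid and upsampled by cubic splines.
import numpy as np
from scipy.ndimage import map_coordinates, zoom

def warp_gel(image, coarse_dx, coarse_dy):
    """Apply a coarse (e.g. 8x8) displacement field to an image, pixel-wise."""
    rows, cols = image.shape
    # Upsample the control-grid displacements to full resolution (cubic spline).
    dx = zoom(coarse_dx, (rows / coarse_dx.shape[0], cols / coarse_dx.shape[1]), order=3)
    dy = zoom(coarse_dy, (rows / coarse_dy.shape[0], cols / coarse_dy.shape[1]), order=3)
    yy, xx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    # Resample the image at the displaced coordinates (cubic interpolation).
    return map_coordinates(image, [yy + dy, xx + dx], order=3, mode="nearest")

rng = np.random.default_rng(0)
gel = rng.random((256, 256))                          # stand-in for a 2-DE image
warped = warp_gel(gel, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
print(warped.shape)                                   # (256, 256)
```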
Computational approaches to protein inference in shotgun proteomics
2012-01-01
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high-throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area. PMID:23176300
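As a rough illustration of the parsimony formulation mentioned above, the sketch below solves a tiny protein-inference instance as a set-cover problem with a greedy heuristic; a full integer-programming or Bayesian treatment, as discussed in the review, goes well beyond this. The peptide-to-protein mapping is invented for the example.

```python
# Illustrative sketch of parsimony-based protein inference as a set-cover problem
# (a greedy approximation of the integer-programming formulation discussed above).
def minimal_protein_set(protein_to_peptides):
    """Greedily pick proteins until every identified peptide is explained."""
    uncovered = set().union(*protein_to_peptides.values())
    selected = []
    while uncovered:
        # Choose the protein explaining the most still-unexplained peptides.
        best = max(protein_to_peptides, key=lambda p: len(protein_to_peptides[p] & uncovered))
        selected.append(best)
        uncovered -= protein_to_peptides[best]
    return selected

mapping = {"P1": {"pepA", "pepB"}, "P2": {"pepB"}, "P3": {"pepC", "pepA"}}
print(minimal_protein_set(mapping))   # ['P1', 'P3'] explains all three peptides
```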
Efficient visualization of high-throughput targeted proteomics experiments: TAPIR.
Röst, Hannes L; Rosenberger, George; Aebersold, Ruedi; Malmström, Lars
2015-07-15
Targeted mass spectrometry comprises a set of powerful methods to obtain accurate and consistent protein quantification in complex samples. To fully exploit these techniques, a cross-platform and open-source software stack based on standardized data exchange formats is required. We present TAPIR, a fast and efficient Python visualization software for chromatograms and peaks identified in targeted proteomics experiments. The input formats are open, community-driven standardized data formats (mzML for raw data storage and TraML encoding the hierarchical relationships between transitions, peptides and proteins). TAPIR is scalable to proteome-wide targeted proteomics studies (as enabled by SWATH-MS), allowing researchers to visualize high-throughput datasets. The framework integrates well with existing automated analysis pipelines and can be extended beyond targeted proteomics to other types of analyses. TAPIR is available for all computing platforms under the 3-clause BSD license at https://github.com/msproteomicstools/msproteomicstools. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
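The hierarchical transition-peptide-protein relationships that TraML encodes can be pictured with a toy data model like the one below; the class and field names are purely illustrative and do not reproduce the actual TraML schema or TAPIR's internals.

```python
# Toy data model mirroring the transition -> peptide -> protein hierarchy that
# TraML encodes (names are illustrative, not the actual TraML schema).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Transition:
    precursor_mz: float
    product_mz: float

@dataclass
class Peptide:
    sequence: str
    transitions: List[Transition] = field(default_factory=list)

@dataclass
class Protein:
    accession: str
    peptides: List[Peptide] = field(default_factory=list)

prot = Protein("P12345", [Peptide("ELVISK", [Transition(375.2, 560.3)])])
print(len(prot.peptides[0].transitions))   # 1
```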
HTAPP: High-Throughput Autonomous Proteomic Pipeline
Yu, Kebing; Salomon, Arthur R.
2011-01-01
Recent advances in the speed and sensitivity of mass spectrometers and in analytical methods, the exponential acceleration of computer processing speeds, and the availability of genomic databases from an array of species and protein information databases have led to a deluge of proteomic data. The development of a lab-based automated proteomic software platform for the automated collection, processing, storage, and visualization of expansive proteomic datasets is critically important. The high-throughput autonomous proteomic pipeline (HTAPP) described here is designed from the ground up to provide critically important flexibility for diverse proteomic workflows and to streamline the total analysis of a complex proteomic sample. This tool is comprised of software that controls the acquisition of mass spectral data along with automation of post-acquisition tasks such as peptide quantification, clustered MS/MS spectral database searching, statistical validation, and data exploration within a user-configurable lab-based relational database. The software design of HTAPP focuses on accommodating diverse workflows and providing missing software functionality to a wide range of proteomic researchers to accelerate the extraction of biological meaning from immense proteomic data sets. Although individual software modules in our integrated technology platform may have some similarities to existing tools, the true novelty of the approach described here is in the synergistic and flexible combination of these tools to provide an integrated and efficient analysis of proteomic samples. PMID:20336676
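A minimal sketch of the kind of lab-based relational storage such a pipeline relies on is shown below, using SQLite; the table and column names are invented for illustration and are not HTAPP's actual schema.

```python
# Hedged sketch of a lab-based relational store for processed MS results
# (tables and columns are invented for illustration, not HTAPP's schema).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ms_run (run_id INTEGER PRIMARY KEY, sample TEXT, acquired TEXT);
CREATE TABLE psm    (psm_id INTEGER PRIMARY KEY, run_id INTEGER REFERENCES ms_run,
                     peptide TEXT, score REAL, quant REAL);
""")
con.execute("INSERT INTO ms_run VALUES (1, 'phospho_rep1', '2011-01-01')")
con.execute("INSERT INTO psm VALUES (1, 1, 'AEFVEVTK', 42.7, 1.8e6)")
for row in con.execute("SELECT peptide, quant FROM psm WHERE score > 20"):
    print(row)        # ('AEFVEVTK', 1800000.0)
```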
Remodeling Cildb, a popular database for cilia and links for ciliopathies
2014-01-01
Background New generation technologies in cell and molecular biology generate large amounts of data that are hard to exploit for individual proteins. This is particularly true for ciliary and centrosomal research. Cildb is a multi-species knowledgebase gathering high throughput studies, which allows advanced searches to identify proteins involved in centrosome, basal body or cilia biogenesis, composition and function. Combined with the localization of genetic diseases on human chromosomes given by OMIM links, candidate ciliopathy proteins can be compiled through Cildb searches. Methods Orthology between recent versions of the whole proteomes was computed using Inparanoid, and ciliary high throughput studies were remapped onto these recent versions. Results Due to the constant evolution of the ciliary and centrosomal field, Cildb has recently been upgraded twice, with new species whole proteomes and new ciliary studies, and the latest version displays a novel BioMart interface, much more intuitive than the previous ones. Conclusions This already popular database is now designed for easier use and is up to date with regard to high throughput ciliary studies. PMID:25422781
LOCATE: a mouse protein subcellular localization database
Fink, J. Lynn; Aturaliya, Rajith N.; Davis, Melissa J.; Zhang, Fasheng; Hanson, Kelly; Teasdale, Melvena S.; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Teasdale, Rohan D.
2006-01-01
We present here LOCATE, a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of proteins from the FANTOM3 Isoform Protein Sequence set. Membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations of selected proteins from this set were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing >1700 peer-reviewed publications. LOCATE represents the first effort to catalogue the experimentally verified subcellular location and membrane organization of mammalian proteins using a high-throughput approach and provides localization data for ∼40% of the mouse proteome. It is available online. PMID:16381849
Yang, Laurence; Tan, Justin; O'Brien, Edward J; Monk, Jonathan M; Kim, Donghyuk; Li, Howard J; Charusanti, Pep; Ebrahim, Ali; Lloyd, Colton J; Yurkovich, James T; Du, Bin; Dräger, Andreas; Thomas, Alex; Sun, Yuekai; Saunders, Michael A; Palsson, Bernhard O
2015-08-25
Finding the minimal set of gene functions needed to sustain life is of both fundamental and practical importance. Minimal gene lists have been proposed by using comparative genomics-based core proteome definitions. A definition of a core proteome that is supported by empirical data, is understood at the systems-level, and provides a basis for computing essential cell functions is lacking. Here, we use a systems biology-based genome-scale model of metabolism and expression to define a functional core proteome consisting of 356 gene products, accounting for 44% of the Escherichia coli proteome by mass based on proteomics data. This systems biology core proteome includes 212 genes not found in previous comparative genomics-based core proteome definitions, accounts for 65% of known essential genes in E. coli, and has 78% gene function overlap with minimal genomes (Buchnera aphidicola and Mycoplasma genitalium). Based on transcriptomics data across environmental and genetic backgrounds, the systems biology core proteome is significantly enriched in nondifferentially expressed genes and depleted in differentially expressed genes. Compared with the noncore, core gene expression levels are also similar across genetic backgrounds (two times higher Spearman rank correlation) and exhibit significantly more complex transcriptional and posttranscriptional regulatory features (40% more transcription start sites per gene, 22% longer 5'UTR). Thus, genome-scale systems biology approaches rigorously identify a functional core proteome needed to support growth. This framework, validated by using high-throughput datasets, facilitates a mechanistic understanding of systems-level core proteome function through in silico models; it de facto defines a paleome.
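The enrichment statements above (for example, core genes being enriched in non-differentially expressed genes) rest on standard contingency-table tests; the sketch below shows such a test with made-up counts, not the study's data.

```python
# Sketch of the kind of enrichment test used to ask whether core-proteome genes are
# over-represented among non-differentially expressed genes (counts are made up).
from scipy.stats import fisher_exact

core_nonDE, core_DE = 300, 56          # hypothetical counts for core genes
noncore_nonDE, noncore_DE = 900, 2100  # hypothetical counts for non-core genes
odds, p = fisher_exact([[core_nonDE, core_DE], [noncore_nonDE, noncore_DE]],
                       alternative="greater")
print(f"odds ratio = {odds:.1f}, one-sided p = {p:.2e}")
```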
Awan, Muaaz Gul; Saeed, Fahad
2016-05-15
Modern proteomics studies utilize high-throughput mass spectrometers which can produce data at an astonishing rate. These big mass spectrometry (MS) datasets can easily reach peta-scale level, creating storage and analytic problems for large-scale systems biology studies. Each spectrum consists of thousands of peaks which have to be processed to deduce the peptide. However, only a small percentage of peaks in a spectrum are useful for peptide deduction, as most of the peaks are either noise or not useful for a given spectrum. This redundant processing of non-useful peaks is a bottleneck for streaming high-throughput processing of big MS data. One way to reduce the amount of computation required in a high-throughput environment is to eliminate non-useful peaks. Existing noise removal algorithms are limited in their data-reduction capability and are compute intensive, making them unsuitable for big data and high-throughput environments. In this paper we introduce a novel low-complexity technique based on classification, quantization and sampling of MS peaks. We present a novel data-reductive strategy for analysis of big MS data. Our algorithm, called MS-REDUCE, is capable of eliminating noisy peaks as well as peaks that do not contribute to peptide deduction before any peptide deduction is attempted. Our experiments have shown up to 100× speedup over existing state-of-the-art noise elimination algorithms while maintaining comparably high-quality matches. Using our approach we were able to process a million spectra in just under an hour on a moderate server. The developed tool and strategy have been made available to the wider proteomics and parallel computing community; the code can be found at https://github.com/pcdslab/MSREDUCE. Contact: fahad.saeed@wmich.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
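A toy reimplementation of the general classification/quantization/sampling idea is sketched below; it is not the published MS-REDUCE code and uses synthetic peaks, but it shows how intensity-based quantization plus per-class sampling shrinks a spectrum before any peptide deduction is attempted.

```python
# Hedged sketch of intensity quantization plus per-class sampling of MS peaks
# (illustrates the general idea only; not the published MS-REDUCE implementation).
import numpy as np

def reduce_spectrum(mz, intensity, n_classes=4, keep_fraction=0.2, rng=None):
    rng = rng or np.random.default_rng(0)
    order = np.argsort(intensity)
    classes = np.array_split(order, n_classes)          # intensity quantization
    kept = []
    for cls in classes:
        k = max(1, int(len(cls) * keep_fraction))       # sample within each class
        kept.extend(rng.choice(cls, size=k, replace=False))
    kept = np.sort(kept)
    return mz[kept], intensity[kept]

mz = np.linspace(200, 2000, 3000)                        # synthetic peak m/z values
inten = np.random.default_rng(1).exponential(1.0, size=3000)
print(reduce_spectrum(mz, inten)[0].shape)               # far fewer peaks than 3000
```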
FunRich proteomics software analysis, let the fun begin!
Benito-Martin, Alberto; Peinado, Héctor
2015-08-01
Protein MS analysis is the preferred method for unbiased protein identification. It is applied in a large number of both small-scale and high-throughput studies. However, user-friendly computational tools for protein analysis are still needed. In this issue, Mathivanan and colleagues (Proteomics 2015, 15, 2597-2601) report the development of FunRich, an open-access software tool that facilitates the analysis of proteomics data, providing tools for functional enrichment and interaction network analysis of genes and proteins. FunRich is a reinterpretation of proteomics software: a standalone tool combining ease of use with customizable databases, free access, and graphical representations. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
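Functional enrichment of the kind FunRich reports typically reduces to a hypergeometric test of the overlap between a hit list and an annotation term; the snippet below shows that calculation with invented counts (it does not reproduce FunRich's databases or implementation).

```python
# Sketch of the hypergeometric test underlying most functional-enrichment tools
# (counts are made up; FunRich's own databases are not reproduced here).
from scipy.stats import hypergeom

population, annotated = 20000, 150     # genome size, genes carrying the term
hits, sample = 12, 300                 # term members found in a 300-protein hit list
p = hypergeom.sf(hits - 1, population, annotated, sample)   # P(X >= hits)
print(f"enrichment p-value = {p:.2e}")
```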
High-Throughput Cloning and Expression Library Creation for Functional Proteomics
Festa, Fernanda; Steel, Jason; Bian, Xiaofang; Labaer, Joshua
2013-01-01
The study of protein function usually requires the use of a cloned version of the gene for protein expression and functional assays. This strategy is particularly important when the information available regarding function is limited. The functional characterization of the thousands of newly identified proteins revealed by genomics requires faster methods than traditional single-gene experiments, creating the need for fast, flexible and reliable cloning systems. These collections of open reading frame (ORF) clones can be coupled with high-throughput proteomics platforms, such as protein microarrays and cell-based assays, to answer biological questions. In this tutorial we provide the background for DNA cloning, discuss the major high-throughput cloning systems (Gateway® Technology, Flexi® Vector Systems, and Creator™ DNA Cloning System) and compare them side-by-side. We also report an example of a high-throughput cloning study and its application in functional proteomics. This Tutorial is part of the International Proteomics Tutorial Programme (IPTP12). Details can be found at http://www.proteomicstutorials.org. PMID:23457047
Cloud CPFP: a shotgun proteomics data analysis pipeline using cloud and high performance computing.
Trudgian, David C; Mirzaei, Hamid
2012-12-07
We have extended the functionality of the Central Proteomics Facilities Pipeline (CPFP) to allow use of remote cloud and high performance computing (HPC) resources for shotgun proteomics data processing. CPFP has been modified to include modular local and remote scheduling for data processing jobs. The pipeline can now be run on a single PC or server, a local cluster, a remote HPC cluster, and/or the Amazon Web Services (AWS) cloud. We provide public images that allow easy deployment of CPFP in its entirety in the AWS cloud. This significantly reduces the effort necessary to use the software, and allows proteomics laboratories to pay for compute time ad hoc, rather than obtaining and maintaining expensive local server clusters. Alternatively the Amazon cloud can be used to increase the throughput of a local installation of CPFP as necessary. We demonstrate that cloud CPFP allows users to process data at higher speed than local installations but with similar cost and lower staff requirements. In addition to the computational improvements, the web interface to CPFP is simplified, and other functionalities are enhanced. The software is under active development at two leading institutions and continues to be released under an open-source license at http://cpfp.sourceforge.net.
Identification of functional modules using network topology and high-throughput data.
Ulitsky, Igor; Shamir, Ron
2007-01-26
With the advent of systems biology, biological knowledge is often represented today by networks. These include regulatory and metabolic networks, protein-protein interaction networks, and many others. At the same time, high-throughput genomics and proteomics techniques generate very large data sets, which require sophisticated computational analysis. Usually, separate and different analysis methodologies are applied to each of the two data types. An integrated investigation of network and high-throughput information together can improve the quality of the analysis by accounting simultaneously for topological network properties alongside intrinsic features of the high-throughput data. We describe a novel algorithmic framework for this challenge. We first transform the high-throughput data into similarity values (e.g., by computing pairwise similarity of gene expression patterns from microarray data). Then, given a network of genes or proteins and similarity values between some of them, we seek connected sub-networks (or modules) that manifest high similarity. We develop algorithms for this problem and evaluate their performance on the osmotic shock response network in S. cerevisiae and on the human cell cycle network. We demonstrate that focused, biologically meaningful and relevant functional modules are obtained. In comparison with extant algorithms, our approach has higher sensitivity and higher specificity. We have demonstrated that our method can accurately identify functional modules. Hence, it carries the promise to be highly useful in the analysis of high-throughput data.
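A much-simplified stand-in for the described approach is sketched below: starting from a seed node, a connected subnetwork is grown greedily while the candidate nodes' similarity values stay above a threshold. The graph, similarity values and threshold are invented for illustration and do not reproduce the published algorithm.

```python
# Toy seed-and-extend module search: grow a connected subnetwork while the node
# similarity of candidates stays high (a simplified stand-in, not the published method).
import networkx as nx

def grow_module(graph, similarity, seed, threshold=0.6):
    module = {seed}
    frontier = set(graph[seed])
    while frontier:
        cand = max(frontier, key=lambda n: similarity.get(n, 0.0))
        if similarity.get(cand, 0.0) < threshold:
            break
        module.add(cand)
        frontier |= set(graph[cand]) - module
        frontier.discard(cand)
    return module

g = nx.Graph([("A", "B"), ("B", "C"), ("C", "D")])
sim = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.2}
print(grow_module(g, sim, "A"))   # {'A', 'B', 'C'}
```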
Computer aided manual validation of mass spectrometry-based proteomic data.
Curran, Timothy G; Bryson, Bryan D; Reigelhaupt, Michael; Johnson, Hannah; White, Forest M
2013-06-15
Advances in mass spectrometry-based proteomic technologies have increased the speed of analysis and the depth provided by a single analysis. Computational tools to evaluate the accuracy of peptide identifications from these high-throughput analyses have not kept pace with technological advances; currently the most common quality evaluation methods are based on statistical analysis of the likelihood of false positive identifications in large-scale data sets. While helpful, these calculations do not consider the accuracy of each identification, thus creating a precarious situation for biologists relying on the data to inform experimental design. Manual validation is the gold standard approach to confirm accuracy of database identifications, but is extremely time-intensive. To palliate the increasing time required to manually validate large proteomic datasets, we provide computer aided manual validation software (CAMV) to expedite the process. Relevant spectra are collected, catalogued, and pre-labeled, allowing users to efficiently judge the quality of each identification and summarize applicable quantitative information. CAMV significantly reduces the burden associated with manual validation and will hopefully encourage broader adoption of manual validation in mass spectrometry-based proteomics. Copyright © 2013 Elsevier Inc. All rights reserved.
Lee, Hangyeore; Mun, Dong-Gi; Bae, Jingi; Kim, Hokeun; Oh, Se Yeon; Park, Young Soo; Lee, Jae-Hyuk; Lee, Sang-Won
2015-08-21
We report a new and simple design of a fully automated dual-online ultra-high pressure liquid chromatography (sDO-UHPLC) system. The system employs only two nano-volume switching valves (a two-position four port valve and a two-position ten port valve) that direct solvent flows from two binary nano-pumps for parallel operation of two analytical columns and two solid phase extraction (SPE) columns. Despite the simple design, the sDO-UHPLC offers many advantageous features that include a high duty cycle, back-flushing sample injection for fast and narrow-zone sample injection, online desalting, high separation resolution and high intra/inter-column reproducibility. This system was applied to analyze proteome samples not only in high throughput deep proteome profiling experiments but also in high throughput MRM experiments.
Jimenez, Connie R; Piersma, Sander; Pham, Thang V
2007-12-01
Proteomics aims to create a link between genomic information, biological function and disease through global studies of protein expression, modification and protein-protein interactions. Recent advances in key proteomics tools, such as mass spectrometry (MS) and (bio)informatics, provide tremendous opportunities for biomarker-related clinical applications. In this review, we focus on two complementary MS-based approaches with high potential for the discovery of biomarker patterns and low-abundance candidate biomarkers in biofluids: high-throughput matrix-assisted laser desorption/ionization time-of-flight mass spectrometry-based methods for peptidome profiling and label-free liquid chromatography-based methods coupled to MS for in-depth profiling of biofluids with a focus on subproteomes, including the low-molecular-weight proteome, carrier-bound proteome and N-linked glycoproteome. The two approaches differ in their aims, throughput and sensitivity. We discuss recent progress and challenges in the analysis of plasma/serum and proximal fluids using these strategies and highlight the potential of liquid chromatography-MS-based proteomics of cancer cell and tumor secretomes for the discovery of candidate blood-based biomarkers. Strategies for candidate validation are also described.
An Efficient Semi-supervised Learning Approach to Predict SH2 Domain Mediated Interactions.
Kundu, Kousik; Backofen, Rolf
2017-01-01
The Src homology 2 (SH2) domain is an important subclass of modular protein domains that plays an indispensable role in several biological processes in eukaryotes. SH2 domains specifically bind to the phosphotyrosine residue of their binding peptides to facilitate various molecular functions. For determining the subtle binding specificities of SH2 domains, it is very important to understand the intriguing mechanisms by which these domains recognize their target peptides in a complex cellular environment. Several attempts have been made to predict SH2-peptide interactions using high-throughput data. However, these high-throughput data are often affected by a low signal-to-noise ratio. Furthermore, the prediction methods have several additional shortcomings, such as the linearity problem and high computational complexity. Thus, computational identification of SH2-peptide interactions using high-throughput data remains challenging. Here, we propose a machine learning approach based on an efficient semi-supervised learning technique for the prediction of 51 SH2 domain mediated interactions in the human proteome. In our study, we have successfully employed several strategies to tackle the major problems in computational identification of SH2-peptide interactions.
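The following is a hedged sketch of semi-supervised classification in the spirit described above, using scikit-learn's label spreading on randomly generated feature vectors; the actual feature encoding and learning technique of the cited study are not reproduced here.

```python
# Hedged sketch of semi-supervised classification on synthetic peptide features
# (the cited study's actual method and encoding are not reproduced here).
import numpy as np
from sklearn.semi_supervised import LabelSpreading

X = np.random.RandomState(0).rand(200, 20)     # stand-in for encoded peptide features
y = np.full(200, -1)                           # -1 marks unlabelled examples
y[:10], y[10:20] = 1, 0                        # a few known binders / non-binders

model = LabelSpreading(kernel="rbf", gamma=5.0)
model.fit(X, y)
print(model.transduction_[:5])                 # inferred labels for unlabelled data
```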
The Proteome Folding Project: Proteome-scale prediction of structure and function
Drew, Kevin; Winters, Patrick; Butterfoss, Glenn L.; Berstis, Viktors; Uplinger, Keith; Armstrong, Jonathan; Riffle, Michael; Schweighofer, Erik; Bovermann, Bill; Goodlett, David R.; Davis, Trisha N.; Shasha, Dennis; Malmström, Lars; Bonneau, Richard
2011-01-01
The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions. PMID:21824995
Röst, Hannes L; Liu, Yansheng; D'Agostino, Giuseppe; Zanella, Matteo; Navarro, Pedro; Rosenberger, George; Collins, Ben C; Gillet, Ludovic; Testa, Giuseppe; Malmström, Lars; Aebersold, Ruedi
2016-09-01
Next-generation mass spectrometric (MS) techniques such as SWATH-MS have substantially increased the throughput and reproducibility of proteomic analysis, but ensuring consistent quantification of thousands of peptide analytes across multiple liquid chromatography-tandem MS (LC-MS/MS) runs remains a challenging and laborious manual process. To produce highly consistent and quantitatively accurate proteomics data matrices in an automated fashion, we developed TRIC (http://proteomics.ethz.ch/tric/), a software tool that utilizes fragment-ion data to perform cross-run alignment, consistent peak-picking and quantification for high-throughput targeted proteomics. TRIC reduced the identification error compared to a state-of-the-art SWATH-MS analysis without alignment by more than threefold at constant recall while correcting for highly nonlinear chromatographic effects. On a pulsed-SILAC experiment performed on human induced pluripotent stem cells, TRIC was able to automatically align and quantify thousands of light and heavy isotopic peak groups. Thus, TRIC fills a gap in the pipeline for automated analysis of massively parallel targeted proteomics data sets.
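Cross-run alignment can be pictured, in a much-simplified form, as mapping retention times between runs from shared anchor peptides and then matching peak groups within a tolerance; TRIC itself uses fragment-ion information and corrects nonlinear chromatographic effects. The numbers below are invented.

```python
# Simplified illustration of cross-run retention-time alignment and peak matching
# (TRIC's fragment-ion-based, nonlinear alignment is not reproduced here).
import numpy as np

anchors_run1 = np.array([12.0, 35.5, 61.2, 88.9])     # RTs of shared peptides, run 1
anchors_run2 = np.array([13.1, 37.0, 63.5, 91.8])     # same peptides, run 2
slope, intercept = np.polyfit(anchors_run1, anchors_run2, 1)

def match(rt_run1, candidate_rts_run2, tol=0.5):
    predicted = slope * rt_run1 + intercept            # map run-1 RT into run-2 time
    hits = [rt for rt in candidate_rts_run2 if abs(rt - predicted) < tol]
    return min(hits, key=lambda rt: abs(rt - predicted)) if hits else None

print(match(50.0, [40.2, 52.4, 75.0]))   # picks the candidate nearest the mapped RT
```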
Machine learning in computational biology to accelerate high-throughput protein expression.
Sastry, Anand; Monk, Jonathan; Tegel, Hanna; Uhlen, Mathias; Palsson, Bernhard O; Rockberg, Johan; Brunk, Elizabeth
2017-08-15
The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of proteins across various tissues to pinpoint their spatial location in the human body. This has been achieved through transcriptomics and high-throughput immunohistochemistry-based approaches, where over 40 000 unique human protein fragments have been expressed in E. coli. These datasets enable quantitative tracking of entire cellular proteomes and present new avenues for understanding molecular-level properties influencing expression and solubility. Combining computational biology and machine learning identifies protein properties that hinder the HPA high-throughput antibody production pipeline. We predict protein expression and solubility with accuracies of 70% and 80%, respectively, based on a subset of key properties (aromaticity, hydropathy and isoelectric point). We guide the selection of protein fragments based on these characteristics to optimize high-throughput experimentation. We present the machine learning workflow as a series of IPython notebooks hosted on GitHub (https://github.com/SBRG/Protein_ML). The workflow can be used as a template for analysis of further expression and solubility datasets. Contact: ebrunk@ucsd.edu or johanr@biotech.kth.se. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
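The named sequence properties (aromaticity, hydropathy, isoelectric point) can be computed with Biopython and fed to any standard classifier, as in the hedged sketch below; the sequences and labels are made up and the model is not the one in the linked notebooks.

```python
# Hedged sketch: derive the named sequence features with Biopython and fit a simple
# classifier (toy sequences and labels; not the HPA data or the published model).
from Bio.SeqUtils.ProtParam import ProteinAnalysis
from sklearn.linear_model import LogisticRegression

def features(seq):
    pa = ProteinAnalysis(seq)
    return [pa.aromaticity(), pa.gravy(), pa.isoelectric_point()]

train_seqs = ["MKTAYIAKQR", "GGGGSGGGGS", "MLLAVLYCLL", "KRKRKRKRKR"]
labels = [1, 1, 0, 0]                      # hypothetical expressed / not expressed
model = LogisticRegression().fit([features(s) for s in train_seqs], labels)
print(model.predict([features("MSTNPKPQRK")]))
```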
Design and initial characterization of the SC-200 proteomics standard mixture.
Bauman, Andrew; Higdon, Roger; Rapson, Sean; Loiue, Brenton; Hogan, Jason; Stacy, Robin; Napuli, Alberto; Guo, Wenjin; van Voorhis, Wesley; Roach, Jared; Lu, Vincent; Landorf, Elizabeth; Stewart, Elizabeth; Kolker, Natali; Collart, Frank; Myler, Peter; van Belle, Gerald; Kolker, Eugene
2011-01-01
High-throughput (HTP) proteomics studies generate large amounts of data. Interpretation of these data requires effective approaches to distinguish noise from biological signal, particularly as instrument and computational capacity increase and studies become more complex. Resolving this issue requires validated and reproducible methods and models, which in turn requires complex experimental and computational standards. The absence of appropriate standards and data sets for validating experimental and computational workflows hinders the development of HTP proteomics methods. Most protein standards are simple mixtures of proteins or peptides, or undercharacterized reference standards in which the identity and concentration of the constituent proteins is unknown. The Seattle Children's 200 (SC-200) proposed proteomics standard mixture is the next step toward developing realistic, fully characterized HTP proteomics standards. The SC-200 exhibits a unique modular design to extend its functionality, and consists of 200 proteins of known identities and molar concentrations from 6 microbial genomes, distributed into 10 molar concentration tiers spanning a 1,000-fold range. We describe the SC-200's design, potential uses, and initial characterization. We identified 84% of SC-200 proteins with an LTQ-Orbitrap and 65% with an LTQ-Velos (false discovery rate = 1% for both). There were obvious trends in success rate, sequence coverage, and spectral counts with protein concentration; however, protein identification, sequence coverage, and spectral counts vary greatly within concentration levels.
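The tiered design (200 proteins in 10 molar-concentration tiers spanning a 1,000-fold range) can be mocked up in a few lines; the assignment below is purely illustrative and does not reproduce the actual SC-200 composition.

```python
# Toy construction of a tiered standard: 200 protein names spread over 10 relative
# molar levels spanning 1,000-fold (illustrative only, not the real SC-200 design).
import numpy as np

tiers = np.logspace(0, 3, 10)                 # relative molar levels, 1x ... 1000x
proteins = [f"protein_{i:03d}" for i in range(200)]
assignment = {p: tiers[i % 10] for i, p in enumerate(proteins)}
print(assignment["protein_000"], round(assignment["protein_009"], 1))
```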
Computational biology for ageing
Wieser, Daniela; Papatheodorou, Irene; Ziehm, Matthias; Thornton, Janet M.
2011-01-01
High-throughput genomic and proteomic technologies have generated a wealth of publicly available data on ageing. Easy access to these data, and their computational analysis, is of great importance in order to pinpoint the causes and effects of ageing. Here, we provide a description of the existing databases and computational tools on ageing that are available for researchers. We also describe the computational approaches to data interpretation in the field of ageing including gene expression, comparative and pathway analyses, and highlight the challenges for future developments. We review recent biological insights gained from applying bioinformatics methods to analyse and interpret ageing data in different organisms, tissues and conditions. PMID:21115530
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework.
Lewis, Steven; Csordas, Attila; Killcoyne, Sarah; Hermjakob, Henning; Hoopmann, Michael R; Moritz, Robert L; Deutsch, Eric W; Boyle, John
2012-12-05
For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
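A pure-Python caricature of the map/reduce decomposition used by such a search engine is shown below: the map phase pairs spectra with candidate peptides by precursor-mass bin, and the reduce phase keeps the best-scoring match per spectrum. It does not implement the K-score and does not run on Hadoop; names, bins and the scoring function are placeholders.

```python
# Map/reduce caricature of distributed spectrum-to-peptide matching
# (placeholder scoring; not the K-score and not a Hadoop job).
from collections import defaultdict

def map_phase(spectra, peptide_index, tol=0.5):
    for spec_id, precursor_mass, peaks in spectra:
        key = round(precursor_mass / tol)                 # precursor-mass bin
        for peptide in peptide_index.get(key, []):
            yield spec_id, (peptide, peaks)

def reduce_phase(mapped, score):
    best = defaultdict(lambda: (None, float("-inf")))
    for spec_id, (peptide, peaks) in mapped:
        s = score(peptide, peaks)
        if s > best[spec_id][1]:
            best[spec_id] = (peptide, s)                  # keep best match per spectrum
    return dict(best)

index = {round(1234.6 / 0.5): ["PEPTIDEK"]}
spectra = [("scan_1", 1234.6, [147.1, 276.2])]
print(reduce_phase(map_phase(spectra, index), lambda p, x: len(p)))
```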
Quantitative proteomics in cardiovascular research: global and targeted strategies
Shen, Xiaomeng; Young, Rebeccah; Canty, John M.; Qu, Jun
2014-01-01
Extensive technical advances in the past decade have substantially expanded quantitative proteomics in cardiovascular research. This has great promise for elucidating the mechanisms of cardiovascular diseases (CVD) and the discovery of cardiac biomarkers used for diagnosis and treatment evaluation. Global and targeted proteomics are the two major avenues of quantitative proteomics. While global approaches enable unbiased discovery of altered proteins via relative quantification at the proteome level, targeted techniques provide higher sensitivity and accuracy, and are capable of multiplexed absolute quantification in numerous clinical/biological samples. While promising, technical challenges need to be overcome to enable full utilization of these techniques in cardiovascular medicine. Here we discuss recent advances in quantitative proteomics and summarize applications in cardiovascular research with an emphasis on biomarker discovery and elucidating molecular mechanisms of disease. We propose the integration of global and targeted strategies as a high-throughput pipeline for cardiovascular proteomics. Targeted approaches enable rapid, extensive validation of biomarker candidates discovered by global proteomics. These approaches provide a promising alternative to immunoassays and other low-throughput means currently used for limited validation. PMID:24920501
Bladergroen, Marco R.; van der Burgt, Yuri E. M.
2015-01-01
For large-scale and standardized applications in mass spectrometry (MS)-based proteomics, automation of each step is essential. Here we present high-throughput sample preparation solutions for balancing the speed of current MS acquisitions and the time needed for analytical workup of body fluids. The discussed workflows reduce body fluid sample complexity and apply to both bottom-up proteomics experiments and top-down protein characterization approaches. Various sample preparation methods that involve solid-phase extraction (SPE), including affinity enrichment strategies, have been automated. Obtained peptide and protein fractions can be mass analyzed by direct infusion into an electrospray ionization (ESI) source or by means of matrix-assisted laser desorption ionization (MALDI) without further need of time-consuming liquid chromatography (LC) separations. PMID:25692071
LXtoo: an integrated live Linux distribution for the bioinformatics community.
Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu
2012-07-19
Recent advances in high-throughput technologies have dramatically increased biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most of the existing live Linux distributions for bioinformatics, which limit their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. LXtoo aims to provide a well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.
Tipton, Jeremiah D; Tran, John C; Catherman, Adam D; Ahlf, Dorothy R; Durbin, Kenneth R; Lee, Ji Eun; Kellie, John F; Kelleher, Neil L; Hendrickson, Christopher L; Marshall, Alan G
2012-03-06
Current high-throughput top-down proteomic platforms provide routine identification of proteins less than 25 kDa with 4-D separations. This short communication reports the application of technological developments over the past few years that improve protein identification and characterization for masses greater than 25 kDa. Advances in separation science have allowed increased numbers of proteins to be identified, especially by nanoliquid chromatography (nLC) prior to mass spectrometry (MS) analysis. Further, a goal of high-throughput top-down proteomics is to extend the mass range for routine nLC MS analysis up to 80 kDa because gene sequence analysis predicts that ~70% of the human proteome is transcribed to be less than 80 kDa. Normally, large proteins greater than 50 kDa are identified and characterized by top-down proteomics through fraction collection and direct infusion at relatively low throughput. Further, other MS-based techniques provide top-down protein characterization, however at low resolution for intact mass measurement. Here, we present analysis of standard (up to 78 kDa) and whole cell lysate proteins by Fourier transform ion cyclotron resonance mass spectrometry (nLC electrospray ionization (ESI) FTICR MS). The separation platform reduced the complexity of the protein matrix so that, at 14.5 T, proteins from whole cell lysate up to 72 kDa are baseline mass resolved on a nano-LC chromatographic time scale. Further, the results document routine identification of proteins at improved throughput based on accurate mass measurement (less than 10 ppm mass error) of precursor and fragment ions for proteins up to 50 kDa.
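The accurate-mass criterion quoted above (less than 10 ppm mass error) is the usual relative-error calculation, shown here as a one-line helper with a made-up measurement:

```python
# The ppm mass-error criterion mentioned above, as a one-line calculation.
def ppm_error(observed_mass, theoretical_mass):
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6

print(ppm_error(50000.35, 50000.00))   # 7.0 ppm, within a 10 ppm tolerance
```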
Lapek, John D; Greninger, Patricia; Morris, Robert; Amzallag, Arnaud; Pruteanu-Malinici, Iulian; Benes, Cyril H; Haas, Wilhelm
2017-10-01
The formation of protein complexes and the co-regulation of the cellular concentrations of proteins are essential mechanisms for cellular signaling and for maintaining homeostasis. Here we use isobaric-labeling multiplexed proteomics to analyze protein co-regulation and show that this allows the identification of protein-protein associations with high accuracy. We apply this 'interactome mapping by high-throughput quantitative proteome analysis' (IMAHP) method to a panel of 41 breast cancer cell lines and show that deviations of the observed protein co-regulations in specific cell lines from the consensus network affects cellular fitness. Furthermore, these aberrant interactions serve as biomarkers that predict the drug sensitivity of cell lines in screens across 195 drugs. We expect that IMAHP can be broadly used to gain insight into how changing landscapes of protein-protein associations affect the phenotype of biological systems.
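The co-regulation idea can be illustrated by correlating protein abundance profiles across cell lines and keeping strongly correlated pairs as candidate associations; the sketch below uses synthetic data and a simple Spearman threshold rather than the consensus-network procedure of IMAHP.

```python
# Simplified illustration: infer candidate protein-protein associations from
# co-regulation (pairwise correlation of abundances across cell lines; synthetic data).
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
abundance = {                               # protein -> abundance across 41 lines
    "A": rng.normal(size=41),
    "C": rng.normal(size=41),
}
abundance["B"] = abundance["A"] + rng.normal(scale=0.2, size=41)   # co-regulated pair

for p1, p2 in combinations(abundance, 2):
    rho, _ = spearmanr(abundance[p1], abundance[p2])
    if abs(rho) > 0.7:
        print(f"{p1} - {p2}: rho = {rho:.2f}")   # reports the A-B association
```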
Perez-Riverol, Yasset; Wang, Rui; Hermjakob, Henning; Müller, Markus; Vesada, Vladimir; Vizcaíno, Juan Antonio
2014-01-01
Data processing, management and visualization are central and critical components of a state-of-the-art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from various proteomics search engine output files, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries and open-source frameworks, and we also give information on some of the freely available applications that make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. PMID:23467006
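A typical "handling of protein and peptide sequences" task these libraries provide is computing a peptide mass; the plain-Python sketch below uses approximate monoisotopic residue masses (rounded values; a dedicated library should be preferred in practice).

```python
# Minimal example of a common library task: computing a monoisotopic peptide mass
# from approximate (rounded) residue masses, in plain Python.
RESIDUE_MASS = {  # monoisotopic residue masses in Da (rounded)
    "G": 57.02146, "A": 71.03711, "P": 97.05276, "T": 101.04768,
    "D": 115.02694, "E": 129.04259, "I": 113.08406, "K": 128.09496,
}
WATER = 18.01056

def peptide_mass(sequence):
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

print(f"{peptide_mass('PEPTIDE'):.4f} Da")
```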
The amino acid's backup bone - storage solutions for proteomics facilities.
Meckel, Hagen; Stephan, Christian; Bunse, Christian; Krafzik, Michael; Reher, Christopher; Kohl, Michael; Meyer, Helmut Erich; Eisenacher, Martin
2014-01-01
Proteomics methods, especially high-throughput mass spectrometry analysis, have been continually developed and improved over the years. The analysis of complex biological samples produces large volumes of raw data. Data storage and recovery management pose substantial challenges to biomedical or proteomic facilities regarding backup and archiving concepts as well as hardware requirements. In this article we describe differences between the terms backup and archive with regard to manual and automatic approaches. We also introduce different storage concepts and technologies, from transportable media to professional solutions such as redundant array of independent disks (RAID) systems, network attached storage (NAS) and storage area networks (SAN). Moreover, we present a software solution, which we developed for the purpose of long-term preservation of large mass spectrometry raw data files on an object storage device (OSD) archiving system. Finally, advantages, disadvantages, and experiences from routine operations of the presented concepts and technologies are evaluated and discussed. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. Copyright © 2013. Published by Elsevier B.V.
Tempest: GPU-CPU computing for high-throughput database spectral matching.
Milloy, Jeffrey A; Faherty, Brendan K; Gerber, Scott A
2012-07-06
Modern mass spectrometers are now capable of producing hundreds of thousands of tandem (MS/MS) spectra per experiment, making the translation of these fragmentation spectra into peptide matches a common bottleneck in proteomics research. When coupled with experimental designs that enrich for post-translational modifications such as phosphorylation and/or include isotopically labeled amino acids for quantification, additional burdens are placed on this computational infrastructure by shotgun sequencing. To address this issue, we have developed a new database searching program that utilizes the massively parallel compute capabilities of a graphical processing unit (GPU) to produce peptide spectral matches in a very high throughput fashion. Our program, named Tempest, combines efficient database digestion and MS/MS spectral indexing on a CPU with fast similarity scoring on a GPU. In our implementation, the entire similarity score, including the generation of full theoretical peptide candidate fragmentation spectra and its comparison to experimental spectra, is conducted on the GPU. Although Tempest uses the classical SEQUEST XCorr score as a primary metric for evaluating similarity for spectra collected at unit resolution, we have developed a new "Accelerated Score" for MS/MS spectra collected at high resolution that is based on a computationally inexpensive dot product but exhibits scoring accuracy similar to that of the classical XCorr. In our experience, Tempest provides compute-cluster level performance in an affordable desktop computer.
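The inexpensive dot-product scoring idea behind an "accelerated score" can be sketched by binning and normalizing spectra before comparison; this is neither Tempest's GPU code nor the SEQUEST XCorr, and the bin width and peaks below are illustrative.

```python
# Sketch of a dot-product similarity between binned, normalized spectra
# (illustrates the general idea only; not Tempest's GPU kernel or the XCorr).
import numpy as np

def bin_spectrum(mz, intensity, bin_width=1.0005, max_mz=2000.0):
    bins = np.zeros(int(max_mz / bin_width) + 1)
    idx = (np.asarray(mz) / bin_width).astype(int)
    np.maximum.at(bins, idx, intensity)          # keep the max peak per bin
    norm = np.linalg.norm(bins)
    return bins / norm if norm else bins

exp = bin_spectrum([147.11, 276.16, 375.22], [1.0, 0.6, 0.8])    # "experimental"
theo = bin_spectrum([147.11, 375.20, 504.26], [1.0, 1.0, 1.0])   # "theoretical"
print(float(exp @ theo))                         # higher = more similar
```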
The UniProtKB guide to the human proteome
Breuza, Lionel; Poux, Sylvain; Estreicher, Anne; Famiglietti, Maria Livia; Magrane, Michele; Tognolli, Michael; Bridge, Alan; Baratin, Delphine; Redaschi, Nicole
2016-01-01
Advances in high-throughput technologies allow researchers to routinely perform whole genome and proteome analyses. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We also illustrate how the complexity of the human proteome is captured and structured in UniProtKB. Database URL: www.uniprot.org PMID:26896845
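Programmatic access to the human reference set is possible through UniProt's REST interface; the sketch below is an assumption-laden example (the endpoint, query syntax and field names reflect the public API as understood at the time of writing and should be checked against the current documentation at www.uniprot.org).

```python
# Hedged sketch of querying reviewed human entries via UniProt's REST API
# (endpoint and parameter names are assumptions; verify against the UniProt docs).
import requests

params = {
    "query": "organism_id:9606 AND reviewed:true",   # human, Swiss-Prot entries
    "fields": "accession,gene_names,length",
    "format": "tsv",
    "size": 5,
}
r = requests.get("https://rest.uniprot.org/uniprotkb/search", params=params, timeout=30)
print(r.text)
```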
Proteomic Analysis of Metabolic Responses to Biofuels and Chemicals in Photosynthetic Cyanobacteria.
Sun, T; Chen, L; Zhang, W
2017-01-01
Recent progress in various "omics" technologies has enabled quantitative measurements of biological molecules in a high-throughput manner. Among them, high-throughput proteomics is a rapidly advancing field that offers a new means to quantify metabolic changes at the protein level, which has significantly facilitated our understanding of cellular processes such as protein synthesis, posttranslational modification, and degradation in response to environmental perturbations. Cyanobacteria are autotrophic prokaryotes that can perform oxygenic photosynthesis and have recently attracted significant attention as a promising alternative to traditional biomass-based "microbial cell factories" for producing green fuels and chemicals. However, early studies have shown that low tolerance to toxic biofuels and chemicals represents one major hurdle to further improving the productivity of cyanobacterial production systems. To address this issue, the metabolic responses of cyanobacterial cells to toxic end-products, and their regulation, need to be defined. In this chapter, we discuss recent progress in interpreting cyanobacterial responses to biofuels and chemicals using high-throughput proteomics approaches, aiming to provide insights and guidelines on how to enhance the tolerance and productivity of biofuels or chemicals in renewable cyanobacteria systems in the future. © 2017 Elsevier Inc. All rights reserved.
Clair, Geremy; Piehowski, Paul D.; Nicola, Teodora
Global proteomics approaches allow characterization of whole tissue lysates to an impressive depth. However, it is now increasingly recognized that to better understand the complexity of multicellular organisms, global protein profiling of specific spatially defined regions/substructures of tissues (i.e. spatially-resolved proteomics) is essential. Laser capture microdissection (LCM) enables microscopic isolation of defined regions of tissues, preserving crucial spatial information. However, current proteomics workflows entail several manual sample preparation steps and are challenged by the microscopic, mass-limited samples generated by LCM, which impacts measurement robustness, quantification, and throughput. Here, we coupled LCM with a fully automated sample preparation workflow that, with a single manual step, allows protein extraction, tryptic digestion, peptide cleanup and LC-MS/MS analysis of proteomes from microdissected tissues. Benchmarking against the current state of the art in ultrasensitive global proteomic analysis, our approach demonstrated significant improvements in quantification and throughput. Using our LCM-SNaPP proteomics approach, we characterized, to a depth of more than 3,400 proteins, the ontogeny of protein changes during normal lung development in laser capture microdissected alveolar tissue containing ~4,000 cells per sample. Importantly, the data revealed quantitative changes for 350 low-abundance transcription factors and signaling molecules, confirming earlier transcript-level observations and defining seven modules of coordinated transcription factor/signaling molecule expression patterns. This suggests that a complex network of temporal regulatory control directs normal lung development, with epigenetic regulation fine-tuning prenatal developmental processes. Our LCM-proteomics approach facilitates efficient, spatially-resolved, ultrasensitive global proteomics analyses in high throughput that will enable a range of clinical and biological applications.
Advances in Proteomics Data Analysis and Display Using an Accurate Mass and Time Tag Approach
Zimmer, Jennifer S.D.; Monroe, Matthew E.; Qian, Wei-Jun; Smith, Richard D.
2007-01-01
Proteomics has recently demonstrated utility in understanding cellular processes on the molecular level as a component of systems biology approaches and for identifying potential biomarkers of various disease states. The large amount of data generated by utilizing high efficiency (e.g., chromatographic) separations coupled to high mass accuracy mass spectrometry for high-throughput proteomics analyses presents challenges related to data processing, analysis, and display. This review focuses on recent advances in nanoLC-FTICR-MS-based proteomics approaches and the accompanying data processing tools that have been developed to display and interpret the large volumes of data being produced. PMID:16429408
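At its core, the accurate mass and time (AMT) tag approach matches observed LC-MS features to database tags within mass (ppm) and normalized elution time (NET) tolerances; the sketch below shows that lookup with invented tag values.

```python
# Toy AMT tag lookup: match an observed feature against database tags by mass (ppm)
# and normalized elution time (NET) tolerances (tag values are invented).
def match_amt(feature, tags, ppm_tol=5.0, net_tol=0.02):
    mass, net = feature
    hits = []
    for name, tag_mass, tag_net in tags:
        ppm = abs(mass - tag_mass) / tag_mass * 1e6
        if ppm <= ppm_tol and abs(net - tag_net) <= net_tol:
            hits.append(name)
    return hits

tags = [("pep_A", 1000.5000, 0.30), ("pep_B", 1500.7500, 0.55)]
print(match_amt((1000.5030, 0.31), tags))   # ['pep_A'] (3 ppm, 0.01 NET away)
```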
Content Is King: Databases Preserve the Collective Information of Science.
Yates, John R
2018-04-01
Databases store sequence information experimentally gathered to create resources that further science. In the last 20 years databases have become critical components of fields like proteomics where they provide the basis for large-scale and high-throughput proteomic informatics. Amos Bairoch, winner of the Association of Biomolecular Resource Facilities Frederick Sanger Award, has created some of the important databases proteomic research depends upon for accurate interpretation of data.
De Groot, Anne S; Rappuoli, Rino
2004-02-01
Vaccine research entered a new era when the complete genome of a pathogenic bacterium was published in 1995. Since then, more than 97 bacterial pathogens have been sequenced and at least 110 additional projects are now in progress. Genome sequencing has also dramatically accelerated: high-throughput facilities can draft the sequence of an entire microbe (two to four megabases) in 1 to 2 days. Vaccine developers are using microarrays, immunoinformatics, proteomics and high-throughput immunology assays to reduce the truly unmanageable volume of information available in genome databases to a manageable size. Vaccines composed of novel antigens discovered from genome mining are already in clinical trials. Within 5 years we can expect to see a novel class of vaccines composed of genome-predicted, assembled and engineered T- and B-cell epitopes. This article addresses the convergence of three forces--microbial genome sequencing, computational immunology and new vaccine technologies--that are shifting genome mining for vaccines onto the forefront of immunology research.
Kakourou, Alexia; Vach, Werner; Nicolardi, Simone; van der Burgt, Yuri; Mertens, Bart
2016-10-01
Mass spectrometry based clinical proteomics has emerged as a powerful tool for high-throughput protein profiling and biomarker discovery. Recent improvements in mass spectrometry technology have boosted the potential of proteomic studies in biomedical research. However, the complexity of the proteomic expression introduces new statistical challenges in summarizing and analyzing the acquired data. Statistical methods for optimally processing proteomic data are currently a growing field of research. In this paper we present simple, yet appropriate methods to preprocess, summarize and analyze high-throughput MALDI-FTICR mass spectrometry data, collected in a case-control fashion, while dealing with the statistical challenges that accompany such data. The known statistical properties of the isotopic distribution of the peptide molecules are used to preprocess the spectra and translate the proteomic expression into a condensed data set. Information on either the intensity level or the shape of the identified isotopic clusters is used to derive summary measures on which diagnostic rules for disease status allocation will be based. Results indicate that both the shape of the identified isotopic clusters and the overall intensity level carry information on the class outcome and can be used to predict the presence or absence of the disease.
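The summary measures mentioned above can be made concrete with a small sketch. The following is a minimal illustration, not the authors' pipeline: it assumes a hypothetical array of per-isotope peak intensities for one identified isotopic cluster and derives an overall intensity level plus two crude shape descriptors of the kind that could feed a diagnostic rule.

    # Minimal sketch (not the authors' implementation): summarizing an isotopic
    # cluster by its overall intensity and simple shape descriptors, assuming
    # `cluster` is an array of isotope peak intensities for one peptide.
    import numpy as np

    def summarize_cluster(cluster):
        cluster = np.asarray(cluster, dtype=float)
        total = cluster.sum()                                # overall intensity level
        shape = cluster / total if total > 0 else cluster    # normalized isotope pattern
        mono_fraction = shape[0]                             # weight of the monoisotopic peak
        centroid = float(np.dot(np.arange(len(shape)), shape))  # pattern "centre of mass"
        return {"intensity": total, "mono_fraction": mono_fraction, "centroid": centroid}

    # Example: a four-isotope cluster (hypothetical intensities)
    print(summarize_cluster([1200.0, 900.0, 400.0, 120.0]))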
CrossCheck: an open-source web tool for high-throughput screen data analysis.
Najafov, Jamil; Najafov, Ayaz
2017-07-19
Modern high-throughput screening methods allow researchers to generate large datasets that potentially contain important biological information. However, oftentimes, picking relevant hits from such screens and generating testable hypotheses requires training in bioinformatics and the skills to efficiently perform database mining. There are currently no tools available to the general public that allow users to cross-reference their screen datasets with published screen datasets. To this end, we developed CrossCheck, an online platform for high-throughput screen data analysis. CrossCheck is a centralized database that allows effortless comparison of the user-entered list of gene symbols with 16,231 published datasets. These datasets include published data from genome-wide RNAi and CRISPR screens, interactome proteomics and phosphoproteomics screens, cancer mutation databases, low-throughput studies of major cell signaling mediators, such as kinases, E3 ubiquitin ligases and phosphatases, and gene ontological information. Moreover, CrossCheck includes a novel database of predicted protein kinase substrates, which was developed using proteome-wide consensus motif searches. CrossCheck dramatically simplifies high-throughput screen data analysis and enables researchers to dig deep into the published literature and streamline data-driven hypothesis generation. CrossCheck is freely accessible as a web-based application at http://proteinguru.com/crosscheck.
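CrossCheck's internals are not described beyond cross-referencing user gene lists with published datasets, so the sketch below only illustrates that core operation under an assumption of simple set comparison: intersect a hypothetical user list with one published hit list and score the overlap with a hypergeometric test. The function, gene symbols and background size are illustrative only.

    # Illustrative sketch (not CrossCheck's code): cross-reference a user gene list
    # against one published screen hit list and score the overlap for enrichment.
    from scipy.stats import hypergeom

    def overlap_enrichment(user_genes, dataset_genes, background_size):
        user, dataset = set(user_genes), set(dataset_genes)
        overlap = user & dataset
        # P(X >= |overlap|) when drawing len(user) genes without replacement
        # from a background containing len(dataset) "hits"
        pval = hypergeom.sf(len(overlap) - 1, background_size, len(dataset), len(user))
        return sorted(overlap), pval

    hits, p = overlap_enrichment(["TP53", "ATM", "RIPK1"], ["RIPK1", "TP53", "CASP8"], 20000)
    print(hits, p)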
Mahendran, Shalini M; Oikonomopoulou, Katerina; Diamandis, Eleftherios P; Chandran, Vinod
Synovial fluid (SF) is a protein-rich fluid produced into the joint cavity by cells of the synovial membrane. Due to its direct contact with articular cartilage, surfaces of the bone, and the synoviocytes of the inner membrane, it provides a promising reflection of the biochemical state of the joint under varying physiological and pathophysiological conditions. This property of SF has been exploited within numerous studies in search of unique biomarkers of joint pathologies with the ultimate goal of developing minimally invasive clinical assays to detect and/or monitor disease states. Several proteomic methodologies have been employed to mine the SF proteome. From elementary immunoassays to high-throughput analyses using mass spectrometry-based techniques, each has demonstrated distinct advantages and disadvantages in the identification and quantification of SF proteins. This review will explore the role of SF in the elucidation of the arthritis proteome and the extent to which high-throughput techniques have facilitated the discovery and validation of protein biomarkers from osteoarthritis (OA), rheumatoid arthritis (RA), psoriatic arthritis (PsA), and juvenile idiopathic arthritis (JIA) patients.
Learning from Heterogeneous Data Sources: An Application in Spatial Proteomics
Breckels, Lisa M.; Holden, Sean B.; Wojnar, David; Mulvey, Claire M.; Christoforou, Andy; Groen, Arnoud; Trotter, Matthew W. B.; Kohlbacher, Oliver; Lilley, Kathryn S.; Gatto, Laurent
2016-01-01
Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet, there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system, to integrate heterogeneous data sources to considerably improve on the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms through evaluation of five experimental datasets, from four different species in conjunction with four different auxiliary data sources to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis. PMID:27175778
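The transfer learning idea of weighting a primary MS data source against an auxiliary source can be sketched as a weighted nearest-neighbour vote. The pRoloc implementation is an R/Bioconductor package; the Python snippet below is only a conceptual illustration, and the weight theta, the feature matrices and k are hypothetical.

    # Conceptual sketch of a weighted nearest-neighbour transfer learning classifier:
    # distances computed on primary MS profiles and on an auxiliary data source are
    # combined with a weight theta in [0, 1]. Illustration only, not pRoloc code.
    import numpy as np

    def combined_knn_predict(x_ms, x_aux, train_ms, train_aux, train_labels, theta=0.7, k=5):
        d_ms = np.linalg.norm(train_ms - x_ms, axis=1)
        d_aux = np.linalg.norm(train_aux - x_aux, axis=1)
        d = theta * d_ms + (1.0 - theta) * d_aux      # weighted combination of sources
        nearest = np.argsort(d)[:k]
        labels, counts = np.unique(np.asarray(train_labels)[nearest], return_counts=True)
        return labels[np.argmax(counts)]

    # Tiny hypothetical example: two ER-like profiles and one mitochondrial profile
    train_ms = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]])
    train_aux = np.array([[1, 0], [1, 0], [0, 1]])
    labels = ["ER", "ER", "Mitochondrion"]
    print(combined_knn_predict(np.array([0.15, 0.85]), np.array([1, 0]),
                               train_ms, train_aux, labels, theta=0.7, k=2))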
Keshishian, Hasmik; Burgess, Michael W; Specht, Harrison; Wallace, Luke; Clauser, Karl R; Gillette, Michael A; Carr, Steven A
2017-08-01
Proteomic characterization of blood plasma is of central importance to clinical proteomics and particularly to biomarker discovery studies. The vast dynamic range and high complexity of the plasma proteome have, however, proven to be serious challenges and have often led to unacceptable tradeoffs between depth of coverage and sample throughput. We present an optimized sample-processing pipeline for analysis of the human plasma proteome that provides greatly increased depth of detection, improved quantitative precision and much higher sample analysis throughput as compared with prior methods. The process includes abundant protein depletion, isobaric labeling at the peptide level for multiplexed relative quantification and ultra-high-performance liquid chromatography coupled to accurate-mass, high-resolution tandem mass spectrometry analysis of peptides fractionated off-line by basic pH reversed-phase (bRP) chromatography. The overall reproducibility of the process, including immunoaffinity depletion, is high, with a process replicate coefficient of variation (CV) of <12%. Using isobaric tags for relative and absolute quantitation (iTRAQ) 4-plex, >4,500 proteins are detected and quantified per patient sample on average, with two or more peptides per protein and starting from as little as 200 μl of plasma. The approach can be multiplexed up to 10-plex using tandem mass tags (TMT) reagents, further increasing throughput, albeit with some decrease in the number of proteins quantified. In addition, we provide a rapid protocol for analysis of nonfractionated depleted plasma samples analyzed in 10-plex. This provides ∼600 quantified proteins for each of the ten samples in ∼5 h of instrument time.
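As a small illustration of the reported process reproducibility metric, the sketch below computes per-protein coefficients of variation across process replicates; the data layout and values are hypothetical and this is not the authors' analysis code.

    # Minimal sketch: per-protein coefficient of variation (CV, in percent) across
    # process replicates, assuming `abundance` is proteins x replicates.
    import numpy as np

    def replicate_cv(abundance):
        abundance = np.asarray(abundance, dtype=float)
        return 100.0 * abundance.std(axis=1, ddof=1) / abundance.mean(axis=1)

    abundance = np.array([[1.00, 1.08, 0.95],    # hypothetical relative abundances
                          [0.50, 0.46, 0.53]])
    print(replicate_cv(abundance))               # one CV per protein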
Computational Lipidomics and Lipid Bioinformatics: Filling In the Blanks.
Pauling, Josch; Klipp, Edda
2016-12-22
Lipids are highly diverse metabolites of pronounced importance in health and disease. While metabolomics is a broad field under the omics umbrella that may also relate to lipids, lipidomics is an emerging field which specializes in the identification, quantification and functional interpretation of complex lipidomes. Today, it is possible to identify and distinguish lipids in a high-resolution, high-throughput manner and simultaneously with considerable structural detail. However, doing so may produce thousands of mass spectra in a single experiment, which has created a high demand for specialized computational support to analyze these spectral libraries. The computational biology and bioinformatics community has so far established methodology in genomics, transcriptomics and proteomics, but there are many (combinatorial) challenges when it comes to the structural diversity of lipids and their identification, quantification and interpretation. This review gives an overview and outlook on lipidomics research and illustrates ongoing computational and bioinformatics efforts. These efforts are important and necessary steps to advance the lipidomics field alongside the analytical, biochemical, biomedical and biology communities and to close the gap in available computational methodology between lipidomics and other omics sub-branches.
Computational biology in the cloud: methods and new insights from computing at scale.
Kasson, Peter M
2013-01-01
The past few years have seen both explosions in the size of biological data sets and the proliferation of new, highly flexible on-demand computing capabilities. The sheer amount of information available from genomic and metagenomic sequencing, high-throughput proteomics, and experimental and simulation datasets on molecular structure and dynamics affords an opportunity for greatly expanded insight, but it creates new challenges of scale for the computation, storage, and interpretation of petascale data. Cloud computing resources have the potential to help solve these problems by offering a utility model of computing and storage: near-unlimited capacity, the ability to burst usage, and cheap and flexible payment models. Effective use of cloud computing on large biological datasets requires dealing with non-trivial problems of scale and robustness, since performance-limiting factors can change substantially when a dataset grows by a factor of 10,000 or more. New computing paradigms are thus often needed. The use of cloud platforms also creates new opportunities to share data, reduce duplication, and provide easy reproducibility by making the datasets and computational methods easily available.
Mani, D R; Abbatiello, Susan E; Carr, Steven A
2012-01-01
Multiple reaction monitoring mass spectrometry (MRM-MS) with stable isotope dilution (SID) is increasingly becoming a widely accepted assay for the quantification of proteins and peptides. These assays have shown great promise in relatively high throughput verification of candidate biomarkers. While the use of MRM-MS assays is well established in the small molecule realm, their introduction and use in proteomics is relatively recent. As such, statistical and computational methods for the analysis of MRM-MS data from proteins and peptides are still being developed. Based on our extensive experience with analyzing a wide range of SID-MRM-MS data, we set forth a methodology for analysis that encompasses significant aspects ranging from data quality assessment, assay characterization including calibration curves, limits of detection (LOD) and quantification (LOQ), and measurement of intra- and interlaboratory precision. We draw upon publicly available seminal datasets to illustrate our methods and algorithms.
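A response (calibration) curve with LOD and LOQ estimates, as mentioned above, can be sketched as follows. The 3.3*sigma/slope and 10*sigma/slope rules used here are one common convention rather than necessarily the exact estimators applied in the paper, and the concentration and peak-area-ratio values are hypothetical.

    # Sketch of a calibration curve fit with LOD/LOQ estimates (hypothetical data;
    # the 3.3*sigma/slope and 10*sigma/slope rules are one common convention).
    import numpy as np

    conc = np.array([0.1, 0.5, 1.0, 5.0, 10.0, 50.0])              # spiked analyte (fmol/uL)
    area_ratio = np.array([0.011, 0.052, 0.098, 0.51, 1.02, 4.9])  # light/heavy peak area

    slope, intercept = np.polyfit(conc, area_ratio, 1)             # linear response curve
    residual_sd = np.std(area_ratio - (slope * conc + intercept), ddof=2)

    lod = 3.3 * residual_sd / slope
    loq = 10.0 * residual_sd / slope
    print(f"slope={slope:.4f}  LOD={lod:.3f}  LOQ={loq:.3f}")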
Wu, Qi; Yuan, Huiming; Zhang, Lihua; Zhang, Yukui
2012-06-20
With the acceleration of proteome research, increasing attention has been paid to multidimensional liquid chromatography-mass spectrometry (MDLC-MS) due to its high peak capacity and separation efficiency. Recently, much effort has been put into improving MDLC-based strategies, including "top-down" and "bottom-up" approaches, to enable highly sensitive qualitative and quantitative analysis of proteins, as well as to accelerate the whole analytical procedure. Integrated platforms combining sample pretreatment, multidimensional separations and identification have also been developed to achieve high-throughput and sensitive detection of proteomes, facilitating highly accurate and reproducible quantification. This review summarizes the recent advances of such techniques and their applications in the qualitative and quantitative analysis of proteomes. Copyright © 2012 Elsevier B.V. All rights reserved.
A comprehensive and scalable database search system for metaproteomics.
Chatterjee, Sandip; Stupp, Gregory S; Park, Sung Kyu Robin; Ducom, Jean-Christophe; Yates, John R; Su, Andrew I; Wolan, Dennis W
2016-08-16
Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations. Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples. The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. The protein database, proteomic search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.
Helsens, Kenny; Colaert, Niklaas; Barsnes, Harald; Muth, Thilo; Flikka, Kristian; Staes, An; Timmerman, Evy; Wortelkamp, Steffi; Sickmann, Albert; Vandekerckhove, Joël; Gevaert, Kris; Martens, Lennart
2010-03-01
MS-based proteomics produces large amounts of mass spectra that require processing, identification and possibly quantification before interpretation can be undertaken. High-throughput studies require automation of these various steps, and management of the data in association with the results obtained. We here present ms_lims (http://genesis.UGent.be/ms_lims), a freely available, open-source system based on a central database to automate data management and processing in MS-driven proteomics analyses.
Even-Desrumeaux, Klervi; Baty, Daniel; Chames, Patrick
2010-01-01
Antibody microarrays are among the novel class of rapidly emerging proteomic technologies that will allow us to efficiently perform specific diagnosis and proteome analysis. Recombinant antibody fragments are especially suited to this approach, but their stability is often a limiting factor. Camelids produce functional antibodies devoid of light chains (HCAbs) of which the single N-terminal domain is fully capable of antigen binding. When produced as an independent domain, these so-called single domain antibody fragments (sdAbs) have several advantages for biotechnological applications thanks to their unique properties of size (15 kDa), stability, solubility, and expression yield. These features should allow sdAbs to outperform other antibody formats in a number of applications, notably as capture molecules for antibody arrays. In this study, we produced antibody microarrays using direct and oriented immobilization of sdAbs produced in crude bacterial lysates to generate a proof of principle for a high-throughput-compatible array design. Several sdAb immobilization strategies have been explored. Immobilization of in vivo biotinylated sdAbs by direct spotting of bacterial lysate on streptavidin, with sandwich detection, was developed to achieve high sensitivity and specificity, whereas immobilization of "multi-tagged" sdAbs via anti-tag antibodies, with a directly labeled sample detection strategy, was optimized for the design of high-density antibody arrays for high-throughput proteomics and the identification of potential biomarkers. PMID:20859568
Proteome data to explore the impact of pBClin15 on Bacillus cereus ATCC 14579.
Madeira, Jean-Paul; Alpha-Bazin, Béatrice; Armengaud, Jean; Omer, Hélène; Duport, Catherine
2016-09-01
This data article reports changes in the cellular proteome and exoproteome of B. cereus cured of pBClin15. Time-course changes of proteins were assessed by high-throughput nanoLC-MS/MS. We report all the peptides and proteins identified and quantified in B. cereus with and without pBClin15. Proteins were classified into functional groups using the information available in the KEGG classification, and we report their abundance in terms of the normalized spectral abundance factor (NSAF). The repertoire of experimentally confirmed proteins of B. cereus presented here is the largest ever reported, and provides new insights into the interplay between pBClin15 and its host B. cereus ATCC 14579. The data reported here are related to a published shotgun proteomics analysis regarding the role of pBClin15, "Deciphering the interactions between the Bacillus cereus linear plasmid, pBClin15, and its host by high-throughput comparative proteomics" Madeira et al. [1]. All the associated mass spectrometry data have been deposited in the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository (http://www.ebi.ac.uk/pride/), with the dataset identifiers PRIDE: PXD001568, PRIDE: PXD002788 and PRIDE: PXD002789.
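The normalized spectral abundance factor used to report protein abundance has a standard definition: NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), where SpC_i is the spectral count and L_i the length of protein i. A minimal sketch with hypothetical counts and lengths:

    # Normalized spectral abundance factor (NSAF), as commonly defined:
    # NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j). Example values are hypothetical.
    def nsaf(spectral_counts, lengths):
        saf = [c / l for c, l in zip(spectral_counts, lengths)]
        total = sum(saf)
        return [s / total for s in saf]

    print(nsaf(spectral_counts=[120, 45, 8], lengths=[350, 420, 95]))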
Proteomics and Systems Biology: Current and Future Applications in the Nutritional Sciences1
Moore, J. Bernadette; Weeks, Mark E.
2011-01-01
In the last decade, advances in genomics, proteomics, and metabolomics have yielded large-scale datasets that have driven an interest in global analyses, with the objective of understanding biological systems as a whole. Systems biology integrates computational modeling and experimental biology to predict and characterize the dynamic properties of biological systems, which are viewed as complex signaling networks. Whereas the systems analysis of disease-perturbed networks holds promise for identification of drug targets for therapy, equally the identified critical network nodes may be targeted through nutritional intervention in either a preventative or therapeutic fashion. As such, in the context of the nutritional sciences, it is envisioned that systems analysis of normal and nutrient-perturbed signaling networks in combination with knowledge of underlying genetic polymorphisms will lead to a future in which the health of individuals will be improved through predictive and preventative nutrition. Although high-throughput transcriptomic microarray data were initially most readily available and amenable to systems analysis, recent technological and methodological advances in MS have contributed to a linear increase in proteomic investigations. It is now commonplace for combined proteomic technologies to generate complex, multi-faceted datasets, and these will be the keystone of future systems biology research. This review will define systems biology, outline current proteomic methodologies, highlight successful applications of proteomics in nutrition research, and discuss the challenges for future applications of systems biology approaches in the nutritional sciences. PMID:22332076
Testing and Validation of Computational Methods for Mass Spectrometry.
Gatto, Laurent; Hansen, Kasper D; Hoopmann, Michael R; Hermjakob, Henning; Kohlbacher, Oliver; Beyer, Andreas
2016-03-04
High-throughput methods based on mass spectrometry (proteomics, metabolomics, lipidomics, etc.) produce a wealth of data that cannot be analyzed without computational methods. The impact of the choice of method on the overall result of a biological study is often underappreciated, but different methods can result in very different biological findings. It is thus essential to evaluate and compare the correctness and relative performance of computational methods. The volume of the data as well as the complexity of the algorithms render unbiased comparisons challenging. This paper discusses some problems and challenges in testing and validation of computational methods. We discuss the different types of data (simulated and experimental validation data) as well as different metrics to compare methods. We also introduce a new public repository for mass spectrometric reference data sets ( http://compms.org/RefData ) that contains a collection of publicly available data sets for performance evaluation for a wide range of different methods.
High-throughput Crystallography for Structural Genomics
Joachimiak, Andrzej
2009-01-01
Protein X-ray crystallography recently celebrated its 50th anniversary. The structures of myoglobin and hemoglobin determined by Kendrew and Perutz provided the first glimpses into the complex protein architecture and chemistry. Since then, the field of structural molecular biology has experienced extraordinary progress and now over 53,000 protein structures have been deposited into the Protein Data Bank. In the past decade many advances in macromolecular crystallography have been driven by world-wide structural genomics efforts. This was made possible because of third-generation synchrotron sources, structure phasing approaches using anomalous signal and cryo-crystallography. Complementary progress in molecular biology, proteomics, hardware and software for crystallographic data collection, structure determination and refinement, computer science, databases, robotics and automation improved and accelerated many processes. These advancements provide the robust foundation for structural molecular biology and assure strong contribution to science in the future. In this report we focus mainly on reviewing structural genomics high-throughput X-ray crystallography technologies and their impact. PMID:19765976
Shukla, Hem D
2017-10-25
During the past century, our understanding of cancer diagnosis and treatment has been based on a monogenic approach, and as a consequence our knowledge of the clinical genetic underpinnings of cancer is incomplete. The completion of the human genome in 2003 has steered us into therapeutic target discovery, enabling us to mine the genome using cutting-edge proteogenomic tools. A number of novel and promising cancer targets have emerged from the genome project for diagnostics, therapeutics, and prognostic markers, which are being used to monitor response to cancer treatment. The heterogeneous nature of cancer has hindered progress in understanding the underlying mechanisms that lead to abnormal cellular growth. Since the start of The Cancer Genome Atlas (TCGA) and the International Genome Consortium projects, there has been tremendous progress in genome sequencing, and immense numbers of cancer genomes have been completed; this approach has transformed our understanding of the diagnosis and treatment of different types of cancers. By employing genomics and proteomics technologies, an immense amount of genomic data is being generated on clinical tumors, which has transformed the cancer landscape and has the potential to transform cancer diagnosis and prognosis. A complete molecular view of the cancer landscape is necessary for understanding the underlying mechanisms of cancer initiation to improve diagnosis and prognosis, which ultimately will lead to personalized treatment. Interestingly, cancer proteome analysis has also allowed us to identify biomarkers to monitor drug and radiation resistance in patients undergoing cancer treatment. Further, TCGA-funded studies have allowed for the genomic and transcriptomic characterization of targeted cancers, and this analysis is aiding the development of targeted therapies for highly lethal malignancies. High-throughput technologies, providing complete proteome, epigenome, protein-protein interaction, and pharmacogenomics data, are indispensable for probing the cancer genome and proteome, and these approaches have generated multidimensional omics data on genes and proteins with the potential to facilitate precision medicine. However, due to slow progress in computational technologies, the translation of big omics data into clinical practice has been slow. In this review, attempts have been made to describe the role of high-throughput genomic and proteomic technologies in identifying a panel of biomarkers that could be used for the early diagnosis and prognosis of cancer.
A Method for Label-Free, Differential Top-Down Proteomics.
Ntai, Ioanna; Toby, Timothy K; LeDuc, Richard D; Kelleher, Neil L
2016-01-01
Biomarker discovery in the translational research has heavily relied on labeled and label-free quantitative bottom-up proteomics. Here, we describe a new approach to biomarker studies that utilizes high-throughput top-down proteomics and is the first to offer whole protein characterization and relative quantitation within the same experiment. Using yeast as a model, we report procedures for a label-free approach to quantify the relative abundance of intact proteins ranging from 0 to 30 kDa in two different states. In this chapter, we describe the integrated methodology for the large-scale profiling and quantitation of the intact proteome by liquid chromatography-mass spectrometry (LC-MS) without the need for metabolic or chemical labeling. This recent advance for quantitative top-down proteomics is best implemented with a robust and highly controlled sample preparation workflow before data acquisition on a high-resolution mass spectrometer, and the application of a hierarchical linear statistical model to account for the multiple levels of variance contained in quantitative proteomic comparisons of samples for basic and clinical research.
2011-01-01
Background Since its inception, proteomics has essentially operated in a discovery mode with the goal of identifying and quantifying the maximal number of proteins in a sample. Increasingly, proteomic measurements are also supporting hypothesis-driven studies, in which a predetermined set of proteins is consistently detected and quantified in multiple samples. Selected reaction monitoring (SRM) is a targeted mass spectrometric technique that supports the detection and quantification of specific proteins in complex samples at high sensitivity and reproducibility. Here, we describe ATAQS, an integrated software platform that supports all stages of targeted, SRM-based proteomics experiments including target selection, transition optimization and post acquisition data analysis. This software will significantly facilitate the use of targeted proteomic techniques and contribute to the generation of highly sensitive, reproducible and complete datasets that are particularly critical for the discovery and validation of targets in hypothesis-driven studies in systems biology. Result We introduce a new open source software pipeline, ATAQS (Automated and Targeted Analysis with Quantitative SRM), which consists of a number of modules that collectively support the SRM assay development workflow for targeted proteomic experiments (project management and generation of protein, peptide and transitions and the validation of peptide detection by SRM). ATAQS provides a flexible pipeline for end-users by allowing the workflow to start or end at any point of the pipeline, and for computational biologists, by enabling the easy extension of java algorithm classes for their own algorithm plug-in or connection via an external web site. This integrated system supports all steps in a SRM-based experiment and provides a user-friendly GUI that can be run by any operating system that allows the installation of the Mozilla Firefox web browser. Conclusions Targeted proteomics via SRM is a powerful new technique that enables the reproducible and accurate identification and quantification of sets of proteins of interest. ATAQS is the first open-source software that supports all steps of the targeted proteomics workflow. ATAQS also provides software API (Application Program Interface) documentation that enables the addition of new algorithms to each of the workflow steps. The software, installation guide and sample dataset can be found in http://tools.proteomecenter.org/ATAQS/ATAQS.html PMID:21414234
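Transition generation for a target peptide ultimately reduces to computing fragment-ion m/z values. The sketch below computes singly charged y-ion m/z values from monoisotopic residue masses; it illustrates the kind of calculation behind a transition list and is not the ATAQS implementation, and the example peptide is arbitrary.

    # Minimal sketch of transition generation: singly charged y-ion m/z values for a
    # target peptide from monoisotopic residue masses (only residues used below).
    WATER, PROTON = 18.010565, 1.007276
    RESIDUE = {"P": 97.05276, "E": 129.04259, "T": 101.04768,
               "I": 113.08406, "D": 115.02694, "K": 128.09496}

    def y_ions(peptide):
        ions = []
        for i in range(1, len(peptide)):            # y1 .. y(n-1)
            frag = peptide[-i:]
            mz = sum(RESIDUE[a] for a in frag) + WATER + PROTON
            ions.append((f"y{i}", round(mz, 4)))
        return ions

    print(y_ions("PEPTIDEK"))   # y1 for C-terminal K is ~147.1128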
A high-throughput, multi-channel photon-counting detector with picosecond timing
Lapington, J. S.; Fraser, G. W.; Miller, G. M.; Ashton, T. J. R.; Jarron, P.; Despeisse, M.; Powolny, F.; Howorth, J.; Milnes, J.
2009-06-01
High-throughput photon counting with high time resolution is a niche application area where vacuum tubes can still outperform solid-state devices. Applications in the life sciences utilizing time-resolved spectroscopies, particularly in the growing field of proteomics, will benefit greatly from performance enhancements in event timing and detector throughput. The HiContent project is a collaboration between the University of Leicester Space Research Centre, the Microelectronics Group at CERN, Photek Ltd., and end-users at the Gray Cancer Institute and the University of Manchester. The goal is to develop a detector system specifically designed for optical proteomics, capable of high content (multi-parametric) analysis at high throughput. The HiContent detector system is being developed to exploit this niche market. It combines multi-channel, high time resolution photon counting in a single miniaturized detector system with integrated electronics. The combination of enabling technologies; small pore microchannel plate devices with very high time resolution, and high-speed multi-channel ASIC electronics developed for the LHC at CERN, provides the necessary building blocks for a high-throughput detector system with up to 1024 parallel counting channels and 20 ps time resolution. We describe the detector and electronic design, discuss the current status of the HiContent project and present the results from a 64-channel prototype system. In the absence of an operational detector, we present measurements of the electronics performance using a pulse generator to simulate detector events. Event timing results from the NINO high-speed front-end ASIC captured using a fast digital oscilloscope are compared with data taken with the proposed electronic configuration which uses the multi-channel HPTDC timing ASIC.
Development of Droplet Microfluidics Enabling High-Throughput Single-Cell Analysis.
Wen, Na; Zhao, Zhan; Fan, Beiyuan; Chen, Deyong; Men, Dong; Wang, Junbo; Chen, Jian
2016-07-05
This article reviews recent developments in droplet microfluidics enabling high-throughput single-cell analysis. Five key aspects in this field are included in this review: (1) prototype demonstration of single-cell encapsulation in microfluidic droplets; (2) technical improvements of single-cell encapsulation in microfluidic droplets; (3) microfluidic droplets enabling single-cell proteomic analysis; (4) microfluidic droplets enabling single-cell genomic analysis; and (5) integrated microfluidic droplet systems enabling single-cell screening. We examine the advantages and limitations of each technique and discuss future research opportunities by focusing on key performances of throughput, multifunctionality, and absolute quantification.
Trends in mass spectrometry instrumentation for proteomics
Smith, Richard D.
2002-12-01
Mass spectrometry has become a primary tool for proteomics due to its capabilities for rapid and sensitive protein identification and quantitation. It is now possible to identify thousands of proteins from microgram sample quantities in a single day and to quantify relative protein abundances. However, the needs for increased capabilities for proteome measurements are immense and are now driving both new strategies and instrument advances. These developments include those based on integration with multi-dimensional liquid separations and high accuracy mass measurements, and promise more than order of magnitude improvements in sensitivity, dynamic range, and throughput for proteomic analyses in the near future.
Mass spectrometry-based proteomics for translational research: a technical overview.
Paulo, Joao A; Kadiyala, Vivek; Banks, Peter A; Steen, Hanno; Conwell, Darwin L
2012-03-01
Mass spectrometry-based investigation of clinical samples enables the high-throughput identification of protein biomarkers. We provide an overview of mass spectrometry-based proteomic techniques that are applicable to the investigation of clinical samples. We address sample collection, protein extraction and fractionation, mass spectrometry modalities, and quantitative proteomics. Finally, we examine the limitations and further potential of such technologies. Liquid chromatography fractionation coupled with tandem mass spectrometry is well suited to handle mixtures of hundreds or thousands of proteins. Mass spectrometry-based proteome elucidation can reveal potential biomarkers and aid in the development of hypotheses for downstream investigation of the molecular mechanisms of disease.
Pan, Sheng; Rush, John; Peskind, Elaine R; Galasko, Douglas; Chung, Kathryn; Quinn, Joseph; Jankovic, Joseph; Leverenz, James B; Zabetian, Cyrus; Pan, Catherine; Wang, Yan; Oh, Jung Hun; Gao, Jean; Zhang, Jianpeng; Montine, Thomas; Zhang, Jing
2008-02-01
Targeted quantitative proteomics by mass spectrometry aims to selectively detect one or a panel of peptides/proteins in a complex sample and is particularly appealing for novel biomarker verification/validation because it does not require specific antibodies. Here, we demonstrated the application of targeted quantitative proteomics in searching, identifying, and quantifying selected peptides in human cerebrospinal fluid (CSF) using a matrix-assisted laser desorption/ionization time-of-flight tandem mass spectrometer (MALDI TOF/TOF)-based platform. The approach involved two major components: the use of isotopic-labeled synthetic peptides as references for targeted identification and quantification and a highly selective mass spectrometric analysis based on the unique characteristics of the MALDI instrument. The platform provides high confidence for targeted peptide detection in a complex system and can potentially be developed into a high-throughput system. Using the liquid chromatography (LC) MALDI TOF/TOF platform and the complementary identification strategy, we were able to selectively identify and quantify a panel of targeted peptides in the whole proteome of CSF without prior depletion of abundant proteins. The effectiveness and robustness of the approach associated with different sample complexity, sample preparation strategies, as well as mass spectrometric quantification were evaluated. Other issues related to chromatography separation and the feasibility for high-throughput analysis were also discussed. Finally, we applied targeted quantitative proteomics to analyze a subset of previously identified candidate markers in CSF samples of patients with Parkinson's disease (PD) at different stages and Alzheimer's disease (AD) along with normal controls.
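Quantification against an isotope-labeled synthetic reference peptide reduces to a simple ratio calculation, sketched below with hypothetical peak areas and spike amounts; the paper's actual workflow involves additional MALDI-specific processing.

    # Sketch of stable-isotope-dilution quantification: the endogenous ("light")
    # peptide amount is estimated from the light/heavy peak-area ratio and the
    # known amount of spiked isotope-labeled ("heavy") reference peptide.
    def light_amount(light_area, heavy_area, heavy_spiked_fmol):
        return (light_area / heavy_area) * heavy_spiked_fmol

    # e.g. light peak area 3.2e6, heavy area 8.0e6, 50 fmol heavy peptide spiked in
    print(light_amount(3.2e6, 8.0e6, 50.0))   # ~20 fmol endogenous peptide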
TRIC: an automated alignment strategy for reproducible protein quantification in targeted proteomics
Röst, Hannes L.; Liu, Yansheng; D’Agostino, Giuseppe; Zanella, Matteo; Navarro, Pedro; Rosenberger, George; Collins, Ben C.; Gillet, Ludovic; Testa, Giuseppe; Malmström, Lars; Aebersold, Ruedi
2016-01-01
Large scale, quantitative proteomic studies have become essential for the analysis of clinical cohorts, large perturbation experiments and systems biology studies. While next-generation mass spectrometric techniques such as SWATH-MS have substantially increased throughput and reproducibility, ensuring consistent quantification of thousands of peptide analytes across multiple LC-MS/MS runs remains a challenging and laborious manual process. To produce highly consistent and quantitatively accurate proteomics data matrices in an automated fashion, we have developed the TRIC software which utilizes fragment ion data to perform cross-run alignment, consistent peak-picking and quantification for high throughput targeted proteomics. TRIC uses a graph-based alignment strategy based on non-linear retention time correction to integrate peak elution information from all LC-MS/MS runs acquired in a study. When compared to state-of-the-art SWATH-MS data analysis, the algorithm was able to reduce the identification error by more than 3-fold at constant recall, while correcting for highly non-linear chromatographic effects. On a pulsed-SILAC experiment performed on human induced pluripotent stem (iPS) cells, TRIC was able to automatically align and quantify thousands of light and heavy isotopic peak groups and substantially increased the quantitative completeness and biological information in the data, providing insights into protein dynamics of iPS cells. Overall, this study demonstrates the importance of consistent quantification in highly challenging experimental setups, and proposes an algorithm to automate this task, constituting the last missing piece in a pipeline for automated analysis of massively parallel targeted proteomics datasets. PMID:27479329
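TRIC's non-linear retention-time correction can be illustrated, in much-simplified form, by mapping retention times of a target run onto a reference run via anchor peptides identified in both runs. The snippet below uses monotone piecewise-linear interpolation as a stand-in for TRIC's graph-based approach; the anchor values are hypothetical.

    # Much-simplified stand-in for non-linear retention-time correction: map RTs of
    # a target run onto a reference run using anchor peptides seen in both runs.
    import numpy as np

    def rt_correction(anchor_rt_target, anchor_rt_reference):
        order = np.argsort(anchor_rt_target)
        xs = np.asarray(anchor_rt_target, dtype=float)[order]
        ys = np.asarray(anchor_rt_reference, dtype=float)[order]
        return lambda rt: np.interp(rt, xs, ys)      # piecewise-linear mapping

    to_ref = rt_correction([10.2, 25.7, 41.3, 60.8], [11.0, 26.1, 43.0, 62.5])
    print(to_ref(33.0))   # RT 33.0 min in the target run, on the reference scale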
Integrated network analysis and effective tools in plant systems biology
Fukushima, Atsushi; Kanaya, Shigehiko; Nishida, Kozo
2014-01-01
One of the ultimate goals in plant systems biology is to elucidate the genotype-phenotype relationship in plant cellular systems. Integrated network analysis that combines omics data with mathematical models has received particular attention. Here we focus on the latest cutting-edge computational advances that facilitate their combination. We highlight (1) network visualization tools, (2) pathway analyses, (3) genome-scale metabolic reconstruction, and (4) the integration of high-throughput experimental data and mathematical models. Multi-omics data that contain the genome, transcriptome, proteome, and metabolome and mathematical models are expected to integrate and expand our knowledge of complex plant metabolisms. PMID:25408696
Chen, Xiang; Velliste, Meel; Murphy, Robert F.
2010-01-01
Proteomics, the large scale identification and characterization of many or all proteins expressed in a given cell type, has become a major area of biological research. In addition to information on protein sequence, structure and expression levels, knowledge of a protein’s subcellular location is essential to a complete understanding of its functions. Currently subcellular location patterns are routinely determined by visual inspection of fluorescence microscope images. We review here research aimed at creating systems for automated, systematic determination of location. These employ numerical feature extraction from images, feature reduction to identify the most useful features, and various supervised learning (classification) and unsupervised learning (clustering) methods. These methods have been shown to perform significantly better than human interpretation of the same images. When coupled with technologies for tagging large numbers of proteins and high-throughput microscope systems, the computational methods reviewed here enable the new subfield of location proteomics. This subfield will make critical contributions in two related areas. First, it will provide structured, high-resolution information on location to enable Systems Biology efforts to simulate cell behavior from the gene level on up. Second, it will provide tools for Cytomics projects aimed at characterizing the behaviors of all cell types before, during and after the onset of various diseases. PMID:16752421
Turetschek, Reinhard; Lyon, David; Desalegn, Getinet; Kaul, Hans-Peter; Wienkoop, Stefanie
2016-01-01
The proteomic study of non-model organisms, such as many crop plants, is challenging due to the lack of comprehensive genome information. Changing environmental conditions require the study and selection of adapted cultivars. Mutations inherent to cultivars hamper protein identification and thus considerably complicate qualitative and quantitative comparisons in large-scale systems biology approaches. With this workflow, cultivar-specific mutations are detected from high-throughput comparative MS analyses by extracting sequence polymorphisms with de novo sequencing. Stringent criteria are suggested to filter for confidently identified mutations. Subsequently, these polymorphisms complement the initially used database, which is ready to use with any preferred database search algorithm. In our example, we thereby identified 26 specific mutations in two cultivars of Pisum sativum and achieved a 17% increase in the number of peptide spectrum matches.
Bordbar, Aarash; Jamshidi, Neema; Palsson, Bernhard O
2011-07-12
The development of high-throughput technologies capable of whole cell measurements of genes, proteins, and metabolites has led to the emergence of systems biology. Integrated analysis of the resulting omic data sets has proved to be hard to achieve. Metabolic network reconstructions enable complex relationships amongst molecular components to be represented formally in a biologically relevant manner while respecting physical constraints. In silico models derived from such reconstructions can then be queried or interrogated through mathematical simulations. Proteomic profiling studies of the mature human erythrocyte have shown more proteins present related to metabolic function than previously thought; however the significance and the causal consequences of these findings have not been explored. Erythrocyte proteomic data was used to reconstruct the most expansive description of erythrocyte metabolism to date, following extensive manual curation, assessment of the literature, and functional testing. The reconstruction contains 281 enzymes representing functions from glycolysis to cofactor and amino acid metabolism. Such a comprehensive view of erythrocyte metabolism implicates the erythrocyte as a potential biomarker for different diseases as well as a 'cell-based' drug-screening tool. The analysis shows that 94 erythrocyte enzymes are implicated in morbid single nucleotide polymorphisms, representing 142 pathologies. In addition, over 230 FDA-approved and experimental pharmaceuticals have enzymatic targets in the erythrocyte. The advancement of proteomic technologies and increased generation of high-throughput proteomic data have created the need for a means to analyze these data in a coherent manner. Network reconstructions provide a systematic means to integrate and analyze proteomic data in a biologically meaningful manner. Analysis of the red cell proteome has revealed an unexpected level of complexity in the functional capabilities of human erythrocyte metabolism.
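Querying a metabolic reconstruction through simulation typically means solving a constraint-based (flux balance) optimization. The toy example below, which is not the erythrocyte model, maximizes the output flux of a three-reaction network subject to steady-state mass balance and an uptake bound.

    # Toy flux-balance analysis on a three-reaction network (uptake -> A -> B -> out),
    # not the erythrocyte reconstruction: maximize the output flux v3 subject to
    # steady state S*v = 0 and an uptake bound on v1.
    import numpy as np
    from scipy.optimize import linprog

    S = np.array([[1, -1,  0],    # metabolite A: produced by v1, consumed by v2
                  [0,  1, -1]])   # metabolite B: produced by v2, consumed by v3
    c = [0, 0, -1]                # maximize v3 (linprog minimizes, hence the sign)
    bounds = [(0, 10), (0, None), (0, None)]   # uptake v1 limited to 10 units

    res = linprog(c, A_eq=S, b_eq=[0, 0], bounds=bounds)
    print(res.x)                  # optimal flux distribution, here [10, 10, 10]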
Foster, Joseph M; Moreno, Pablo; Fabregat, Antonio; Hermjakob, Henning; Steinbeck, Christoph; Apweiler, Rolf; Wakelam, Michael J O; Vizcaíno, Juan Antonio
2013-01-01
Protein sequence databases are the pillar upon which modern proteomics is supported, representing a stable reference space of predicted and validated proteins. One example of such resources is UniProt, enriched with both expertly curated and automatic annotations. Taken largely for granted, similar mature resources such as UniProt are not available yet in some other "omics" fields, lipidomics being one of them. While having a seasoned community of wet lab scientists, lipidomics lies significantly behind proteomics in the adoption of data standards and other core bioinformatics concepts. This work aims to reduce the gap by developing an equivalent resource to UniProt called 'LipidHome', providing theoretically generated lipid molecules and useful metadata. Using the 'FASTLipid' Java library, a database was populated with theoretical lipids, generated from a set of community agreed upon chemical bounds. In parallel, a web application was developed to present the information and provide computational access via a web service. Designed specifically to accommodate high throughput mass spectrometry based approaches, lipids are organised into a hierarchy that reflects the variety in the structural resolution of lipid identifications. Additionally, cross-references to other lipid related resources and papers that cite specific lipids were used to annotate lipid records. The web application encompasses a browser for viewing lipid records and a 'tools' section where an MS1 search engine is currently implemented. LipidHome can be accessed at http://www.ebi.ac.uk/apweiler-srv/lipidhome.
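The notion of theoretically generated lipids from community-agreed chemical bounds can be illustrated by enumerating species over chain-length and double-bond ranges. The class name and bounds below are hypothetical and do not reflect LipidHome's actual generation rules.

    # Conceptual sketch of enumerating theoretical lipid species from chemical
    # bounds (class, chain-length and double-bond ranges are hypothetical).
    from itertools import product

    def enumerate_species(lipid_class="PC", carbons=range(28, 41, 2), double_bonds=range(0, 7)):
        return [f"{lipid_class}({c}:{db})" for c, db in product(carbons, double_bonds)]

    species = enumerate_species()
    print(len(species), species[:5])   # 49 species, starting with PC(28:0), PC(28:1), ...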
Identification of widespread adenosine nucleotide binding in Mycobacterium tuberculosis
Ansong, Charles; Ortega, Corrie; Payne, Samuel H.
The annotation of protein function is almost completely performed by in silico approaches. However, computational prediction of protein function is frequently incomplete and error prone. In Mycobacterium tuberculosis (Mtb), ~25% of all genes have no predicted function and are annotated as hypothetical proteins. This lack of functional information severely limits our understanding of Mtb pathogenicity. Current tools for experimental functional annotation are limited and often do not scale to entire protein families. Here, we report a generally applicable chemical biology platform to functionally annotate bacterial proteins by combining activity-based protein profiling (ABPP) and quantitative LC-MS-based proteomics. As an example of this approach for high-throughput protein functional validation and discovery, we experimentally annotate the families of ATP-binding proteins in Mtb. Our data experimentally validate prior in silico predictions of >250 ATPases and adenosine nucleotide-binding proteins, and reveal 73 hypothetical proteins as novel ATP-binding proteins. We identify adenosine cofactor interactions with many hypothetical proteins containing a diversity of unrelated sequences, providing a new and expanded view of adenosine nucleotide binding in Mtb. Furthermore, many of these hypothetical proteins are both unique to Mycobacteria and essential for infection, suggesting specialized functions in mycobacterial physiology and pathogenicity. Thus, we provide a generally applicable approach for high throughput protein function discovery and validation, and highlight several ways in which application of activity-based proteomics data can improve the quality of functional annotations to facilitate novel biological insights.
Farhoud, Murtada H; Wessels, Hans J C T; Wevers, Ron A; van Engelen, Baziel G; van den Heuvel, Lambert P; Smeitink, Jan A
2005-01-01
In 2D-based comparative proteomics of scarce samples, such as limited patient material, established methods for prefractionation and subsequent use of different narrow range IPG strips to increase overall resolution are difficult to apply. Also, a high number of samples, a prerequisite for drawing meaningful conclusions when pathological and control samples are considered, will increase the associated amount of work almost exponentially. Here, we introduce a novel, effective, and economic method designed to obtain maximum 2D resolution while maintaining the high throughput necessary to perform large-scale comparative proteomics studies. The method is based on connecting different IPG strips serially head-to-tail so that a complete line of different IPG strips with sequential pH regions can be focused in the same experiment. We show that when 3 IPG strips (covering together the pH range of 3-11) are connected head-to-tail an optimal resolution is achieved along the whole pH range. Sample consumption, time required, and associated costs are reduced by almost 70%, and the workload is reduced significantly.
Respiratory Toxicity Biomarkers
The advancement of high-throughput genomic, proteomic and metabolomic techniques has accelerated the pace of lung biomarker discovery. Recent growth in the discovery of new lung toxicity/disease biomarkers has led to significant advances in our understanding of pathological proce...
Benjamin, Ashlee M; Thompson, J Will; Soderblom, Erik J; Geromanos, Scott J; Henao, Ricardo; Kraus, Virginia B; Moseley, M Arthur; Lucas, Joseph E
2013-12-16
The goal of many proteomics experiments is to determine the abundance of proteins in biological samples, and the variation thereof in various physiological conditions. High-throughput quantitative proteomics, specifically label-free LC-MS/MS, allows rapid measurement of thousands of proteins, enabling large-scale studies of various biological systems. Prior to analyzing these information-rich datasets, raw data must undergo several computational processing steps. We present a method to address one of the essential steps in proteomics data processing--the matching of peptide measurements across samples. We describe a novel method for label-free proteomics data alignment with the ability to incorporate previously unused aspects of the data, particularly ion mobility drift times and product ion information. We compare the results of our alignment method to PEPPeR and OpenMS, and compare alignment accuracy achieved by different versions of our method utilizing various data characteristics. Our method results in increased match recall rates and similar or improved mismatch rates compared to PEPPeR and OpenMS feature-based alignment. We also show that the inclusion of drift time and product ion information results in higher recall rates and more confident matches, without increases in error rates. Based on the results presented here, we argue that the incorporation of ion mobility drift time and product ion information are worthy pursuits. Alignment methods should be flexible enough to utilize all available data, particularly with recent advancements in experimental separation methods.
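The matching of peptide measurements across samples using m/z, retention time and ion-mobility drift time can be sketched as tolerance-window matching. The snippet below is a simplified illustration with hypothetical tolerances and features; it omits the statistical modelling and product-ion information described in the paper.

    # Simplified sketch of cross-run feature matching using m/z (ppm), retention
    # time and ion-mobility drift-time windows. Tolerances are hypothetical.
    def match_features(run_a, run_b, ppm_tol=10.0, rt_tol=0.5, dt_tol=0.3):
        matches = []
        for i, (mz_a, rt_a, dt_a) in enumerate(run_a):
            for j, (mz_b, rt_b, dt_b) in enumerate(run_b):
                if (abs(mz_a - mz_b) / mz_a * 1e6 <= ppm_tol
                        and abs(rt_a - rt_b) <= rt_tol
                        and abs(dt_a - dt_b) <= dt_tol):
                    matches.append((i, j))
        return matches

    run_a = [(721.3321, 35.2, 4.1), (502.2510, 18.7, 3.3)]   # (m/z, RT min, drift time ms)
    run_b = [(721.3330, 35.4, 4.2), (610.8100, 40.1, 3.9)]
    print(match_features(run_a, run_b))                      # [(0, 0)]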
Wan, Cuihong; Liu, Jian; Fong, Vincent; Lugowski, Andrew; Stoilova, Snejana; Bethune-Waddell, Dylan; Borgeson, Blake; Havugimana, Pierre C; Marcotte, Edward M; Emili, Andrew
2013-04-09
The experimental isolation and characterization of stable multi-protein complexes are essential to understanding the molecular systems biology of a cell. To this end, we have developed a high-throughput proteomic platform for the systematic identification of native protein complexes based on extensive fractionation of soluble protein extracts by multi-bed ion exchange high performance liquid chromatography (IEX-HPLC) combined with exhaustive label-free LC/MS/MS shotgun profiling. To support these studies, we have built a companion data analysis software pipeline, termed ComplexQuant. Proteins present in the hundreds of fractions typically collected per experiment are first identified by exhaustively interrogating MS/MS spectra using multiple database search engines within an integrative probabilistic framework, while accounting for possible post-translational modifications. Protein abundance is then measured across the fractions based on normalized total spectral counts and precursor ion intensities using a dedicated tool, PepQuant. This analysis allows co-complex membership to be inferred based on the similarity of extracted protein co-elution profiles. Each computational step has been optimized for processing large-scale biochemical fractionation datasets, and the reliability of the integrated pipeline has been benchmarked extensively. This article is part of a Special Issue entitled: From protein structures to clinical applications. Copyright © 2012 Elsevier B.V. All rights reserved.
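The co-elution idea behind ComplexQuant can be illustrated with a minimal sketch: proteins whose abundance profiles across fractions correlate strongly are candidate co-complex members. This is a plain Pearson correlation filter under hypothetical profiles and an arbitrary 0.9 cut-off, not the published pipeline.

```python
# Minimal sketch of inferring co-complex membership from co-elution profiles:
# proteins whose abundance profiles across chromatographic fractions are highly
# correlated are candidate complex partners. The protein names, profiles and
# the 0.9 cut-off are hypothetical.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def coelution_pairs(profiles, min_r=0.9):
    """profiles: {protein: [spectral counts per fraction]} -> correlated pairs."""
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r >= min_r:
                pairs.append((a, b, round(r, 3)))
    return pairs

if __name__ == "__main__":
    demo = {"ProtA": [0, 2, 10, 25, 12, 3, 0],
            "ProtB": [1, 3, 11, 23, 10, 2, 0],
            "ProtC": [15, 8, 2, 0, 1, 9, 14]}
    print(coelution_pairs(demo))  # ProtA and ProtB co-elute; ProtC does not
```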
Microchip-Based Single-Cell Functional Proteomics for Biomedical Applications
Lu, Yao; Yang, Liu; Wei, Wei; Shi, Qihui
2017-01-01
Cellular heterogeneity has been widely recognized, but only recently have single-cell tools become available that allow heterogeneity to be characterized at the genomic and proteomic levels. We review the technological advances in microchip-based toolkits for single-cell functional proteomics. Each of these tools has distinct advantages and limitations, and a few have advanced toward being applied to address biological or clinical problems that cannot be addressed by traditional population-based methods. High-throughput single-cell proteomic assays generate high-dimensional data sets that contain new information and thus require the development of new analytical frameworks to extract new biology. In this review article, we highlight a few biological and clinical applications in which microchip-based single-cell proteomic tools provide unique advantages. The examples include resolving functional heterogeneity and dynamics of immune cells, dissecting cell-cell interactions by creating well-controlled on-chip microenvironments, capturing high-resolution snapshots of immune system function in patients for better immunotherapy, and elucidating phosphoprotein signaling networks in cancer cells for guiding effective molecularly targeted therapies. PMID:28280819
High throughput profile-profile based fold recognition for the entire human proteome.
McGuffin, Liam J; Smith, Richard T; Bryson, Kevin; Sørensen, Søren-Aksel; Jones, David T
2006-06-07
In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.
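The load-distribution principle of a meta-scheduler can be sketched as a greedy policy that submits each job to the currently least-loaded cluster. This toy example is not the JYDE algorithm; the grid domain names and job identifiers are hypothetical.

```python
# Toy sketch of meta-scheduling: distribute fold-recognition jobs across
# independent clusters by always submitting to the cluster with the fewest
# queued jobs. This only illustrates the idea of a meta-scheduler sitting
# above cluster schedulers such as SGE or Condor; it is not the JYDE
# algorithm, and the cluster names and job counts are hypothetical.
import heapq

def distribute(jobs, clusters):
    """Assign each job to the currently least-loaded cluster."""
    heap = [(0, name) for name in clusters]   # (queued jobs, cluster name)
    heapq.heapify(heap)
    assignment = {}
    for job in jobs:
        load, name = heapq.heappop(heap)
        assignment[job] = name
        heapq.heappush(heap, (load + 1, name))
    return assignment

if __name__ == "__main__":
    sequences = [f"human_protein_{i:05d}" for i in range(10)]
    print(distribute(sequences, ["grid_domain_A", "grid_domain_B", "grid_domain_C"]))
```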
Overview of proteomics studies in obstructive sleep apnea
Feliciano, Amélia; Torres, Vukosava Milic; Vaz, Fátima; Carvalho, Ana Sofia; Matthiesen, Rune; Pinto, Paula; Malhotra, Atul; Bárbara, Cristina; Penque, Deborah
2015-01-01
Obstructive sleep apnea (OSA) is an underdiagnosed common public health concern causing deleterious effects on metabolic and cardiovascular health. Although much has been learned regarding the pathophysiology and consequences of OSA in the past decades, the molecular mechanisms associated with such processes remain poorly defined. The advanced high-throughput proteomics-based technologies have become a fundamental approach for identifying novel disease mediators as potential diagnostic and therapeutic targets for many diseases, including OSA. Here, we briefly review OSA pathophysiology and the technological advances in proteomics and the first results of its application to address critical issues in the OSA field. PMID:25770042
Image analysis tools and emerging algorithms for expression proteomics
English, Jane A.; Lisacek, Frederique; Morris, Jeffrey S.; Yang, Guang-Zhong; Dunn, Michael J.
2012-01-01
Since their origins in academic endeavours in the 1970s, computational analysis tools have matured into a number of established commercial packages that underpin research in expression proteomics. In this paper we describe the image analysis pipeline for the established 2-D Gel Electrophoresis (2-DE) technique of protein separation, and by first covering signal analysis for Mass Spectrometry (MS), we also explain the current image analysis workflow for the emerging high-throughput ‘shotgun’ proteomics platform of Liquid Chromatography coupled to MS (LC/MS). The bioinformatics challenges for both methods are illustrated and compared, whilst existing commercial and academic packages and their workflows are described from both a user’s and a technical perspective. Attention is given to the importance of sound statistical treatment of the resultant quantifications in the search for differential expression. Despite wide availability of proteomics software, a number of challenges have yet to be overcome regarding algorithm accuracy, objectivity and automation, generally due to deterministic spot-centric approaches that discard information early in the pipeline, propagating errors. We review recent advances in signal and image analysis algorithms in 2-DE, MS, LC/MS and Imaging MS. Particular attention is given to wavelet techniques, automated image-based alignment and differential analysis in 2-DE, Bayesian peak mixture models and functional mixed modelling in MS, and group-wise consensus alignment methods for LC/MS. PMID:21046614
Microfluidic-Mass Spectrometry Interfaces for Translational Proteomics.
Pedde, R Daniel; Li, Huiyan; Borchers, Christoph H; Akbari, Mohsen
2017-10-01
Interfacing mass spectrometry (MS) with microfluidic chips (μchip-MS) holds considerable potential to transform a clinician's toolbox, providing translatable methods for the early detection, diagnosis, monitoring, and treatment of noncommunicable diseases by streamlining and integrating laborious sample preparation workflows on high-throughput, user-friendly platforms. Overcoming the limitations of competitive immunoassays - currently the gold standard in clinical proteomics - μchip-MS can provide unprecedented access to complex proteomic assays having high sensitivity and specificity, but without the labor, costs, and complexities associated with conventional MS sample processing. This review surveys recent μchip-MS systems for clinical applications and examines their emerging role in streamlining the development and translation of MS-based proteomic assays by alleviating many of the challenges that currently inhibit widespread clinical adoption. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
Proteome Studies of Filamentous Fungi
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Scott E.; Panisko, Ellen A.
2011-04-20
The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, non-gel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of different variations on the general method and technologies for identifying peptides in a given sample. We present a method that can serve as a “baseline” for proteomic studies of fungi.
Zhang, Lijun; Jia, Xiaofang; Jin, Jun-O; Lu, Hongzhou; Tan, Zhimi
2017-04-01
Human immunodeficiency virus-1 (HIV-1) mainly relies on host factors to complete its life cycle. Hence, it is very important to identify HIV-regulated host proteins. Proteomics is an excellent technique for this purpose because of its high throughput and sensitivity. In this review, we summarized current technological advances in proteomics, including general isobaric tags for relative and absolute quantitation (iTRAQ) and stable isotope labeling by amino acids in cell culture (SILAC), as well as subcellular proteomics and investigation of posttranslational modifications. Furthermore, we reviewed the applications of proteomics in the discovery of HIV-related diseases and HIV infection mechanisms. Proteins identified by proteomic studies might offer new avenues for the diagnosis and treatment of HIV infection and the related diseases. Copyright © 2017 The Authors. Production and hosting by Elsevier B.V. All rights reserved.
Systems Proteomics for Translational Network Medicine
Arrell, D. Kent; Terzic, Andre
2012-01-01
Universal principles underlying network science, and their ever-increasing applications in biomedicine, underscore the unprecedented capacity of systems biology based strategies to synthesize and resolve massive high throughput generated datasets. Enabling previously unattainable comprehension of biological complexity, systems approaches have accelerated progress in elucidating disease prediction, progression, and outcome. Applied to the spectrum of states spanning health and disease, network proteomics establishes a collation, integration, and prioritization algorithm to guide mapping and decoding of proteome landscapes from large-scale raw data. Providing unparalleled deconvolution of protein lists into global interactomes, integrative systems proteomics enables objective, multi-modal interpretation at molecular, pathway, and network scales, merging individual molecular components, their plurality of interactions, and functional contributions for systems comprehension. As such, network systems approaches are increasingly exploited for objective interpretation of cardiovascular proteomics studies. Here, we highlight network systems proteomic analysis pipelines for integration and biological interpretation through protein cartography, ontological categorization, pathway and functional enrichment and complex network analysis. PMID:22896016
Zhou, Li; Wang, Kui; Li, Qifu; Nice, Edouard C; Zhang, Haiyuan; Huang, Canhua
2016-01-01
Cancer is a common disease that is a leading cause of death worldwide. Currently, early detection and novel therapeutic strategies are urgently needed for more effective management of cancer. Importantly, protein profiling using clinical proteomic strategies, with spectacular sensitivity and precision, offers excellent promise for the identification of potential biomarkers that would direct the development of targeted therapeutic anticancer drugs for precision medicine. In particular, clinical sample sources, including tumor tissues and body fluids (blood, feces, urine and saliva), have been widely investigated using modern high-throughput mass spectrometry-based proteomic approaches combined with bioinformatic analysis, to pursue the possibilities of precision medicine for targeted cancer therapy. Discussed in this review are the current advantages and limitations of clinical proteomics, the available strategies of clinical proteomics for the management of precision medicine, as well as the challenges and future perspectives of clinical proteomics-driven precision medicine for targeted cancer therapy.
Proteome studies of filamentous fungi.
Baker, Scott E; Panisko, Ellen A
2011-01-01
The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, nongel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of variations on the general methods and technologies for identifying peptides in a given sample. We present a method that can serve as a "baseline" for proteomic studies of fungi.
Van Coillie, Samya; Liang, Lunxi; Zhang, Yao; Wang, Huanbin; Fang, Jing-Yuan; Xu, Jie
2016-04-05
High-throughput methods such as co-immunoprecipitation-mass spectrometry (coIP-MS) and yeast two-hybrid (Y2H) screening have suggested a broad range of unannotated protein-protein interactions (PPIs), and interpretation of these PPIs remains a challenging task. Advancements in cancer genomics research allow for the inference of "coactivation pairs" in cancer, which may facilitate the identification of PPIs involved in cancer. Here we present OncoBinder as a tool for the assessment of proteomic interaction data based on the functional synergy of oncoproteins in cancer. This decision tree-based method combines gene mutation, copy number and mRNA expression information to infer the functional status of protein-coding genes. We applied OncoBinder to evaluate the potential binders of EGFR and ERK2 proteins based on the gastric cancer dataset of The Cancer Genome Atlas (TCGA). As a result, OncoBinder identified high-confidence interactions (annotated by the Kyoto Encyclopedia of Genes and Genomes (KEGG) or validated by low-throughput assays) more efficiently than a co-expression-based method. Taken together, our results suggest that evaluation of gene functional synergy in cancer may facilitate the interpretation of proteomic interaction data. The OncoBinder toolbox for Matlab is freely accessible online.
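The idea of combining mutation, copy number and expression into a per-sample functional status, and then scoring coactivation between two genes, can be caricatured as follows. The rules, thresholds and sample values are hypothetical and do not reproduce OncoBinder's actual decision tree.

```python
# Simplified sketch of calling a per-sample "functional status" for a gene by
# combining mutation, copy number and mRNA expression, then scoring how often
# two genes are co-activated across samples. The rules, thresholds and sample
# values below are hypothetical illustrations, not the OncoBinder model.

def functional_status(copy_number, expression_z, activating_mutation=False):
    """Return +1 (activated), -1 (inactivated) or 0 (neutral) for one sample."""
    if activating_mutation or copy_number >= 1 or expression_z >= 2.0:
        return +1
    if copy_number <= -1 or expression_z <= -2.0:
        return -1
    return 0

def coactivation_score(status_a, status_b):
    """Fraction of doubly informative samples with the same non-neutral status."""
    agree = sum(1 for a, b in zip(status_a, status_b) if a == b and a != 0)
    informative = sum(1 for a, b in zip(status_a, status_b) if a != 0 and b != 0)
    return agree / informative if informative else 0.0

if __name__ == "__main__":
    # Hypothetical EGFR and candidate-binder profiles across six tumour samples.
    egfr = [functional_status(cn, z, m) for cn, z, m in
            [(2, 2.5, False), (0, 0.1, False), (1, 1.8, True),
             (0, -0.3, False), (-1, -2.4, False), (2, 3.0, False)]]
    candidate = [functional_status(cn, z) for cn, z in
                 [(1, 2.1), (0, 0.0), (0, 2.2), (0, 0.4), (-1, -1.9), (1, 2.6)]]
    print(egfr, candidate, coactivation_score(egfr, candidate))
```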
Pressurized Pepsin Digestion in Proteomics
López-Ferrer, Daniel; Petritis, Konstantinos; Robinson, Errol W.; Hixson, Kim K.; Tian, Zhixin; Lee, Jung Hwa; Lee, Sang-Won; Tolić, Nikola; Weitz, Karl K.; Belov, Mikhail E.; Smith, Richard D.; Paša-Tolić, Ljiljana
2011-01-01
Integrated top-down bottom-up proteomics combined with on-line digestion has great potential to improve the characterization of protein isoforms in biological systems and is amenable to high-throughput proteomics experiments. Bottom-up proteomics ultimately provides the peptide sequences derived from the tandem MS analyses of peptides after the proteome has been digested. Top-down proteomics conversely entails the MS analyses of intact proteins for more effective characterization of genetic variations and/or post-translational modifications. Herein, we describe recent efforts toward efficient integration of bottom-up and top-down LC-MS-based proteomics strategies. Since most proteomics separations utilize acidic conditions, we exploited the compatibility of pepsin (where the optimal digestion conditions are at low pH) for integration into bottom-up and top-down proteomics work flows. Pressure-enhanced pepsin digestions were successfully performed and characterized with several standard proteins in either an off-line mode using a Barocycler or an on-line mode using a modified high pressure LC system referred to as a fast on-line digestion system (FOLDS). FOLDS was tested using pepsin and a whole microbial proteome, and the results were compared against traditional trypsin digestions on the same platform. Additionally, FOLDS was integrated with a RePlay configuration to demonstrate an ultrarapid integrated bottom-up top-down proteomics strategy using a standard mixture of proteins and a monkey pox virus proteome. PMID:20627868
Highly Efficient Proteolysis Accelerated by Electromagnetic Waves for Peptide Mapping
Chen, Qiwen; Liu, Ting; Chen, Gang
2011-01-01
Proteomics will contribute greatly to the understanding of gene functions in the post-genomic era. In proteome research, protein digestion is a key procedure prior to mass spectrometry identification. During the past decade, a variety of electromagnetic waves have been employed to accelerate proteolysis. This review focuses on the recent advances and the key strategies of these novel proteolysis approaches for digesting and identifying proteins. The subjects covered include microwave-accelerated protein digestion, infrared-assisted proteolysis, ultraviolet-enhanced protein digestion, laser-assisted proteolysis, and future prospects. It is expected that these novel proteolysis strategies accelerated by various electromagnetic waves will become powerful tools in proteome research and will find wide applications in high throughput protein digestion and identification. PMID:22379392
Nir, Oaz; Bakal, Chris; Perrimon, Norbert; Berger, Bonnie
2010-03-01
Biological networks are highly complex systems, consisting largely of enzymes that act as molecular switches to activate/inhibit downstream targets via post-translational modification. Computational techniques have been developed to perform signaling network inference using some high-throughput data sources, such as those generated from transcriptional and proteomic studies, but comparable methods have not been developed to use high-content morphological data, which are emerging principally from large-scale RNAi screens, to these ends. Here, we describe a systematic computational framework based on a classification model for identifying genetic interactions using high-dimensional single-cell morphological data from genetic screens, apply it to RhoGAP/GTPase regulation in Drosophila, and evaluate its efficacy. Augmented by knowledge of the basic structure of RhoGAP/GTPase signaling, namely, that GAPs act directly upstream of GTPases, we apply our framework for identifying genetic interactions to predict signaling relationships between these proteins. We find that our method makes mediocre predictions using only RhoGAP single-knockdown morphological data, yet achieves vastly improved accuracy by including original data from a double-knockdown RhoGAP genetic screen, which likely reflects the redundant network structure of RhoGAP/GTPase signaling. We consider other possible methods for inference and show that our primary model outperforms the alternatives. This work demonstrates the fundamental fact that high-throughput morphological data can be used in a systematic, successful fashion to identify genetic interactions and, using additional elementary knowledge of network structure, to infer signaling relations.
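One way to picture the classification framework described above is to train an off-the-shelf classifier on per-pair summaries of single-cell morphological features and score candidate gene pairs. The synthetic features, labels and the plain logistic-regression model below are stand-ins for illustration, not the authors' implementation.

```python
# Minimal sketch of the general idea: train a classifier on per-condition
# summaries of single-cell morphological features (e.g. from a double-knockdown
# screen) to predict whether a gene pair interacts. The features, labels and
# the logistic-regression model are hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training data: 40 gene pairs x 5 morphology summary statistics
# (e.g. mean cell area, protrusion count, eccentricity), with interacting
# pairs (label 1) shifted away from non-interacting pairs (label 0).
X_neg = rng.normal(0.0, 1.0, size=(20, 5))
X_pos = rng.normal(1.0, 1.0, size=(20, 5))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 20 + [1] * 20)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Score a new (hypothetical) RhoGAP/GTPase pair from its morphological profile.
new_pair = rng.normal(0.8, 1.0, size=(1, 5))
print("P(interaction) =", round(clf.predict_proba(new_pair)[0, 1], 3))
```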
He, Yongqun
2011-01-01
Brucella is a Gram-negative, facultative intracellular bacterium that causes zoonotic brucellosis in humans and various animals. Out of 10 classified Brucella species, B. melitensis, B. abortus, B. suis, and B. canis are pathogenic to humans. In the past decade, the mechanisms of Brucella pathogenesis and host immunity have been extensively investigated using the cutting edge systems biology and bioinformatics approaches. This article provides a comprehensive review of the applications of Omics (including genomics, transcriptomics, and proteomics) and bioinformatics technologies for the analysis of Brucella pathogenesis, host immune responses, and vaccine targets. Based on more than 30 sequenced Brucella genomes, comparative genomics is able to identify gene variations among Brucella strains that help to explain host specificity and virulence differences among Brucella species. Diverse transcriptomics and proteomics gene expression studies have been conducted to analyze gene expression profiles of wild type Brucella strains and mutants under different laboratory conditions. High throughput Omics analyses of host responses to infections with virulent or attenuated Brucella strains have been focused on responses by mouse and cattle macrophages, bovine trophoblastic cells, mouse and boar splenocytes, and ram buffy coat. Differential serum responses in humans and rams to Brucella infections have been analyzed using high throughput serum antibody screening technology. The Vaxign reverse vaccinology has been used to predict many Brucella vaccine targets. More than 180 Brucella virulence factors and their gene interaction networks have been identified using advanced literature mining methods. The recent development of community-based Vaccine Ontology and Brucellosis Ontology provides an efficient way for Brucella data integration, exchange, and computer-assisted automated reasoning. PMID:22919594
He, Yongqun
2012-01-01
Brucella is a Gram-negative, facultative intracellular bacterium that causes zoonotic brucellosis in humans and various animals. Out of 10 classified Brucella species, B. melitensis, B. abortus, B. suis, and B. canis are pathogenic to humans. In the past decade, the mechanisms of Brucella pathogenesis and host immunity have been extensively investigated using the cutting edge systems biology and bioinformatics approaches. This article provides a comprehensive review of the applications of Omics (including genomics, transcriptomics, and proteomics) and bioinformatics technologies for the analysis of Brucella pathogenesis, host immune responses, and vaccine targets. Based on more than 30 sequenced Brucella genomes, comparative genomics is able to identify gene variations among Brucella strains that help to explain host specificity and virulence differences among Brucella species. Diverse transcriptomics and proteomics gene expression studies have been conducted to analyze gene expression profiles of wild type Brucella strains and mutants under different laboratory conditions. High throughput Omics analyses of host responses to infections with virulent or attenuated Brucella strains have been focused on responses by mouse and cattle macrophages, bovine trophoblastic cells, mouse and boar splenocytes, and ram buffy coat. Differential serum responses in humans and rams to Brucella infections have been analyzed using high throughput serum antibody screening technology. The Vaxign reverse vaccinology has been used to predict many Brucella vaccine targets. More than 180 Brucella virulence factors and their gene interaction networks have been identified using advanced literature mining methods. The recent development of community-based Vaccine Ontology and Brucellosis Ontology provides an efficient way for Brucella data integration, exchange, and computer-assisted automated reasoning.
Draveling, C; Ren, L; Haney, P; Zeisse, D; Qoronfleh, M W
2001-07-01
The revolution in genomics and proteomics is having a profound impact on drug discovery. Today's protein scientist demands a faster, easier, more reliable way to purify proteins. A high-capacity, high-throughput new technology has been developed at Perbio Sciences for affinity protein purification. This technology utilizes selected chromatography media that are dehydrated to form uniform aggregates. The SwellGel aggregates will instantly rehydrate upon addition of the protein sample, allowing purification and direct performance of multiple assays in a variety of formats. SwellGel technology has greater stability and is easier to handle than standard wet chromatography resins. The microplate format of this technology provides high-capacity, high-throughput features, recovering milligram quantities of protein suitable for high-throughput screening or biophysical/structural studies. Data will be presented applying SwellGel technology to recombinant 6x His-tagged protein and glutathione-S-transferase (GST) fusion protein purification. Copyright 2001 Academic Press.
Metabolomic technologies are increasingly being applied to study biological questions in a range of different settings from clinical through to environmental. As with other high-throughput technologies, such as those used in transcriptomics and proteomics, metabolomics continues...
Enhancing Bottom-up and Top-down Proteomic Measurements with Ion Mobility Separations
Baker, Erin Shammel; Burnum-Johnson, Kristin E.; Ibrahim, Yehia M.; ...
2015-07-03
Proteomic measurements with greater throughput, sensitivity and structural information enhance both the in-depth characterization of complex mixtures and targeted studies with higher confidence. While liquid chromatography coupled with mass spectrometry (LC-MS) measurements have provided information on thousands of proteins in different sample types, the addition of another rapid separation stage providing structural information has many benefits for analyses. Technical advances in ion funnels and multiplexing have enabled ion mobility separations to be easily and effectively coupled with LC-MS proteomics to enhance the information content of measurements. Herein, we report on applications illustrating increased sensitivity, throughput, and structural information by utilizing IMS-MS and LC-IMS-MS measurements for both bottom-up and top-down proteomics.
Translational Research and Plasma Proteomic in Cancer.
Santini, Annamaria Chiara; Giovane, Giancarlo; Auletta, Adelaide; Di Carlo, Angelina; Fiorelli, Alfonso; Cito, Letizia; Astarita, Carlo; Giordano, Antonio; Alfano, Roberto; Feola, Antonia; Di Domenico, Marina
2016-04-01
Proteomics is a recent field of research in molecular biology that can help in the fight against cancer through the search for biomarkers that can detect this disease in the early stages of its development. Proteomics is a rapidly growing technology, thanks in part to the development of ever more sensitive and faster mass spectrometry analysis. Although this technique is the most widespread for the discovery of new cancer biomarkers, it still suffers from poor sensitivity and insufficient reproducibility, essentially due to tumor heterogeneity. Common technical shortcomings include limitations in the sensitivity of detecting low-abundance biomarkers and possible systematic biases in the observed data. Current research efforts aim to develop high-resolution proteomic instrumentation for high-throughput monitoring of protein changes that occur in cancer. In this review, we describe the basic features of the proteomic tools which have proven to be useful in cancer research, showing their advantages and disadvantages. The application of these proteomic tools could provide early biomarker detection in various cancer types and could improve understanding of the mechanisms of tumor growth and dissemination. © 2015 Wiley Periodicals, Inc.
Computer-based fluorescence quantification: a novel approach to study nucleolar biology
2011-01-01
Background Nucleoli are composed of possibly several thousand different proteins and represent the most conspicuous compartments in the nucleus; they play a crucial role in the proper execution of many cellular processes. As such, nucleoli carry out ribosome biogenesis and sequester or associate with key molecules that regulate cell cycle progression, tumorigenesis, apoptosis and the stress response. Nucleoli are dynamic compartments that are characterized by a constant flux of macromolecules. Given the complex and dynamic composition of the nucleolar proteome, it is challenging to link modifications in nucleolar composition to downstream effects. Results In this contribution, we present quantitative immunofluorescence methods that rely on computer-based image analysis. We demonstrate the effectiveness of these techniques by monitoring the dynamic association of proteins and RNA with nucleoli under different physiological conditions. Thus, the protocols described by us were employed to study stress-dependent changes in the nucleolar concentration of endogenous and GFP-tagged proteins. Furthermore, our methods were applied to measure de novo RNA synthesis that is associated with nucleoli. We show that the techniques described here can be easily combined with automated high throughput screening (HTS) platforms, making it possible to obtain large data sets and analyze many of the biological processes that are located in nucleoli. Conclusions Our protocols set the stage to analyze in a quantitative fashion the kinetics of shuttling nucleolar proteins, both at the single cell level as well as for a large number of cells. Moreover, the procedures described here are compatible with high throughput image acquisition and analysis using HTS automated platforms, thereby providing the basis to quantify nucleolar components and activities for numerous samples and experimental conditions. Together with the growing amount of information obtained for the nucleolar proteome, improvements in quantitative microscopy as they are described here can be expected to produce new insights into the complex biological functions that are orchestrated by the nucleolus. PMID:21639891
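A minimal sketch of the kind of computer-based fluorescence quantification described here: given an image and binary masks for the nucleolus and the surrounding nucleoplasm, report mean intensities and an enrichment ratio. Real pipelines add segmentation and background correction; the arrays below are hypothetical.

```python
# Minimal sketch of computer-based fluorescence quantification: given a
# fluorescence image and binary masks for the nucleolus and the surrounding
# nucleoplasm, report the mean intensities and their ratio. The image and
# masks below are hypothetical toy data.
import numpy as np

def nucleolar_enrichment(image, nucleolus_mask, nucleoplasm_mask):
    """Return (mean nucleolar, mean nucleoplasmic, enrichment ratio)."""
    nucleolar = image[nucleolus_mask].mean()
    nucleoplasmic = image[nucleoplasm_mask].mean()
    return nucleolar, nucleoplasmic, nucleolar / nucleoplasmic

if __name__ == "__main__":
    img = np.full((10, 10), 100.0)          # nucleoplasm background signal
    img[4:7, 4:7] = 450.0                   # brighter nucleolar region
    nucleolus = np.zeros((10, 10), dtype=bool)
    nucleolus[4:7, 4:7] = True
    nucleoplasm = ~nucleolus
    print(nucleolar_enrichment(img, nucleolus, nucleoplasm))  # ratio = 4.5
```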
Reconstructing the regulatory circuit of cell fate determination in yeast mating response.
Shao, Bin; Yuan, Haiyu; Zhang, Rongfei; Wang, Xuan; Zhang, Shuwen; Ouyang, Qi; Hao, Nan; Luo, Chunxiong
2017-07-01
Massive technological advances enabled high-throughput measurements of proteomic changes in biological processes. However, retrieving biological insights from large-scale protein dynamics data remains a challenging task. Here we used the mating differentiation in yeast Saccharomyces cerevisiae as a model and developed integrated experimental and computational approaches to analyze the proteomic dynamics during the process of cell fate determination. When exposed to a high dose of mating pheromone, the yeast cell undergoes growth arrest and forms a shmoo-like morphology; however, at intermediate doses, chemotropic elongated growth is initialized. To understand the gene regulatory networks that control this differentiation switch, we employed a high-throughput microfluidic imaging system that allows real-time and simultaneous measurements of cell growth and protein expression. Using kinetic modeling of protein dynamics, we classified the stimulus-dependent changes in protein abundance into two sources: global changes due to physiological alterations and gene-specific changes. A quantitative framework was proposed to decouple gene-specific regulatory modes from the growth-dependent global modulation of protein abundance. Based on the temporal patterns of gene-specific regulation, we established the network architectures underlying distinct cell fates using a reverse engineering method and uncovered the dose-dependent rewiring of gene regulatory network during mating differentiation. Furthermore, our results suggested a potential crosstalk between the pheromone response pathway and the target of rapamycin (TOR)-regulated ribosomal biogenesis pathway, which might underlie a cell differentiation switch in yeast mating response. In summary, our modeling approach addresses the distinct impacts of the global and gene-specific regulation on the control of protein dynamics and provides new insights into the mechanisms of cell fate determination. We anticipate that our integrated experimental and modeling strategies could be widely applicable to other biological systems.
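The decoupling of growth-dependent dilution from gene-specific regulation can be illustrated with a toy kinetic model, dP/dt = beta(t) - (alpha + mu(t)) * P, where mu(t) is the growth rate and beta(t) the gene-specific synthesis rate. The rates and the pheromone-induced profile below are hypothetical, and the model is far simpler than the framework used in the study.

```python
# Toy sketch of the decoupling idea: total protein abundance changes both
# through growth-dependent dilution (a global effect shared by all genes) and
# through gene-specific synthesis. A protein trace is simulated with
# dP/dt = beta(t) - (alpha + mu(t)) * P, and the dilution term can then be
# subtracted out to expose the gene-specific component. All rates and the
# pheromone-induced synthesis profile are hypothetical.

def simulate(beta, mu, alpha=0.02, p0=100.0, dt=0.5, t_end=120.0):
    """Euler-integrate protein abundance; beta(t) = synthesis, mu(t) = growth rate."""
    t, p = 0.0, p0
    trace = [(t, p)]
    while t < t_end:
        dp = beta(t) - (alpha + mu(t)) * p
        p += dp * dt
        t += dt
        trace.append((t, p))
    return trace

if __name__ == "__main__":
    beta = lambda t: 5.0 if t < 60 else 1.0       # induction switched off at t = 60
    mu = lambda t: 0.01 if t < 60 else 0.002      # growth arrest after pheromone
    trace = simulate(beta, mu)
    t, p = trace[-1]
    # Gene-specific net production at the end point, with degradation removed:
    print(round(p, 1), round(beta(t) - 0.02 * p, 3))
```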
Biochemical Markers of Brain Injury: An Integrated Proteomics-Based Approach
2006-02-01
Williams, Anthony J; Lu, X-C May; Chen, Renwu; Liao, Zhilin; Connors, Rebeca; Wang, Kevin K; Hayes, Ron L; Tortella, Frank C; Dave, Jitendra R
Yu, Kebing; Salomon, Arthur R
2009-12-01
Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through MS/MS. Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to various experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our high throughput autonomous proteomic pipeline used in the automated acquisition and post-acquisition analysis of proteomic data.
A novel spectral library workflow to enhance protein identifications.
Li, Haomin; Zong, Nobel C; Liang, Xiangbo; Kim, Allen K; Choi, Jeong Ho; Deng, Ning; Zelaya, Ivette; Lam, Maggie; Duan, Huilong; Ping, Peipei
2013-04-09
The innovations in mass spectrometry-based investigations in proteome biology enable systematic characterization of molecular details in pathophysiological phenotypes. However, the process of delineating large-scale raw proteomic datasets into a biological context requires high-throughput data acquisition and processing. A spectral library search engine makes use of previously annotated experimental spectra as references for subsequent spectral analyses. This workflow delivers many advantages, including elevated analytical efficiency and specificity as well as reduced demands in computational capacity. In this study, we created a spectral matching engine to address challenges commonly associated with a library search workflow. In particular, an improved sliding dot product algorithm that is robust to systematic drifts of mass measurement in spectra is introduced. Furthermore, a noise management protocol distinguishes spectral correlation attributable to noise from that attributable to peptide fragments. It enables elevated separation between target spectral matches and false matches, thereby suppressing the possibility of propagating inaccurate peptide annotations from library spectra to query spectra. Moreover, preservation of original spectra also accommodates user contributions to further enhance the quality of the library. Collectively, this search engine supports reproducible data analyses using curated references, thereby broadening the accessibility of proteomics resources to biomedical investigators. This article is part of a Special Issue entitled: From protein structures to clinical applications. Copyright © 2013 Elsevier B.V. All rights reserved.
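The sliding dot product idea can be sketched as follows: bin both spectra on the m/z axis and take the best dot product over a small range of bin shifts, so that a systematic mass drift does not destroy the match. The bin width, shift range and peak lists are hypothetical; the published engine additionally handles noise and intensity scaling.

```python
# Minimal sketch of a "sliding" dot product between a query spectrum and a
# library spectrum: both are binned on the m/z axis, and the dot product is
# evaluated over a small range of bin shifts so that a systematic mass drift
# does not destroy the correlation. Bin width, shift range and spectra are
# hypothetical.

def bin_spectrum(peaks, bin_width=1.0, n_bins=2000):
    """peaks: list of (m/z, intensity) -> fixed-length intensity vector."""
    vec = [0.0] * n_bins
    for mz, inten in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += inten
    return vec

def sliding_dot(query, library, max_shift=2):
    """Best dot product of the two binned spectra over +/- max_shift bins."""
    best = 0.0
    n = len(query)
    for shift in range(-max_shift, max_shift + 1):
        score = sum(query[i] * library[i + shift]
                    for i in range(max(0, -shift), min(n, n - shift)))
        best = max(best, score)
    return best

if __name__ == "__main__":
    library_peaks = [(175.1, 40.0), (262.1, 100.0), (433.2, 60.0)]
    # Query with a uniform +1 m/z drift relative to the library entry.
    query_peaks = [(176.1, 38.0), (263.1, 95.0), (434.2, 55.0)]
    q = bin_spectrum(query_peaks)
    lib = bin_spectrum(library_peaks)
    print(sliding_dot(q, lib))                 # drift-tolerant score
    print(sum(a * b for a, b in zip(q, lib)))  # naive dot product misses the match
```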
Turewicz, Michael; Kohl, Michael; Ahrens, Maike; Mayer, Gerhard; Uszkoreit, Julian; Naboulsi, Wael; Bracht, Thilo; Megger, Dominik A; Sitek, Barbara; Marcus, Katrin; Eisenacher, Martin
2017-11-10
The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis as well as data standardization and data publication. All particular methods of the workflow which address these tasks are state-of-the-art or cutting edge. As has been shown in previous publications, each of these methods is adequate to solve its specific task and gives competitive results. However, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these particular components and methods are available as stand-alone BioInfra.Prot services or as a complete workflow. Since BioInfra.Prot provides manifold fast communication channels for getting access to all components of the workflow (e.g., via the BioInfra.Prot ticket system: bioinfraprot@rub.de), users can easily benefit from this service and receive support from experts. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Liu, Ming-Qi; Zeng, Wen-Feng; Fang, Pan; Cao, Wei-Qian; Liu, Chao; Yan, Guo-Quan; Zhang, Yang; Peng, Chao; Wu, Jian-Qiang; Zhang, Xiao-Jin; Tu, Hui-Jun; Chi, Hao; Sun, Rui-Xiang; Cao, Yong; Dong, Meng-Qiu; Jiang, Bi-Yun; Huang, Jiang-Ming; Shen, Hua-Li; Wong, Catherine C L; He, Si-Min; Yang, Peng-Yuan
2017-09-05
The precise and large-scale identification of intact glycopeptides is a critical step in glycoproteomics. Owing to the complexity of glycosylation, the current overall throughput, data quality and accessibility of intact glycopeptide identification lag behind those of routine proteomic analyses. Here, we propose a workflow for the precise high-throughput identification of intact N-glycopeptides at the proteome scale using stepped-energy fragmentation and a dedicated search engine. pGlyco 2.0 conducts comprehensive quality control, including false discovery rate evaluation at all three levels of matches to glycans, peptides and glycopeptides, improving the current level of accuracy of intact glycopeptide identification. The N-glycoproteome of samples metabolically labeled with 15N/13C was analyzed quantitatively and utilized to validate the glycopeptide identification, and could be used as a novel benchmark pipeline to compare different search engines. Finally, we report a large-scale glycoproteome dataset consisting of 10,009 distinct site-specific N-glycans on 1988 glycosylation sites from 955 glycoproteins in five mouse tissues. Protein glycosylation is a heterogeneous post-translational modification that generates greater proteomic diversity, which is difficult to analyze. Here the authors describe pGlyco 2.0, a workflow for the precise one-step identification of intact N-glycopeptides at the proteome scale.
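Target-decoy FDR control, which pGlyco 2.0 applies separately at the glycan, peptide and glycopeptide levels, can be sketched at a single level as follows. The scores, decoy model and the 1% threshold are illustrative only, not the published procedure.

```python
# Minimal sketch of target-decoy FDR estimation as applied at one level of
# matching (the published workflow controls FDR separately for glycan, peptide
# and whole-glycopeptide matches). Scores and the 1% threshold are hypothetical.

def fdr_filter(matches, threshold=0.01):
    """matches: list of (score, is_decoy). Keep the longest score-ranked prefix
    with estimated FDR = #decoys / #targets <= threshold; report its targets."""
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    accepted, targets, decoys = [], 0, 0
    best = []
    for score, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        accepted.append((score, is_decoy))
        if targets and decoys / targets <= threshold:
            best = list(accepted)
    return [m for m in best if not m[1]]

if __name__ == "__main__":
    import random
    random.seed(1)
    targets = [(random.gauss(10, 2), False) for _ in range(200)]
    decoys = [(random.gauss(4, 2), True) for _ in range(200)]
    kept = fdr_filter(targets + decoys, threshold=0.01)
    print(len(kept), "target glycopeptide matches at 1% FDR")
```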
Ndimba, Bongani Kaiser; Ndimba, Roya Janeen; Johnson, T Sudhakar; Waditee-Sirisattha, Rungaroon; Baba, Masato; Sirisattha, Sophon; Shiraiwa, Yoshihiro; Agrawal, Ganesh Kumar; Rakwal, Randeep
2013-11-20
Sustainable energy is the need of the 21st century, not only because of the numerous environmental and political reasons, but because it is necessary for human civilization's energy future. Sustainable energy is loosely grouped into renewable energy, energy conservation, and sustainable transport disciplines. In this review, we deal with the renewable energy aspect, focusing on biomass sources, from bioenergy crops to microalgae, for producing biofuels, and on the utilization of high-throughput omics technologies, in particular proteomics, in advancing our understanding and increasing biofuel production. We look at biofuel production by plant- and algal-based sources, and the role proteomics has played therein. This article is part of a Special Issue entitled: Translational Plant Proteomics. Copyright © 2013 Elsevier B.V. All rights reserved.
Automation, parallelism, and robotics for proteomics.
Alterovitz, Gil; Liu, Jonathan; Chow, Jijun; Ramoni, Marco F
2006-07-01
The speed of the human genome project (Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C. et al., Nature 2001, 409, 860-921) was made possible, in part, by developments in automation of sequencing technologies. Before these technologies, sequencing was a laborious, expensive, and personnel-intensive task. Similarly, automation and robotics are changing the field of proteomics today. Proteomics is defined as the effort to understand and characterize proteins in the categories of structure, function and interaction (Englbrecht, C. C., Facius, A., Comb. Chem. High Throughput Screen. 2005, 8, 705-715). As such, this field nicely lends itself to automation technologies since these methods often require large economies of scale in order to achieve cost and time-saving benefits. This article describes some of the technologies and methods being applied in proteomics in order to facilitate automation within the field as well as in linking proteomics-based information with other related research areas.
Proteomics of Plant Pathogenic Fungi
González-Fernández, Raquel; Prats, Elena; Jorrín-Novo, Jesús V.
2010-01-01
Plant pathogenic fungi cause important yield losses in crops. In order to develop efficient and environmentally friendly crop protection strategies, molecular studies of the fungal biological cycle, virulence factors, and interaction with its host are necessary. For that reason, several approaches have been performed using both classical genetics, cell biology, and biochemistry and modern, holistic, high-throughput omics techniques. This work briefly overviews the tools available for studying plant pathogenic fungi and focuses mainly on MS-based proteomics analysis, based on original papers published up to December 2009. At a methodological level, different steps in a proteomic workflow experiment are discussed. Separate sections are devoted to fungal descriptive (intracellular, subcellular, extracellular) and differential expression proteomics and interactomics. From the work published we can conclude that proteomics, in combination with other techniques, constitutes a powerful tool for providing important information about pathogenicity and virulence factors, thus opening up new possibilities for crop disease diagnosis and crop protection. PMID:20589070
Proteomics of plant pathogenic fungi.
González-Fernández, Raquel; Prats, Elena; Jorrín-Novo, Jesús V
2010-01-01
Plant pathogenic fungi cause important yield losses in crops. In order to develop efficient and environmentally friendly crop protection strategies, molecular studies of the fungal biological cycle, virulence factors, and interaction with its host are necessary. For that reason, several approaches have been performed using both classical genetics, cell biology, and biochemistry and modern, holistic, high-throughput omics techniques. This work briefly overviews the tools available for studying plant pathogenic fungi and focuses mainly on MS-based proteomics analysis, based on original papers published up to December 2009. At a methodological level, different steps in a proteomic workflow experiment are discussed. Separate sections are devoted to fungal descriptive (intracellular, subcellular, extracellular) and differential expression proteomics and interactomics. From the work published we can conclude that proteomics, in combination with other techniques, constitutes a powerful tool for providing important information about pathogenicity and virulence factors, thus opening up new possibilities for crop disease diagnosis and crop protection.
Musi, Valeria; Birdsall, Berry; Fernandez-Ballester, Gregorio; Guerrini, Remo; Salvatori, Severo; Serrano, Luis; Pastore, Annalisa
2006-04-01
SH3 domains are small protein modules that are involved in protein-protein interactions in several essential metabolic pathways. The availability of the complete genome and the limited number of clearly identifiable SH3 domains make the yeast Saccharomyces cerevisiae an ideal proteomic-based model system to investigate the structural rules dictating SH3-mediated protein interactions and to develop new tools to assist these studies. In the present work, we have determined the solution structure of the SH3 domain from Myo3 and modeled by homology that of the highly homologous Myo5, two myosins implicated in actin polymerization. We have then implemented an integrated approach that makes use of experimental and computational methods to characterize their binding properties. While accommodating their targets in the classical groove, the two domains have selectivity in both orientation and sequence specificity of the target peptides. From our study, we propose a consensus sequence that may provide a useful guideline to identify new natural partners and suggest a strategy of more general applicability that may be of use in other structural proteomic studies.
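Screening sequences against a short linear consensus of the kind proposed for SH3 ligands can be sketched with a regular expression. The generic class I-like pattern [RK]xxPxxP and the candidate sequences below are hypothetical examples, not the consensus reported in the paper.

```python
# Illustrative sketch of screening protein sequences against a generic class
# I-like SH3 ligand motif of the form +xxPxxP (+ is Arg/Lys, x is any residue).
# The exact consensus proposed in the paper is not reproduced here; the pattern
# and sequences below are hypothetical examples only.
import re

CONSENSUS = re.compile(r"[RK]..P..P")

def find_sh3_sites(sequences):
    """Return {name: [(1-based start, matched motif), ...]} for each sequence."""
    hits = {}
    for name, seq in sequences.items():
        hits[name] = [(m.start() + 1, m.group()) for m in CONSENSUS.finditer(seq)]
    return hits

if __name__ == "__main__":
    demo = {"candidate_1": "MSTKAAPTLPQRGS",
            "candidate_2": "MDELNNGGSAQQLV"}
    print(find_sh3_sites(demo))  # candidate_1 contains a KAAPTLP match
```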
Nucleic Acids for Ultra-Sensitive Protein Detection
Janssen, Kris P. F.; Knez, Karel; Spasic, Dragana; Lammertyn, Jeroen
2013-01-01
Major advancements in molecular biology and clinical diagnostics cannot be brought about strictly through the use of genomics-based methods. Improved methods for protein detection and proteomic screening are an absolute necessity to complement the wealth of information offered by novel, high-throughput sequencing technologies. Only then will it be possible to advance insights into clinical processes and to characterize the importance of specific protein biomarkers for disease detection or the realization of “personalized medicine”. Currently, however, large-scale proteomic information is still not as easily obtained as its genomic counterpart, mainly because traditional antibody-based technologies struggle to meet the stringent sensitivity and throughput requirements, whereas mass spectrometry-based methods can be burdened by significant costs. However, recent years have seen the development of new biodetection strategies linking nucleic acids with existing antibody technology or replacing antibodies with oligonucleotide recognition elements altogether. These advancements have unlocked many new strategies to lower detection limits and dramatically increase the throughput of protein detection assays. In this review, an overview of these new strategies will be given. PMID:23337338
A set of ligation-independent in vitro translation vectors for eukaryotic protein production.
Bardóczy, Viola; Géczi, Viktória; Sawasaki, Tatsuya; Endo, Yaeta; Mészáros, Tamás
2008-03-27
The last decade has brought the renaissance of protein studies and accelerated the development of high-throughput methods in all aspects of proteomics. Presently, most protein synthesis systems exploit the capacity of living cells to translate proteins, but their application is limited by several factors. A more flexible alternative protein production method is cell-free in vitro protein translation. Currently available in vitro translation systems are suitable for high-throughput robotic protein production, fulfilling the requirements of proteomics studies. The wheat germ extract-based in vitro translation system is likely the most promising method, since numerous eukaryotic proteins can be cost-efficiently synthesized in their native folded form. Although currently available vectors for wheat embryo in vitro translation systems ensure high productivity, they do not meet the requirements of state-of-the-art proteomics: target genes have to be inserted using restriction endonucleases, and the plasmids do not encode cleavable affinity purification tags. We designed four ligation-independent cloning (LIC) vectors for wheat germ extract-based in vitro protein translation. In these constructs, RNA transcription is driven by T7 or SP6 phage polymerase, and two TEV protease-cleavable affinity tags can be added to aid protein purification. To evaluate our improved vectors, a plant mitogen-activated protein kinase was cloned into all four constructs. Purification of this eukaryotic protein kinase demonstrated that all constructs functioned as intended: insertion of the PCR fragment by LIC worked efficiently, affinity purification of translated proteins by GST-Sepharose or MagneHis particles resulted in high-purity kinase, and the affinity tags could be efficiently removed under different reaction conditions. Furthermore, high in vitro kinase activity testified to the proper folding of the purified protein. Four newly designed in vitro translation vectors have been constructed which allow fast and parallel cloning and protein purification, thus representing useful molecular tools for high-throughput production of eukaryotic proteins.
Wheat proteomics: proteome modulation and abiotic stress acclimation
Komatsu, Setsuko; Kamal, Abu H. M.; Hossain, Zahed
2014-01-01
Cellular mechanisms of stress sensing and signaling represent the initial plant responses to adverse conditions. The development of high-throughput “Omics” techniques has initiated a new era of the study of plant molecular strategies for adapting to environmental changes. However, the elucidation of stress adaptation mechanisms in plants requires the accurate isolation and characterization of stress-responsive proteins. Because the functional part of the genome, namely the proteins and their post-translational modifications, is critical for plant stress responses, proteomic studies provide comprehensive information about the fine-tuning of cellular pathways that are primarily involved in stress mitigation. This review summarizes the major proteomic findings related to alterations in the wheat proteomic profile in response to abiotic stresses. Moreover, the strengths and weaknesses of different sample preparation techniques, including subcellular protein extraction protocols, are discussed in detail. The continued development of proteomic approaches in combination with rapidly evolving bioinformatics tools and interactive databases will facilitate understanding of the plant mechanisms underlying stress tolerance. PMID:25538718
Picotti, Paola; Clement-Ziza, Mathieu; Lam, Henry; Campbell, David S.; Schmidt, Alexander; Deutsch, Eric W.; Röst, Hannes; Sun, Zhi; Rinner, Oliver; Reiter, Lukas; Shen, Qin; Michaelson, Jacob J.; Frei, Andreas; Alberti, Simon; Kusebauch, Ulrike; Wollscheid, Bernd; Moritz, Robert; Beyer, Andreas; Aebersold, Ruedi
2013-01-01
Complete reference maps or datasets, like the genomic map of an organism, are highly beneficial tools for biological and biomedical research. Attempts to generate such reference datasets for a proteome have so far failed to reach complete proteome coverage, with saturation apparent at approximately two thirds of the proteomes tested, even for the most thoroughly characterized proteomes. Here, we used a strategy based on high-throughput peptide synthesis and mass spectrometry to generate a close to complete reference map (97% of the genome-predicted proteins) of the S. cerevisiae proteome. We generated two versions of this mass spectrometric map, one supporting discovery-driven (shotgun) and the other hypothesis-driven (targeted) proteomic measurements. The two versions of the map therefore constitute a complete set of proteomic assays to support most studies performed with contemporary proteomic technologies. The reference libraries can be browsed via a web-based repository and associated navigation tools. To demonstrate the utility of the reference libraries, we applied them to a protein quantitative trait locus (pQTL) analysis, which requires measurement of the same peptides over a large number of samples with high precision. Protein measurements over a set of 78 S. cerevisiae strains revealed a complex relationship between independent genetic loci, impacting the levels of related proteins. Our results suggest that selective pressure favors the acquisition of sets of polymorphisms that maintain the stoichiometry of protein complexes and pathways. PMID:23334424
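A single pQTL test of the kind run across the 78 strains can be sketched as a two-group comparison of a peptide's abundance between strains carrying different alleles at one locus. The abundances, genotypes and the Welch t-statistic choice are illustrative; the study's analysis is considerably more involved.

```python
# Minimal sketch of a single protein quantitative trait locus (pQTL) test:
# compare a peptide's measured abundance between strains carrying the two
# alleles at one locus. A real analysis scans many loci and proteins and
# corrects for multiple testing; the abundances and genotypes are hypothetical.
from statistics import mean, variance
import math

def welch_t(group_a, group_b):
    """Welch t statistic comparing the means of two independent groups."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)
    se = math.sqrt(va / na + vb / nb)
    return (mean(group_a) - mean(group_b)) / se

if __name__ == "__main__":
    # Peptide abundance (arbitrary units) split by genotype at a candidate locus.
    allele_ref = [10.2, 11.0, 9.8, 10.5, 10.9, 11.3]
    allele_alt = [13.1, 12.7, 13.8, 12.9, 13.4, 12.6]
    print("Welch t statistic:", round(welch_t(allele_ref, allele_alt), 2))
```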
NMR in the SPINE Structural Proteomics project.
Ab, E; Atkinson, A R; Banci, L; Bertini, I; Ciofi-Baffoni, S; Brunner, K; Diercks, T; Dötsch, V; Engelke, F; Folkers, G E; Griesinger, C; Gronwald, W; Günther, U; Habeck, M; de Jong, R N; Kalbitzer, H R; Kieffer, B; Leeflang, B R; Loss, S; Luchinat, C; Marquardsen, T; Moskau, D; Neidig, K P; Nilges, M; Piccioli, M; Pierattelli, R; Rieping, W; Schippmann, T; Schwalbe, H; Travé, G; Trenner, J; Wöhnert, J; Zweckstetter, M; Kaptein, R
2006-10-01
This paper describes the developments, role and contributions of the NMR spectroscopy groups in the Structural Proteomics In Europe (SPINE) consortium. Focusing on the development of high-throughput (HTP) pipelines for NMR structure determinations of proteins, all aspects from sample preparation, data acquisition, data processing, data analysis to structure determination have been improved with respect to sensitivity, automation, speed, robustness and validation. Specific highlights are protonless (13)C-direct detection methods and inferential structure determinations (ISD). In addition to technological improvements, these methods have been applied to deliver over 60 NMR structures of proteins, among which are five that failed to crystallize. The inclusion of NMR spectroscopy in structural proteomics pipelines improves the success rate for protein structure determinations.
Chipster: user-friendly analysis software for microarray and other high-throughput data.
Kallio, M Aleksi; Tuimala, Jarno T; Hupponen, Taavi; Klemelä, Petri; Gentile, Massimiliano; Scheinin, Ilari; Koski, Mikko; Käki, Janne; Korpelainen, Eija I
2011-10-14
The growth of high-throughput technologies such as microarrays and next generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. In order to enable non-programming biologists to benefit from the method development in a timely manner, we have created the Chipster software. Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select datapoints and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies. Chipster is a user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available.
TimeXNet Web: Identifying cellular response networks from diverse omics time-course data.
Tan, Phit Ling; López, Yosvany; Nakai, Kenta; Patil, Ashwini
2018-05-14
Condition-specific time-course omics profiles are frequently used to study cellular responses to stimuli and to identify associated signaling pathways. However, few online tools allow users to analyze multiple types of high-throughput time-course data. TimeXNet Web is a web server that extracts a time-dependent gene/protein response network from time-course transcriptomic, proteomic or phospho-proteomic data and an input interaction network. It classifies the given genes/proteins into time-dependent groups based on the time of their highest activity and identifies the most probable paths connecting genes/proteins in consecutive groups. The response sub-network is enriched in activated genes/proteins and contains novel regulators that do not show any observable change in the input data. Users can view the resultant response network and analyze it for functional enrichment. TimeXNet Web supports the analysis of high-throughput data from multiple species by providing high-quality, weighted protein-protein interaction networks for 12 model organisms. TimeXNet Web is available at http://txnet.hgc.jp/ (contact: ashwini@hgc.jp). Supplementary data are available at Bioinformatics online.
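The grouping step described above can be pictured with a short, self-contained sketch: each gene/protein is assigned to the time point of its highest activity. The gene names, time points and activity values below are hypothetical, and this is only an illustration of the idea, not TimeXNet's actual implementation.

```python
import numpy as np

# Hypothetical time-course matrix: rows are genes/proteins, columns are time points.
genes = ["geneA", "geneB", "geneC", "geneD"]
time_points = [0, 30, 60, 120]  # minutes after stimulus (illustrative)
activity = np.array([
    [0.1, 2.3, 0.8, 0.2],
    [0.0, 0.4, 1.9, 2.5],
    [1.8, 0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5, 3.1],
])

# Group each gene by the time point at which its activity peaks,
# mirroring the notion of time-dependent groups described above.
peak_index = activity.argmax(axis=1)
groups = {}
for gene, idx in zip(genes, peak_index):
    groups.setdefault(time_points[idx], []).append(gene)

for t in sorted(groups):
    print(f"t = {t} min: {groups[t]}")
```

In a full analysis, consecutive groups would then be linked through the input interaction network to recover the most probable response paths.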
ChlamyCyc: an integrative systems biology database and web-portal for Chlamydomonas reinhardtii.
May, Patrick; Christian, Jan-Ole; Kempa, Stefan; Walther, Dirk
2009-05-04
The unicellular green alga Chlamydomonas reinhardtii is an important eukaryotic model organism for the study of photosynthesis and plant growth. In the era of modern high-throughput technologies there is an imperative need to integrate large-scale data sets from high-throughput experimental techniques using computational methods and database resources to provide comprehensive information about the molecular and cellular organization of a single organism. In the framework of the German Systems Biology initiative GoFORSYS, a pathway database and web-portal for Chlamydomonas (ChlamyCyc) was established, which currently features about 250 metabolic pathways with associated genes, enzymes, and compound information. ChlamyCyc was assembled using an integrative approach combining the recently published genome sequence, bioinformatics methods, and experimental data from metabolomics and proteomics experiments. We analyzed and integrated a combination of primary and secondary database resources, such as existing genome annotations from JGI, EST collections, orthology information, and MapMan classification. ChlamyCyc provides a curated and integrated systems biology repository that will enable and assist in systematic studies of fundamental cellular processes in Chlamydomonas. The ChlamyCyc database and web-portal is freely available under http://chlamycyc.mpimp-golm.mpg.de.
Yu, Kebing; Salomon, Arthur R.
2010-01-01
Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through tandem mass spectrometry (MS/MS). Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to a variety of experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our High Throughput Autonomous Proteomic Pipeline (HTAPP) used in the automated acquisition and post-acquisition analysis of proteomic data. PMID:19834895
Martínez-Bartolomé, Salvador; Medina-Aunon, J Alberto; López-García, Miguel Ángel; González-Tejedo, Carmen; Prieto, Gorka; Navajas, Rosana; Salazar-Donate, Emilio; Fernández-Costa, Carolina; Yates, John R; Albar, Juan Pablo
2018-04-06
Mass-spectrometry-based proteomics has evolved into a high-throughput technology in which numerous large-scale data sets are generated from diverse analytical platforms. Furthermore, several scientific journals and funding agencies have emphasized the storage of proteomics data in public repositories to facilitate its evaluation, inspection, and reanalysis. (1) As a consequence, public proteomics data repositories are growing rapidly. However, tools are needed to integrate multiple proteomics data sets to compare different experimental features or to perform quality control analysis. Here, we present a new Java stand-alone tool, Proteomics Assay COMparator (PACOM), that is able to import, combine, and simultaneously compare numerous proteomics experiments to check the integrity of the proteomic data as well as verify data quality. With PACOM, the user can detect sources of error that may have been introduced at any step of a proteomics workflow and that influence the final results. Data sets can be easily compared and integrated, and data quality and reproducibility can be visually assessed through a rich set of graphical representations of proteomics data features as well as a wide variety of data filters. Its flexibility and easy-to-use interface make PACOM a unique tool for daily use in a proteomics laboratory. PACOM is available at https://github.com/smdb21/pacom.
[Techniques for rapid production of monoclonal antibodies for use with antibody technology].
Kamada, Haruhiko
2012-01-01
A monoclonal antibody (Mab), due to its specific binding to a target protein, can potentially be one of the most useful tools for the functional analysis of proteins in recent proteomics-based research. However, the production of Mabs is a very time-consuming and laborious process (i.e., preparation of recombinant antigens, immunization of animals, preparation of hybridomas), making it the rate-limiting step in using Mabs in high-throughput proteomics research, which relies heavily on comprehensive and rapid methods. Therefore, there is a great demand for new methods to efficiently generate Mabs against groups of proteins identified by proteome analysis. Here, we describe a useful method, called the "Antibody proteomic technique", for the rapid generation of Mabs against pharmaceutical targets identified by proteomic analyses of disease samples (e.g., tumor tissues). We also introduce another method for finding valuable targets on the vasculature, called the "Vascular proteomic technique". Our results suggest that these methods for the rapid generation of Mabs against target proteins may be very useful in proteomics-based research as well as in clinical applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kolker, Eugene
Our project focused primarily on the analysis of different types of data produced by global high-throughput technologies, on the integration of gene annotation with gene and protein expression information, and on improving the functional annotation of Shewanella genes. Four of our major activities and achievements include the development of: statistical models for identification and expression proteomics that are superior to currently available approaches (including our own earlier ones); approaches to improve gene annotations on the whole-organism scale; standards for annotation, transcriptomics and proteomics approaches; and generalized approaches for integrating gene annotation with gene and protein expression information.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peterson, Elena S.; McCue, Lee Ann; Rutledge, Alexandra C.
2012-04-25
Visual Exploration and Statistics to Promote Annotation (VESPA) is an interactive visual analysis software tool that facilitates the discovery of structural mis-annotations in prokaryotic genomes. VESPA integrates high-throughput peptide-centric proteomics data and oligo-centric or RNA-Seq transcriptomics data into a genomic context. The data may be interrogated via visual analysis across multiple levels of genomic resolution, linked searches, exports, and interaction with BLAST to rapidly identify locations of interest within the genome and to evaluate potential mis-annotations.
Advanced proteomic liquid chromatography
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Fang; Smith, Richard D.; Shen, Yufeng
2012-10-26
Liquid chromatography coupled with mass spectrometry is the predominant platform used to analyze proteomics samples consisting of large numbers of proteins and their proteolytic products (e.g., truncated polypeptides) and spanning a wide range of relative concentrations. This review provides an overview of advanced capillary liquid chromatography techniques and methodologies that greatly improve separation resolving power and proteomics analysis coverage, sensitivity, and throughput.
Fusarium graminearum and Its Interactions with Cereal Heads: Studies in the Proteomics Era
Yang, Fen; Jacobsen, Susanne; Jørgensen, Hans J. L.; Collinge, David B.; Svensson, Birte; Finnie, Christine
2013-01-01
The ascomycete fungal pathogen Fusarium graminearum (teleomorph stage: Gibberella zeae) is the causal agent of Fusarium head blight in wheat and barley. This disease leads to significant losses of crop yield, and especially quality through the contamination by diverse fungal mycotoxins, which constitute a significant threat to the health of humans and animals. In recent years, high-throughput proteomics, aiming at identifying a broad spectrum of proteins with a potential role in the pathogenicity and host resistance, has become a very useful tool in plant-fungus interaction research. In this review, we describe the progress in proteomics applications toward a better understanding of F. graminearum pathogenesis, virulence, and host defense mechanisms. The contribution of proteomics to the development of crop protection strategies against this pathogen is also discussed briefly. PMID:23450732
A practical data processing workflow for multi-OMICS projects.
Kohl, Michael; Megger, Dominik A; Trippler, Martin; Meckel, Hagen; Ahrens, Maike; Bracht, Thilo; Weber, Frank; Hoffmann, Andreas-Claudius; Baba, Hideo A; Sitek, Barbara; Schlaak, Jörg F; Meyer, Helmut E; Stephan, Christian; Eisenacher, Martin
2014-01-01
Multi-OMICS approaches aim at the integration of quantitative data obtained for different biological molecules in order to understand their interrelation and the functioning of larger systems. This paper deals with several data integration and data processing issues that frequently occur in this context. To this end, the data processing workflow within the PROFILE project is presented, a multi-OMICS project that aims at the identification of novel biomarkers and the development of new therapeutic targets for seven important liver diseases. Furthermore, a software tool called CrossPlatformCommander is sketched, which facilitates several steps of the proposed workflow in a semi-automatic manner. Application of the software is presented for the detection of novel biomarkers, their ranking and annotation with existing knowledge, using the example of corresponding Transcriptomics and Proteomics data sets obtained from patients suffering from hepatocellular carcinoma. Additionally, a linear regression analysis of Transcriptomics vs. Proteomics data is presented and its performance assessed. It was shown that, to capture profound relations between Transcriptomics and Proteomics data, a simple linear regression analysis is not sufficient, and the implementation and evaluation of alternative statistical approaches are needed. Additionally, the integration of multivariate variable selection and classification approaches is intended for further development of the software. Although this paper focuses only on the combination of data obtained from quantitative Proteomics and Transcriptomics experiments, several approaches and data integration steps are also applicable to other OMICS technologies. Keeping specific restrictions in mind, the suggested workflow (or at least parts of it) may be used as a template for similar projects that make use of different high-throughput techniques. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. Copyright © 2013 Elsevier B.V. All rights reserved.
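As an illustration of the kind of Transcriptomics-versus-Proteomics regression discussed above, the following minimal sketch fits an ordinary least-squares line to hypothetical matched log2 abundances for a single gene; it is not part of CrossPlatformCommander, and all numbers are invented.

```python
import numpy as np
from scipy import stats

# Hypothetical matched abundance values for one gene across patient samples
# (log2-transformed transcript and protein intensities; illustrative numbers).
transcript = np.array([5.1, 6.3, 7.0, 7.8, 8.4, 9.1])
protein    = np.array([4.8, 5.2, 6.9, 6.5, 7.9, 8.2])

# Ordinary least-squares regression of protein on transcript abundance.
slope, intercept, r_value, p_value, stderr = stats.linregress(transcript, protein)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"R^2={r_value**2:.2f}, p={p_value:.3g}")
# A low R^2 across many genes would echo the paper's conclusion that a simple
# linear regression cannot fully capture the transcript-protein relationship.
```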
Kumarathasan, P; Vincent, R; Das, D; Mohottalage, S; Blais, E; Blank, K; Karthikeyan, S; Vuong, N Q; Arbuckle, T E; Fraser, W D
2014-04-04
There are reports linking maternal nutritional status, smoking and environmental chemical exposures to adverse pregnancy outcomes. However, the biological bases for the associations between some of these factors and birth outcomes are yet to be established. The objective of this preliminary work was to test the capability of a new high-throughput shotgun plasma proteomic screening approach to identify maternal changes relevant to pregnancy outcome. A subset of third-trimester plasma samples (N=12) associated with normal and low birth weight infants was fractionated, tryptic-digested and analyzed for global proteomic changes using a MALDI-TOF-TOF-MS methodology. Mass spectral data were mined for candidate biomarkers using bioinformatic and statistical tools. Maternal plasma profiles of cytokines (e.g. IL8, TNF-α), chemokines (e.g. MCP-1) and cardiovascular endpoints (e.g. ET-1, MMP-9) were analyzed by a targeted approach using multiplex protein array and HPLC-Fluorescence methods. Targeted and global plasma proteomic markers were used to identify protein interaction networks and maternal biological pathways relevant to low infant birth weight. Our results exhibited the potential to discriminate specific maternal physiologies relevant to the risk of adverse birth outcomes. This proteomic approach can be valuable in understanding the impacts of maternal factors such as environmental contaminant exposures and nutrition on birth outcomes in future work. We demonstrate here the fitness of mass spectrometry-based shotgun proteomics for the surveillance of biological changes in mothers, and for adverse pathway analysis in combination with target biomarker information. This approach has potential for enabling early detection of mothers at risk for low infant birth weight and preterm birth, and thus early intervention for the mitigation and prevention of adverse pregnancy outcomes. This article is part of a Special Issue entitled: Can Proteomics Fill the Gap Between Genomics and Phenotypes? Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
Clutterbuck, Abigail L.; Smith, Julia R.; Allaway, David; Harris, Pat; Liddell, Susan; Mobasheri, Ali
2011-01-01
This study employed a targeted high-throughput proteomic approach to identify the major proteins present in the secretome of articular cartilage. Explants from equine metacarpophalangeal joints were incubated alone or with interleukin-1beta (IL-1β, 10 ng/ml), with or without carprofen, a non-steroidal anti-inflammatory drug, for six days. After tryptic digestion of culture medium supernatants, resulting peptides were separated by HPLC and detected in a Bruker amaZon ion trap instrument. The five most abundant peptides in each MS scan were fragmented and the fragmentation patterns compared to mammalian entries in the Swiss-Prot database, using the Mascot search engine. Tryptic peptides originating from aggrecan core protein, cartilage oligomeric matrix protein (COMP), fibronectin, fibromodulin, thrombospondin-1 (TSP-1), clusterin (CLU), cartilage intermediate layer protein-1 (CILP-1), chondroadherin (CHAD) and matrix metalloproteinases MMP-1 and MMP-3 were detected. Quantitative western blotting confirmed the presence of CILP-1, CLU, MMP-1, MMP-3 and TSP-1. Treatment with IL-1β increased MMP-1, MMP-3 and TSP-1 and decreased the CLU precursor but did not affect CILP-1 and CLU levels. Many of the proteins identified have well-established extracellular matrix functions and are involved in early repair/stress responses in cartilage. This high throughput approach may be used to study the changes that occur in the early stages of osteoarthritis. PMID:21354348
Tao, Dingyin; Zhang, Lihua; Shan, Yichu; Liang, Zhen; Zhang, Yukui
2011-01-01
High-performance liquid chromatography-electrospray ionization tandem mass spectrometry (HPLC-ESI-MS-MS) is regarded as one of the most powerful techniques for separation and identification of proteins. Recently, much effort has been made to improve the separation capacity, detection sensitivity, and analysis throughput of micro- and nano-HPLC, by increasing column length, reducing column internal diameter, and using integrated techniques. Development of HPLC columns has also been rapid, as a result of the use of submicrometer packing materials and monolithic columns. All these innovations result in clearly improved performance of micro- and nano-HPLC for proteome research.
The application of proteomics in different aspects of hepatocellular carcinoma research.
Xing, Xiaohua; Liang, Dong; Huang, Yao; Zeng, Yongyi; Han, Xiao; Liu, Xiaolong; Liu, Jingfeng
2016-08-11
Hepatocellular carcinoma (HCC) is one of the most common malignant tumors and the second leading cause of cancer-related death worldwide. With significant advances in high-throughput protein analysis techniques, proteomics offers an extremely useful and versatile analytical platform for biomedical research. In recent years, different proteomic strategies have been widely applied to various aspects of HCC studies, ranging from screening for early diagnostic and prognostic biomarkers to in-depth investigation of the underlying molecular mechanisms. In this review, we systematically summarize the current applications of proteomics in hepatocellular carcinoma research, discuss the challenges of applying proteomics to clinical samples, and consider the possible applications of proteomics in precision medicine. We believe that this review will help readers become familiar with recent progress in clinical proteomics, especially in the field of hepatocellular carcinoma research. Copyright © 2016 Elsevier B.V. All rights reserved.
Selected reaction monitoring mass spectrometry: a methodology overview.
Ebhardt, H Alexander
2014-01-01
Moving past the discovery phase of proteomics, the term targeted proteomics covers multiple approaches that investigate a defined set of proteins in more detail. One such targeted proteomics approach is the combination of liquid chromatography with selected or multiple reaction monitoring mass spectrometry (SRM, MRM). SRM-MS requires prior knowledge of the fragmentation pattern of peptides, as the presence of the analyte in a sample is determined by measuring the m/z values of predefined precursor and fragment ions. Using scheduled SRM-MS, many analytes can be monitored robustly, allowing for high-throughput sample analysis of the same set of proteins over many conditions. In this chapter, the fundamentals of SRM-MS are explained, along with an optimized SRM pipeline from assay generation to data analysis.
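A small sketch may help picture scheduled SRM: transitions (predefined precursor/fragment m/z pairs) are only monitored inside a retention-time window around each peptide's expected elution time, which is what allows many analytes to be multiplexed in one run. The peptides, m/z values and window width below are illustrative assumptions, not a validated assay.

```python
# Minimal sketch of "scheduled" SRM: each transition (Q1/Q3 m/z pair) is only
# monitored within a retention-time window around the peptide's expected elution.
transitions = [
    {"peptide": "ELVISLIVESK",   "q1": 621.4, "q3": 759.5, "expected_rt": 22.4},
    {"peptide": "ELVISLIVESK",   "q1": 621.4, "q3": 646.4, "expected_rt": 22.4},
    {"peptide": "AGLCQTFVYGGCR", "q1": 729.8, "q3": 852.4, "expected_rt": 35.1},
]

def active_transitions(current_rt, window_min=4.0):
    """Return the transitions the instrument would monitor at a given retention time."""
    half = window_min / 2.0
    return [t for t in transitions
            if t["expected_rt"] - half <= current_rt <= t["expected_rt"] + half]

for rt in (21.0, 23.0, 35.0):
    monitored = {(t["peptide"], t["q3"]) for t in active_transitions(rt)}
    print(f"RT {rt:5.1f} min -> monitoring {len(monitored)} transitions: {sorted(monitored)}")
```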
Emerging proteomics biomarkers and prostate cancer burden in Africa
Adeola, Henry A.; Blackburn, Jonathan M.; Rebbeck, Timothy R.; Zerbini, Luiz F.
2017-01-01
Various biomarkers have emerged via high-throughput omics-based approaches for use in the diagnosis, treatment, and monitoring of prostate cancer. Many of these have yet to be demonstrated as having value in routine clinical practice. Moreover, there is a dearth of information on the validation of these emerging prostate biomarkers within African cohorts, despite the huge burden and aggressiveness of prostate cancer in men of African descent. This review focuses on the global landmark achievements in prostate cancer proteomics biomarker discovery and the potential for the clinical implementation of these biomarkers in Africa. Biomarker validation processes at the preclinical, translational and clinical research levels are discussed here, as are the challenges and prospects for the evaluation and use of novel proteomic prostate cancer biomarkers. PMID:28388542
Biomedical data integration in computational drug design and bioinformatics.
Seoane, Jose A; Aguiar-Pulido, Vanessa; Munteanu, Cristian R; Rivero, Daniel; Rabunal, Juan R; Dorado, Julian; Pazos, Alejandro
2013-03-01
In recent years, in the post-genomic era, more and more data are being generated by biological high-throughput technologies, such as proteomics and transcriptomics. These omics data can be very useful, but the real challenge is to analyze all of these data as a whole, after integrating them. Biomedical data integration enables making queries to different, heterogeneous and distributed biomedical data sources. Data integration solutions can be very useful not only in the context of drug design, but also in biomedical information retrieval, clinical diagnosis, systems biology, etc. In this review, we analyze the most common approaches to biomedical data integration, such as federated databases, data warehousing, multi-agent systems and semantic technology, as well as the solutions developed using these approaches in the past few years.
Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases.
Berger, Seth I; Posner, Jeremy M; Ma'ayan, Avi
2007-10-04
In recent years, mammalian protein-protein interaction network databases have been developed. The interactions in these databases are either extracted manually from low-throughput experimental biomedical research literature, extracted automatically from literature using techniques such as natural language processing (NLP), generated experimentally using high-throughput methods such as yeast-2-hybrid screens, or interactions are predicted using an assortment of computational approaches. Genes or proteins identified as significantly changing in proteomic experiments, or identified as susceptibility disease genes in genomic studies, can be placed in the context of protein interaction networks in order to assign these genes and proteins to pathways and protein complexes. Genes2Networks is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic linkable three color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Genes2Networks is powerful web-based software that can help experimental biologists to interpret lists of genes and proteins such as those commonly produced through genomic and proteomic experiments, as well as lists of genes and proteins associated with disease processes. This system can be used to find relationships between genes and proteins from seed lists, and predict additional genes or proteins that may play key roles in common pathways or protein complexes.
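The seed-list idea can be illustrated with a toy sketch: connect seed genes through an interaction network and report the intermediate nodes found on the connecting paths. This is a simplified shortest-path illustration using hypothetical interactions and the networkx library; it is not the actual Genes2Networks filtering or significance statistics.

```python
import networkx as nx
from itertools import combinations

# Toy protein-protein interaction network (edges are illustrative, not drawn
# from the ten datasets integrated by Genes2Networks).
ppi = nx.Graph()
ppi.add_edges_from([
    ("TP53", "MDM2"), ("MDM2", "AKT1"), ("AKT1", "MTOR"),
    ("TP53", "EP300"), ("EP300", "MYC"), ("MYC", "MAX"),
    ("MTOR", "RPS6KB1"),
])

seeds = ["TP53", "MTOR", "MYC"]

# Connect every pair of seed genes via shortest paths; the non-seed genes that
# appear on these paths play the role of "significant intermediate nodes".
subnetwork_nodes = set(seeds)
for a, b in combinations(seeds, 2):
    if nx.has_path(ppi, a, b):
        subnetwork_nodes.update(nx.shortest_path(ppi, a, b))

subnetwork = ppi.subgraph(subnetwork_nodes)
intermediates = subnetwork_nodes - set(seeds)
print("Subnetwork edges:", sorted(subnetwork.edges()))
print("Intermediate nodes:", sorted(intermediates))
```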
A New Mass Spectrometry-compatible Degradable Surfactant for Tissue Proteomics
Chang, Ying-Hua; Gregorich, Zachery R.; Chen, Albert J.; Hwang, Leekyoung; Guner, Huseyin; Yu, Deyang; Zhang, Jianyi; Ge, Ying
2015-01-01
Tissue proteomics is increasingly recognized for its role in biomarker discovery and disease mechanism investigation. However, protein solubility remains a significant challenge in mass spectrometry (MS)-based tissue proteomics. Conventional surfactants such as sodium dodecyl sulfate (SDS), the preferred surfactant for protein solubilization, are not compatible with MS. Herein, we have screened a library of surfactant-like compounds and discovered an MS-compatible degradable surfactant (MaSDeS) for tissue proteomics that solubilizes all categories of proteins with performance comparable to SDS. The use of MaSDeS in the tissue extraction significantly improves the total number of protein identifications from commonly used tissues, including tissue from the heart, liver, and lung. Notably, MaSDeS significantly enriches membrane proteins, which are often under-represented in proteomics studies. The acid degradable nature of MaSDeS makes it amenable for high-throughput mass spectrometry-based proteomics. In addition, the thermostability of MaSDeS allows for its use in experiments requiring high temperature to facilitate protein extraction and solubilization. Furthermore, we have shown that MaSDeS outperforms the other MS-compatible surfactants in terms of overall protein solubility and the total number of identified proteins in tissue proteomics. Thus, the use of MaSDeS will greatly advance tissue proteomics and realize its potential in basic biomedical and clinical research. MaSDeS could be utilized in a variety of proteomics studies as well as general biochemical and biological experiments that employ surfactants for protein solubilization. PMID:25589168
Huang, Rongrong; Chen, Zhongsi; He, Lei; He, Nongyue; Xi, Zhijiang; Li, Zhiyang; Deng, Yan; Zeng, Xin
2017-01-01
There is a critical need for the discovery of novel biomarkers for early detection and targeted therapy of cancer, a major cause of deaths worldwide. In this respect, proteomic technologies, such as mass spectrometry (MS), enable the identification of pathologically significant proteins in various types of samples. MS is capable of high-throughput profiling of complex biological samples including blood, tissues, urine, milk, and cells. MS-assisted proteomics has contributed to the development of cancer biomarkers that may form the foundation for new clinical tests. It can also aid in elucidating the molecular mechanisms underlying cancer. In this review, we discuss MS principles and instrumentation as well as approaches in MS-based proteomics, which have been employed in the development of potential biomarkers. Furthermore, the challenges in validation of MS biomarkers for their use in clinical practice are also reviewed. PMID:28912895
Proteomic Profiling of Mitochondrial Enzymes during Skeletal Muscle Aging.
Staunton, Lisa; O'Connell, Kathleen; Ohlendieck, Kay
2011-03-07
Mitochondria are of central importance for energy generation in skeletal muscles. Expression changes or functional alterations in mitochondrial enzymes play a key role during myogenesis, fibre maturation, and various neuromuscular pathologies, as well as natural fibre aging. Mass spectrometry-based proteomics suggests itself as a convenient large-scale and high-throughput approach to catalogue the mitochondrial protein complement and determine global changes during health and disease. This paper gives a brief overview of the relatively new field of mitochondrial proteomics and discusses the findings from recent proteomic surveys of mitochondrial elements in aged skeletal muscles. Changes in the abundance, biochemical activity, subcellular localization, and/or posttranslational modifications in key mitochondrial enzymes might be useful as novel biomarkers of aging. In the long term, this may advance diagnostic procedures, improve the monitoring of disease progression, help in the testing of side effects due to new drug regimes, and enhance our molecular understanding of age-related muscle degeneration.
Maize-Pathogen Interactions: An Ongoing Combat from a Proteomics Perspective.
Pechanova, Olga; Pechan, Tibor
2015-11-30
Maize (Zea mays L.) is a host to numerous pathogenic species that cause serious diseases of its ear and foliage, negatively affecting the yield and the quality of the maize crop. A considerable amount of research has been carried out to elucidate the mechanisms of maize-pathogen interactions, with the major goal of identifying defense-associated proteins. In this review, we summarize interactions of maize with its agriculturally important pathogens that have been assessed at the proteome level. Employing differential analyses, such as the comparison of pathogen-resistant and susceptible maize varieties, as well as changes in maize proteomes after pathogen challenge, numerous proteins were identified as possible candidates in maize resistance. We describe the findings of various research groups that used mainly mass spectrometry-based, high-throughput proteomic tools to investigate maize interactions with the fungal pathogens Aspergillus flavus, Fusarium spp., and Curvularia lunata, and the viral agents Rice Black-streaked Dwarf Virus and Sugarcane Mosaic Virus.
Chi, Hao; He, Kun; Yang, Bing; Chen, Zhen; Sun, Rui-Xiang; Fan, Sheng-Bo; Zhang, Kun; Liu, Chao; Yuan, Zuo-Fei; Wang, Quan-Hui; Liu, Si-Qi; Dong, Meng-Qiu; He, Si-Min
2015-11-03
Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, mainly because of unexpected modifications and irregular digestion types. In this study, we developed a new algorithm, called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modification or mutation can be searched efficiently. A new re-ranking algorithm is used to distinguish correct peptide-spectrum matches from random ones. The algorithm was tested on several HCD datasets, and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
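The ion-index idea that makes unrestricted search tractable can be sketched as a lookup table from binned fragment m/z values to candidate peptides; a spectrum is then scored by counting how many of its observed fragments each candidate explains. The peptides, masses and bin width below are placeholders, and this is a conceptual sketch only, not the Alioth/pFind data structure.

```python
from collections import defaultdict

# Toy fragment-ion index: map binned fragment m/z values to the peptides that
# could produce them, so a spectrum can be matched against all candidates at once.
# The peptide sequences and fragment masses below are illustrative placeholders.
BIN_WIDTH = 0.02  # Da; high-resolution data permits narrow bins

peptide_fragments = {
    "PEPTIDER": [175.119, 288.203, 385.256, 514.298],
    "SAMPLEK":  [147.113, 260.197, 373.281, 486.365],
    "TESTPEPK": [147.113, 244.166, 385.256, 532.324],
}

index = defaultdict(set)
for peptide, fragments in peptide_fragments.items():
    for mz in fragments:
        index[round(mz / BIN_WIDTH)].add(peptide)

def score_spectrum(observed_mz):
    """Count how many observed fragment peaks each indexed peptide explains."""
    counts = defaultdict(int)
    for mz in observed_mz:
        for peptide in index.get(round(mz / BIN_WIDTH), ()):
            counts[peptide] += 1
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

spectrum = [147.113, 385.256, 532.324]
print(score_spectrum(spectrum))  # candidates ranked by matched fragments
```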
Nicolau, Monica; Levine, Arnold J; Carlsson, Gunnar
2011-04-26
High-throughput biological data, whether generated by sequencing, transcriptional microarrays, proteomics, or other means, continues to require analytic methods that address its high-dimensional aspects. Because the computational part of data analysis ultimately identifies shape characteristics in the organization of data sets, the mathematics of shape recognition in high dimensions continues to be a crucial part of data analysis. This article introduces a method that extracts information from high-throughput microarray data and, by using topology, provides greater depth of information than current analytic techniques. The method, termed Progression Analysis of Disease (PAD), first identifies robust aspects of cluster analysis, then goes deeper to find a multitude of biologically meaningful shape characteristics in these data. Additionally, because PAD incorporates a visualization tool, it provides a simple picture or graph that can be used to further explore these data. Although PAD can be applied to a wide range of high-throughput data types, it is used here as an example to analyze breast cancer transcriptional data. This identified a unique subgroup of Estrogen Receptor-positive (ER(+)) breast cancers that express high levels of c-MYB and low levels of innate inflammatory genes. These patients exhibit 100% survival and no metastasis. No supervised step beyond the distinction between tumor and healthy patients was used to identify this subtype. The group has a clear and distinct, statistically significant molecular signature; it highlights coherent biology but is invisible to cluster methods, and it does not fit into the accepted classification of Luminal A/B or Normal-like subtypes of ER(+) breast cancers. We denote the group as c-MYB(+) breast cancer.
The role of targeted chemical proteomics in pharmacology
Sutton, Chris W
2012-01-01
Traditionally, proteomics is the high-throughput characterization of the global complement of proteins in a biological system using cutting-edge technologies (robotics and mass spectrometry) and bioinformatics tools (Internet-based search engines and databases). As the field of proteomics has matured, a diverse range of strategies have evolved to answer specific problems. Chemical proteomics is one such direction that provides the means to enrich and detect less abundant proteins (the ‘hidden’ proteome) from complex mixtures of wide dynamic range (the ‘deep’ proteome). In pharmacology, chemical proteomics has been utilized to determine the specificity of drugs and their analogues, for anticipated known targets, only to discover other proteins that bind and could account for side effects observed in preclinical and clinical trials. As a consequence, chemical proteomics provides a valuable accessory in refinement of second- and third-generation drug design for treatment of many diseases. However, determining definitive affinity capture of proteins by a drug immobilized on soft gel chromatography matrices has highlighted some of the challenges that remain to be addressed. Examples of the different strategies that have emerged using well-established drugs against pharmaceutically important enzymes, such as protein kinases, metalloproteases, PDEs, cytochrome P450s, etc., indicate the potential opportunity to employ chemical proteomics as an early-stage screening approach in the identification of new targets. PMID:22074351
Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B
2013-03-23
Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. To date, multiple challenges in protein MS data analysis remain: management of large-scale and complex data sets; MS peak identification and indexing; and high-dimensional differential peak analysis with control of the false discovery rate (FDR) across the concurrent statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The presented web application supports large-scale online uploading and analysis of MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
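Differential peak analysis with FDR control of the kind mentioned above is commonly performed with the Benjamini-Hochberg procedure; a minimal sketch follows, applied to hypothetical per-peak p-values. This is illustrative only and not the portal's own implementation.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a boolean mask of p-values deemed significant at FDR <= alpha."""
    p = np.asarray(pvalues)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m     # BH step-up thresholds
    passed = p[order] <= thresholds
    significant = np.zeros(m, dtype=bool)
    if passed.any():
        cutoff = np.nonzero(passed)[0].max()         # largest rank satisfying BH
        significant[order[:cutoff + 1]] = True
    return significant

# Hypothetical per-peak p-values from differential tests between two groups.
peak_pvalues = [0.001, 0.004, 0.03, 0.20, 0.45, 0.0005, 0.08]
print(benjamini_hochberg(peak_pvalues, alpha=0.05))
```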
Yu, Xiaobo; Bian, Xiaofang; Throop, Andrea; Song, Lusheng; Moral, Lerys Del; Park, Jin; Seiler, Catherine; Fiacco, Michael; Steel, Jason; Hunter, Preston; Saul, Justin; Wang, Jie; Qiu, Ji; Pipas, James M.; LaBaer, Joshua
2014-01-01
Throughout the long history of virus-host co-evolution, viruses have developed delicate strategies to facilitate their invasion and the replication of their genomes, while silencing host immune responses through various mechanisms. The systematic characterization of viral protein-host interactions would yield invaluable information for the understanding of viral invasion/evasion, the diagnosis and therapeutic treatment of viral infections, and the mechanisms of host biology. With more than 2,000 viral genomes sequenced, only a small percentage of them are well investigated. Access to these viral open reading frames (ORFs) in a flexible cloning format would greatly facilitate both in vitro and in vivo virus-host interaction studies. However, the overall progress of viral ORF cloning has been slow. To facilitate viral studies, we are releasing the first installment of our panviral proteome collection of 2,035 ORF clones from 830 viral genes in the Gateway® recombinational cloning system. Here, we demonstrate several uses of our viral collection, including highly efficient production of viral proteins using a human cell-free expression system in vitro, global identification of host targets for rubella virus using Nucleic Acid Programmable Protein Arrays (NAPPA) containing 10,000 unique human proteins, and detection of host serological responses using micro-fluidic multiplexed immunoassays. The studies presented here begin to elucidate host-viral protein interactions through our systematic utilization of viral ORFs, high-throughput cloning, and proteomic technologies. These valuable plasmid resources will be available to the research community to enable continued viral functional studies. PMID:24955142
Guerette, Paul A; Hoon, Shawn; Seow, Yiqi; Raida, Manfred; Masic, Admir; Wong, Fong T; Ho, Vincent H B; Kong, Kiat Whye; Demirel, Melik C; Pena-Francesch, Abdon; Amini, Shahrouz; Tay, Gavin Z; Ding, Dawei; Miserez, Ali
2013-10-01
Efforts to engineer new materials inspired by biological structures are hampered by the lack of genomic data from many model organisms studied in biomimetic research. Here we show that biomimetic engineering can be accelerated by integrating high-throughput RNA-seq with proteomics and advanced materials characterization. This approach can be applied to a broad range of systems, as we illustrate by investigating diverse high-performance biological materials involved in embryo protection, adhesion and predation. In one example, we rapidly engineer recombinant squid sucker ring teeth proteins into a range of structural and functional materials, including nanopatterned surfaces and photo-cross-linked films that exceed the mechanical properties of most natural and synthetic polymers. Integrating RNA-seq with proteomics and materials science facilitates the molecular characterization of natural materials and the effective translation of their molecular designs into a wide range of bio-inspired materials.
Terfve, Camille; Sabidó, Eduard; Wu, Yibo; Gonçalves, Emanuel; Choi, Meena; Vaga, Stefania; Vitek, Olga; Saez-Rodriguez, Julio; Aebersold, Ruedi
2017-02-03
Advances in mass spectrometry have made the quantitative measurement of proteins across multiple samples a reality, allowing for the study of complex biological systems such as the metabolic syndrome. Although the deregulation of lipid metabolism and increased hepatic storage of triacylglycerides are known to play a part in the onset of the metabolic syndrome, its molecular basis and dependency on dietary and genotypic factors are poorly characterized. Here, we used an experimental design with two different mouse strains and dietary and metabolic perturbations to generate a compendium of quantitative proteome data using three mass spectrometric techniques. The data reproduce known properties of the metabolic system and indicate differential molecular adaptation of the two mouse strains to perturbations, contributing to a better understanding of the metabolic syndrome. We show that high-quality, high-throughput proteomic data sets provide an unbiased broad overview of the behavior of complex systems after perturbation.
Advances in targeted proteomics and applications to biomedical research
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, Tujin; Song, Ehwang; Nie, Song
Targeted proteomics has emerged as a powerful protein quantification tool in systems biology and biomedical research, and increasingly for clinical applications. The most widely used targeted proteomics approach, selected reaction monitoring (SRM), also known as multiple reaction monitoring (MRM), can be used for quantification of cellular signaling networks and preclinical verification of candidate protein biomarkers. As an extension to our previous review on advances in SRM sensitivity (Shi et al., Proteomics, 12, 1074–1092, 2012), herein we review recent advances in methods and technology for further enhancing SRM sensitivity (from 2012 to present) and highlight its broad biomedical applications in human bodily fluids, tissues and cell lines. Furthermore, we also review two recently introduced targeted proteomics approaches, parallel reaction monitoring (PRM) and data-independent acquisition (DIA) with targeted data extraction on fast-scanning high-resolution accurate-mass (HR/AM) instruments. Such HR/AM targeted quantification, with monitoring of all target product ions, effectively addresses the SRM limitations in specificity and multiplexing; however, compared to SRM, PRM and DIA are still in their infancy with a limited number of applications. Thus, for HR/AM targeted quantification we focus our discussion on method development, data processing and analysis, and its advantages and limitations in targeted proteomics. Finally, general perspectives on the potential of achieving both high sensitivity and high sample throughput for large-scale quantification of hundreds of target proteins are discussed.
Advances in Quantitative Proteomics of Microbes and Microbial Communities
NASA Astrophysics Data System (ADS)
Waldbauer, J.; Zhang, L.; Rizzo, A. I.
2015-12-01
Quantitative measurements of gene expression are key to developing a mechanistic, predictive understanding of how microbial metabolism drives many biogeochemical fluxes and responds to environmental change. High-throughput RNA-sequencing can afford a wealth of information about transcript-level expression patterns, but it is becoming clear that expression dynamics are often very different at the protein level where biochemistry actually occurs. These divergent dynamics between levels of biological organization necessitate quantitative proteomic measurements to address many biogeochemical questions. The protein-level expression changes that underlie shifts in the magnitude, or even the direction, of metabolic and biogeochemical fluxes can be quite subtle and test the limits of current quantitative proteomics techniques. Here we describe methodologies for high-precision, whole-proteome quantification that are applicable to both model organisms of biogeochemical interest that may not be genetically tractable, and to complex community samples from natural environments. Employing chemical derivatization of peptides with multiple isotopically-coded tags, this strategy is rapid and inexpensive, can be implemented on a wide range of mass spectrometric instrumentation, and is relatively insensitive to chromatographic variability. We demonstrate the utility of this quantitative proteomics approach in application to both isolates and natural communities of sulfur-metabolizing and photosynthetic microbes.
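A common roll-up for isotopically coded quantification, summarizing each protein as the median of its peptide-level log2 channel ratios, can be sketched as follows. The protein names and intensities are hypothetical, and the exact tagging chemistry and normalization used in the study are not reproduced here.

```python
import math
from collections import defaultdict
from statistics import median

# Hypothetical intensities for two isotopically coded channels
# ("light" = condition A, "heavy" = condition B); values are illustrative.
peptides = [
    {"protein": "dsrA", "light": 12000.0, "heavy": 6100.0},
    {"protein": "dsrA", "light":  8800.0, "heavy": 4300.0},
    {"protein": "psbA", "light":  1500.0, "heavy": 3100.0},
    {"protein": "psbA", "light":  2100.0, "heavy": 4500.0},
    {"protein": "psbA", "light":  1900.0, "heavy": 3600.0},
]

# Summarize each protein as the median of its peptide-level log2 ratios,
# a common robust roll-up from peptide to protein quantification.
ratios = defaultdict(list)
for pep in peptides:
    ratios[pep["protein"]].append(math.log2(pep["light"] / pep["heavy"]))

for protein, values in ratios.items():
    print(f"{protein}: median log2(A/B) = {median(values):+.2f} (n={len(values)} peptides)")
```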
Proteomics technique opens new frontiers in mobilome research.
Davidson, Andrew D; Matthews, David A; Maringer, Kevin
2017-01-01
A large proportion of the genome of most eukaryotic organisms consists of highly repetitive mobile genetic elements. The sum of these elements is called the "mobilome," which in eukaryotes is made up mostly of transposons. Transposable elements contribute to disease, evolution, and normal physiology by mediating genetic rearrangement, and through the "domestication" of transposon proteins for cellular functions. Although 'omics studies of mobilome genomes and transcriptomes are common, technical challenges have hampered high-throughput global proteomic analyses of transposons. In a recent paper, we overcame these technical hurdles using a technique called "proteomics informed by transcriptomics" (PIT), and thus published the first unbiased global mobilome-derived proteome for any organism (using cell lines derived from the mosquito Aedes aegypti). In this commentary, we describe our methods in more detail and summarise our major findings. We also use new genome sequencing data to show that, in many cases, the specific genomic element expressing a given protein can be identified using PIT. This proteomic technique therefore represents an important technological advance that will open new avenues of research into the roles that proteins derived from transposons and other repetitive and sequence-diverse genetic elements, such as endogenous retroviruses, play in health and disease.
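The core of a PIT-style workflow, translating assembled transcript contigs in all six reading frames to build a custom protein database for MS/MS searching, can be sketched as below. This sketch assumes Biopython is available and uses an illustrative contig; it is not the authors' published pipeline.

```python
# Minimal sketch of the idea behind "proteomics informed by transcriptomics":
# translate assembled transcript contigs in all six reading frames and keep the
# resulting ORFs as a custom protein database for MS/MS searching.
# Requires Biopython; the contig sequence below is an illustrative placeholder.
from Bio.Seq import Seq

def six_frame_orfs(contig, min_len=20):
    """Yield amino-acid ORFs (between stop codons) from all six frames."""
    seq = Seq(contig)
    for strand in (seq, seq.reverse_complement()):
        for frame in range(3):
            # Trim so the translated stretch is a whole number of codons.
            trimmed = strand[frame:len(strand) - (len(strand) - frame) % 3]
            for orf in str(trimmed.translate()).split("*"):
                if len(orf) >= min_len:
                    yield orf

contig = ("ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTT"
          "GAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGT")
for i, orf in enumerate(six_frame_orfs(contig, min_len=15), start=1):
    print(f">ORF_{i}\n{orf}")
```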
Korkut, Anil; Wang, Weiqing; Demir, Emek; Aksoy, Bülent Arman; Jing, Xiaohong; Molinelli, Evan J; Babur, Özgün; Bemis, Debra L; Onur Sumer, Selcuk; Solit, David B; Pratilas, Christine A; Sander, Chris
2015-08-18
Resistance to targeted cancer therapies is an important clinical problem. The discovery of anti-resistance drug combinations is challenging as resistance can arise by diverse escape mechanisms. To address this challenge, we improved and applied the experimental-computational perturbation biology method. Using statistical inference, we build network models from high-throughput measurements of molecular and phenotypic responses to combinatorial targeted perturbations. The models are computationally executed to predict the effects of thousands of untested perturbations. In RAF-inhibitor resistant melanoma cells, we measured 143 proteomic/phenotypic entities under 89 perturbation conditions and predicted c-Myc as an effective therapeutic co-target with BRAF or MEK. Experiments using the BET bromodomain inhibitor JQ1 affecting the level of c-Myc protein and protein kinase inhibitors targeting the ERK pathway confirmed the prediction. In conclusion, we propose an anti-cancer strategy of co-targeting a specific upstream alteration and a general downstream point of vulnerability to prevent or overcome resistance to targeted drugs.
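To convey only the train-on-measured, predict-untested logic described above, here is a drastically simplified linear stand-in: fit a response matrix from measured perturbation conditions and predict an unmeasured combination. The actual perturbation biology method infers nonlinear network models (e.g., by belief propagation); all names and numbers below are invented.

```python
import numpy as np

# Drastically simplified stand-in for perturbation-response modelling: fit a
# linear map from perturbation vectors to (phospho)proteomic/phenotypic readouts
# on measured conditions, then predict an untested combination.
perturbations = np.array([      # rows: conditions, cols: [RAFi, MEKi, JQ1]
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
], dtype=float)
responses = np.array([          # cols: readouts, e.g. [pERK, cMYC, viability]
    [-0.8,  0.3, -0.2],
    [-1.1,  0.2, -0.3],
    [ 0.0, -1.4, -0.4],
    [-1.6,  0.4, -0.5],
    [-0.9, -1.2, -1.1],
])

# Least-squares fit of the linear response map W (perturbations @ W ~ responses).
W, *_ = np.linalg.lstsq(perturbations, responses, rcond=None)

untested = np.array([0.0, 1.0, 1.0])  # MEKi + JQ1, not among the measured conditions
print("Predicted readout changes for MEKi + JQ1:", untested @ W)
```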
Albalat, Amaya; Husi, Holger; Siwy, Justyna; Nally, Jarlath E; McLauglin, Mark; Eckersall, Peter D; Mullen, William
2014-02-01
Proteomics is a growing field that has the potential to be applied to many biology-related disciplines. However, the study of the proteome has proven to be very challenging due to its high level of complexity when compared to genome and transcriptome data. In order to analyse this level of complexity, high-resolution separation of peptides/proteins is needed, together with high-resolution analysers. Currently, liquid chromatography and capillary electrophoresis (CE) are the two most widely used separation techniques that can be coupled on-line with a mass spectrometer (MS). In CE, proteins/peptides are separated according to their size, charge and shape, leading to high resolving power. Although further progress in the areas of sensitivity, throughput and proteome coverage is expected, MS-based proteomics has developed to a level at which it is habitually applied to study a wide range of biological questions. The aim of this review is to present CE-MS as a proteomic analytical platform for biomarker research that could be used in farm animal and veterinary studies. This is an MS-based analytical platform that has been widely used for biomarker research in the biomedical field, but its application in animal proteomic studies is relatively novel. The review will focus on introducing the CE-MS platform and the primary considerations for its application to biomarker research. Furthermore, current applications, but more importantly potential applications in the field of farm animals and veterinary science, will be presented and discussed.
Determination of burn patient outcome by large-scale quantitative discovery proteomics
Finnerty, Celeste C.; Jeschke, Marc G.; Qian, Wei-Jun; Kaushal, Amit; Xiao, Wenzhong; Liu, Tao; Gritsenko, Marina A.; Moore, Ronald J.; Camp, David G.; Moldawer, Lyle L.; Elson, Constance; Schoenfeld, David; Gamelli, Richard; Gibran, Nicole; Klein, Matthew; Arnoldo, Brett; Remick, Daniel; Smith, Richard D.; Davis, Ronald; Tompkins, Ronald G.; Herndon, David N.
2013-01-01
Objective Emerging proteomics techniques can be used to establish proteomic outcome signatures and to identify candidate biomarkers for survival following traumatic injury. We applied high-resolution liquid chromatography-mass spectrometry (LC-MS) and multiplex cytokine analysis to profile the plasma proteome of survivors and non-survivors of massive burn injury to determine the proteomic survival signature following a major burn injury. Design Proteomic discovery study. Setting Five burn hospitals across the U.S. Patients Thirty-two burn patients (16 non-survivors and 16 survivors), 19–89 years of age, were admitted within 96 h of injury to the participating hospitals with burns covering >20% of the total body surface area and required at least one surgical intervention. Interventions None. Measurements and Main Results We found differences in circulating levels of 43 proteins involved in the acute phase response, hepatic signaling, the complement cascade, inflammation, and insulin resistance. Thirty-two of the proteins identified were not previously known to play a role in the response to burn. IL-4, IL-8, GM-CSF, MCP-1, and β2-microglobulin correlated well with survival and may serve as clinical biomarkers. Conclusions These results demonstrate the utility of these techniques for establishing proteomic survival signatures and for use as a discovery tool to identify candidate biomarkers for survival. This is the first clinical application of a high-throughput, large-scale LC-MS-based quantitative plasma proteomic approach for biomarker discovery for the prediction of patient outcome following burn, trauma or critical illness. PMID:23507713
Jimenez, Connie R; Verheul, Henk M W
2014-01-01
Proteomics is optimally suited to bridge the gap between genomic information on the one hand and biologic functions and disease phenotypes on the other, since it studies the expression and/or post-translational modification (especially phosphorylation) of proteins, the major cellular players bringing about cellular functions, at a global level in biologic specimens. Mass spectrometry technology and (bio)informatic tools have matured to the extent that they can provide high-throughput, comprehensive, and quantitative protein inventories of cells, tissues, and biofluids in clinical samples at low levels. In this article, we focus on next-generation proteomics employing nanoliquid chromatography coupled to high-resolution tandem mass spectrometry for in-depth (phospho)protein profiling of tumor tissues and (proximal) biofluids, with a focus on studies employing clinical material. In addition, we highlight emerging proteogenomic approaches for the identification of tumor-specific protein variants, and targeted multiplex mass spectrometry strategies for large-scale biomarker validation. Below we provide a discussion of recent progress, some research highlights, and challenges that remain for the clinical translation of proteomic discoveries.
The Prediction of Drug-Disease Correlation Based on Gene Expression Data.
Cui, Hui; Zhang, Menghuan; Yang, Qingmin; Li, Xiangyi; Liebman, Michael; Yu, Ying; Xie, Lu
2018-01-01
The explosive growth of high-throughput experimental methods and the resulting data yields both opportunity and challenge for selecting the correct drug to treat both a specific patient and their individual disease. Ideally, it would be useful and efficient if computational approaches could be applied to help achieve optimal drug-patient-disease matching, but current efforts have met with limited success. Current approaches have primarily utilized the measurable effect of a specific drug on target tissue or cell lines to identify the potential biological effect of such treatment. While these efforts have met with some level of success, there exists much opportunity for improvement. This follows specifically from the observation that, for many diseases, actual patient responses indicate an increasing need for treatment with combinations of drugs rather than single-drug therapies. Only a few previous studies have yielded computational approaches for predicting the synergy of drug combinations by analyzing high-throughput molecular datasets. However, these computational approaches focused on the characteristics of the drug itself, without fully accounting for disease factors. Here, we propose an algorithm to specifically predict the synergistic effects of drug combinations on various diseases, by integrating the data characteristics of disease-related gene expression profiles with drug-treated gene expression profiles. We have demonstrated its utility through application to transcriptome data, including microarray and RNA-Seq data, and the drug-disease prediction results were validated using existing publications and drug databases. It is also applicable to other quantitative profiling data such as proteomics data. We also provide an interactive web interface to allow our Prediction of Drug-Disease method to be readily applied to user data. While our studies represent a preliminary exploration of this critical problem, we believe that the algorithm can provide the basis for further refinement towards addressing a large clinical need.
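The algorithm itself is not detailed in this abstract; purely as an illustration of the general idea of matching disease and drug expression signatures, the following Python sketch scores drugs (and naive drug pairs) by how strongly their expression signatures anti-correlate with a disease signature. All gene symbols, fold-change values, and the scoring scheme are hypothetical stand-ins, not the authors' method.

```python
# Illustrative signature-reversal scoring (not the authors' algorithm):
# a drug (or drug pair) whose expression signature anti-correlates with the
# disease signature is flagged as a candidate treatment.
from itertools import combinations
from scipy.stats import spearmanr

# Hypothetical log2 fold-change signatures keyed by gene symbol.
disease_sig = {"TP53": 1.8, "MYC": 2.1, "EGFR": 1.2, "CDKN1A": -1.5, "BAX": -0.9}
drug_sigs = {
    "drugA": {"TP53": -1.1, "MYC": -1.7, "EGFR": -0.4, "CDKN1A": 1.2, "BAX": 0.8},
    "drugB": {"TP53": 0.6, "MYC": 0.9, "EGFR": 0.3, "CDKN1A": -0.2, "BAX": -0.1},
}

def reversal_score(disease, drug):
    """Spearman correlation over shared genes; more negative = stronger reversal."""
    genes = sorted(set(disease) & set(drug))
    rho, _ = spearmanr([disease[g] for g in genes], [drug[g] for g in genes])
    return rho

def combo_score(disease, sig1, sig2):
    """Naive drug-pair signature: per-gene sum of the two drug effects."""
    genes = set(sig1) & set(sig2)
    combined = {g: sig1[g] + sig2[g] for g in genes}
    return reversal_score(disease, combined)

for name, sig in drug_sigs.items():
    print(name, round(reversal_score(disease_sig, sig), 3))
for d1, d2 in combinations(drug_sigs, 2):
    print(d1, "+", d2, round(combo_score(disease_sig, drug_sigs[d1], drug_sigs[d2]), 3))
```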
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wiley, H. S.
There comes a time in every field of science when things suddenly change. While it might not be immediately apparent that things are different, a tipping point has occurred. Biology is now at such a point. The reason is the introduction of high-throughput genomics-based technologies. I am not talking about the consequences of the sequencing of the human genome (and every other genome within reach). The change is due to new technologies that generate an enormous amount of data about the molecular composition of cells. These include proteomics, transcriptional profiling by sequencing, and the ability to globally measure microRNAs and post-translational modifications of proteins. These mountains of digital data can be mapped to a common frame of reference: the organism’s genome. With the new high-throughput technologies, we can generate tens of thousands of data points from each sample. Data are now measured in terabytes, and analyzing them can require years. Obviously, we can’t wait to interpret the data fully before the next experiment. In fact, we might never be able to even look at all of it, much less understand it. This volume of data requires sophisticated computational and statistical methods for its analysis and is forcing biologists to approach data interpretation as a collaborative venture.
USDA-ARS's Scientific Manuscript database
An exponential increase in our understanding of genomes, proteomes, and metabolomes provides greater impetus to address critical biotechnological issues such as sustainable production of biofuels and bio-based chemicals and, in particular, the development of improved microbial biocatalysts for use i...
Ion channel drug discovery and research: the automated Nano-Patch-Clamp technology.
Brueggemann, A; George, M; Klau, M; Beckler, M; Steindl, J; Behrends, J C; Fertig, N
2004-01-01
Unlike the genomics revolution, which was largely enabled by a single technological advance (high-throughput sequencing), rapid advancement in proteomics will require a broader effort to increase the throughput of a number of key tools for functional analysis of different types of proteins. In the case of ion channels, a class of (membrane) proteins of great physiological importance and potential as drug targets, the lack of adequate assay technologies is felt particularly strongly. The available, indirect, high-throughput screening methods for ion channels clearly generate insufficient information. The best technology to study ion channel function and screen for compound interaction is the patch clamp technique, but patch clamping suffers from low throughput, which is not acceptable for drug screening. A first step towards a solution is presented here. The nano-patch-clamp technology, which is based on a planar, microstructured glass chip, enables automatic whole-cell patch clamp measurements. The Port-a-Patch is an automated electrophysiology workstation which uses planar patch clamp chips. This approach enables high-quality and high-content ion channel and compound evaluation on a one-cell-at-a-time basis. The presented automation of the patch process and its scalability to an array format are the prerequisites for any higher-throughput electrophysiology instruments.
Barkla, Bronwyn J
2018-01-01
Free flow zonal electrophoresis (FFZE) is a versatile, reproducible, and potentially high-throughput technique for the separation of plant organelles and membranes by differences in membrane surface charge. It offers considerable benefits over traditional fractionation techniques, such as density gradient centrifugation and two-phase partitioning, as it is relatively fast, sample recovery is high, and the method provides unparalleled sample purity. It has been used to successfully purify chloroplasts and mitochondria from plants, but also to obtain highly pure fractions of plasma membrane, tonoplast, ER, Golgi, and thylakoid membranes. Application of the technique can significantly improve protein coverage in large-scale proteomics studies by decreasing sample complexity. Here, we describe the method for the fractionation of plant cellular membranes from leaves by FFZE.
Yu, Yanbao; Leng, Taohua; Yun, Dong; Liu, Na; Yao, Jun; Dai, Ying; Yang, Pengyuan; Chen, Xian
2013-01-01
Emerging evidence indicates that blood platelets function in multiple biological processes, including the immune response, bone metastasis and liver regeneration, in addition to their known roles in hemostasis and thrombosis. Global elucidation of the platelet proteome will provide the molecular basis of these platelet functions. Here, we set up a high-throughput platform for maximum exploration of the rat/human platelet proteome using integrated proteomics technologies, and then applied it to identify the largest number of proteins expressed in both rat and human platelets. After stringent statistical filtration, a total of 837 unique proteins matched by at least two unique peptides were precisely identified, making this the first comprehensive protein database so far for rat platelets. Meanwhile, quantitative analyses of the thrombin-stimulated platelets offered great insights into the biological functions of platelet proteins and thereby confirmed our global profiling data. A comparative proteomic analysis between rat and human platelets was also conducted, which revealed not only a significant similarity but also an across-species evolutionary link: the orthologous proteins represent a ‘core proteome’, and this ‘evolutionary proteome’ is actually a relatively static proteome. PMID:20443191
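As an aside on the "at least two unique peptides" criterion mentioned above, the following minimal Python sketch shows how such a stringent filter is typically applied to peptide-to-protein assignments; the sequences and accessions are made up for illustration.

```python
# Minimal sketch of the "matched by at least two unique peptides" filter
# described above; the peptide-to-protein assignments are made up.
from collections import defaultdict

psm_assignments = [          # (peptide sequence, protein accession) pairs
    ("LVNELTEFAK", "PROT_A"), ("YLYEIAR", "PROT_A"),
    ("AEFVEVTK", "PROT_A"), ("DLGEEHFK", "PROT_B"),
]

peptides_per_protein = defaultdict(set)
for peptide, protein in psm_assignments:
    peptides_per_protein[protein].add(peptide)

confident = sorted(p for p, peps in peptides_per_protein.items() if len(peps) >= 2)
print(confident)   # -> ['PROT_A']
```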
Five years later: the current status of the use of proteomics and transcriptomics in EMF research.
Leszczynski, Dariusz; de Pomerai, David; Koczan, Dirk; Stoll, Dieter; Franke, Helmut; Albar, Juan Pablo
2012-08-01
The World Health Organization's and the Radiation and Nuclear Safety Authority's "Workshop on Application of Proteomics and Transcriptomics in Electromagnetic Fields Research" was held in Helsinki in October/November 2005. As a consequence of this meeting, the journal Proteomics published a special issue in 2006, "Application of Proteomics and Transcriptomics in EMF Research" (Vol. 6 No. 17; Guest Editor: D. Leszczynski). That issue presented the status, as of 2005, of research on the effects of electromagnetic fields (EMF) using proteomics and transcriptomics methods. The current overview/opinion article presents the status of research in this area by reviewing all studies that were published by the end of 2010. The review work was part of the European Cooperation in the Field of Scientific and Technical Research (COST) Action BM0704, which created a structure in which researchers in the field of EMF and health shared knowledge and information. The review was prepared by the members of the COST Action BM0704 task group on high-throughput screening techniques and electromagnetic fields (TG-HTST-EMF). © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
File formats commonly used in mass spectrometry proteomics.
Deutsch, Eric W
2012-12-01
The application of mass spectrometry (MS) to the analysis of proteomes has enabled the high-throughput identification and abundance measurement of hundreds to thousands of proteins per experiment. However, the formidable informatics challenge associated with analyzing MS data has required a wide variety of data file formats to encode the complex data types associated with MS workflows. These formats encompass the encoding of input instruction for instruments, output products of the instruments, and several levels of information and results used by and produced by the informatics analysis tools. A brief overview of the most common file formats in use today is presented here, along with a discussion of related topics.
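For readers who want to work with such formats programmatically, the sketch below reads spectra from an mzML file, one of the standard open formats typically covered in such overviews. It assumes the third-party pyteomics library is installed, and 'example.mzML' is a placeholder path.

```python
# Minimal sketch of reading spectra from an mzML file, assuming the
# third-party pyteomics library is installed; 'example.mzML' is a placeholder.
from pyteomics import mzml

with mzml.read("example.mzML") as reader:
    for spectrum in reader:
        mz = spectrum["m/z array"]              # numpy array of m/z values
        intensity = spectrum["intensity array"]  # matching intensities
        ms_level = spectrum.get("ms level")
        print(spectrum["id"], ms_level, len(mz), float(intensity.max()))
        break  # just inspect the first spectrum
```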
Beyond the Natural Proteome: Nondegenerate Saturation Mutagenesis-Methodologies and Advantages.
Ferreira Amaral, M M; Frigotto, L; Hine, A V
2017-01-01
Beyond the natural proteome, high-throughput mutagenesis offers the protein engineer an opportunity to "tweak" the wild-type activity of a protein to create a recombinant protein with the required attributes. Of the various approaches available, saturation mutagenesis is one of the core techniques employed by protein engineers, and in recent times, nondegenerate saturation mutagenesis has emerged as the approach of choice. This review compares the current methodologies available for conducting nondegenerate saturation mutagenesis with traditional degenerate saturation mutagenesis, and briefly outlines the options available for screening the resulting libraries to discover a novel protein with the required activity and/or specificity. © 2017 Elsevier Inc. All rights reserved.
Microfluidics for the analysis of membrane proteins: how do we get there?
Battle, Katrina N; Uba, Franklin I; Soper, Steven A
2014-08-01
The development of fully automated and high-throughput systems for proteomics is now in demand because of the need to generate new protein-based disease biomarkers. Unfortunately, it is difficult to identify protein biomarkers of low abundance in the presence of highly abundant proteins, especially in complex biological samples such as serum, cell lysates, and other biological fluids. Membrane proteins, which are in many cases of low abundance compared to cytosolic proteins, have various functions, can provide insight into the state of a disease and serve as targets for new drugs, making them attractive biomarker candidates. Traditionally, proteins are identified through the use of gel electrophoretic techniques, which are not always suitable for particular protein samples such as membrane proteins. Microfluidics offers the potential to serve as a fully automated platform for the efficient, high-throughput analysis of complex samples, such as membrane proteins, and to do so with performance metrics that exceed those of bench-top counterparts. In recent years, there have been various improvements to microfluidics and their use for proteomic analysis as reported in the literature. Consequently, this review presents an overview of the traditional proteomic-processing pipelines for membrane proteins and insights into new technological developments, with a focus on the applicability of microfluidics for the analysis of membrane proteins. Sample preparation techniques are discussed in detail, and novel interfacing strategies relating to MS are highlighted. Lastly, some general conclusions and future perspectives are presented. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Peterson, Elena S; McCue, Lee Ann; Schrimpe-Rutledge, Alexandra C; Jensen, Jeffrey L; Walker, Hyunjoo; Kobold, Markus A; Webb, Samantha R; Payne, Samuel H; Ansong, Charles; Adkins, Joshua N; Cannon, William R; Webb-Robertson, Bobbie-Jo M
2012-04-05
The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates. VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data are interrogated via searches linked to the genome visualizations to find regions with a high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to illustrate the rapid manner in which mis-annotations can be found and explored using either proteomics data alone or in combination with transcriptomic data. VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data are evaluated via visual analysis across multiple levels of genomic resolution, linked searches, and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and the core programming requirements for visualization of these large heterogeneous datasets in a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.
Analyzing large-scale proteomics projects with latent semantic indexing.
Klie, Sebastian; Martens, Lennart; Vizcaíno, Juan Antonio; Côté, Richard; Jones, Phil; Apweiler, Rolf; Hinneburg, Alexander; Hermjakob, Henning
2008-01-01
Since the advent of public data repositories for proteomics data, readily accessible results from high-throughput experiments have been accumulating steadily. Several large-scale projects in particular have contributed substantially to the amount of identifications available to the community. Despite the considerable body of information amassed, very few successful analyses have been performed and published on these data, leaving the ultimate value of these projects far below their potential. A prominent reason that published proteomics data are seldom reanalyzed lies in the heterogeneous nature of the original sample collection and the subsequent data recording and processing. To illustrate that at least part of this heterogeneity can be compensated for, we here apply a latent semantic analysis to the data contributed by the Human Proteome Organization's Plasma Proteome Project (HUPO PPP). Interestingly, despite the broad spectrum of instruments and methodologies applied in the HUPO PPP, our analysis reveals several obvious patterns that can be used to formulate concrete recommendations for optimizing proteomics project planning as well as the choice of technologies used in future experiments. It is clear from these results that the analysis of large bodies of publicly available proteomics data by noise-tolerant algorithms such as latent semantic analysis holds great promise and is currently underexploited.
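To make the idea of latent semantic analysis on proteomics identifications concrete, the following toy Python sketch projects a samples-by-proteins identification matrix into a low-dimensional latent space using truncated SVD. It assumes scikit-learn, uses random stand-in data, and is not the authors' exact pipeline.

```python
# Toy latent semantic analysis of an experiments x proteins identification
# matrix, in the spirit of the HUPO PPP reanalysis described above.
# Data are random stand-ins; this is not the authors' exact pipeline.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfTransformer

rng = np.random.default_rng(0)
# Rows = experiments/labs, columns = proteins; 1 = protein identified.
identifications = (rng.random((30, 500)) < 0.15).astype(float)

# TF-IDF-style weighting down-weights proteins seen in nearly every experiment.
weighted = TfidfTransformer().fit_transform(identifications)

# Project experiments into a low-dimensional "latent" space.
svd = TruncatedSVD(n_components=5, random_state=0)
embedding = svd.fit_transform(weighted)

print(embedding.shape)                       # (30, 5)
print(svd.explained_variance_ratio_.round(3))
```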
Eckhard, Ulrich; Huesgen, Pitter F; Schilling, Oliver; Bellac, Caroline L; Butler, Georgina S; Cox, Jennifer H; Dufour, Antoine; Goebeler, Verena; Kappelhoff, Reinhild; Auf dem Keller, Ulrich; Klein, Theo; Lange, Philipp F; Marino, Giada; Morrison, Charlotte J; Prudova, Anna; Rodriguez, David; Starr, Amanda E; Wang, Yili; Overall, Christopher M
2016-06-01
The data described provide a comprehensive resource for family-wide active-site specificity portrayal of the human matrix metalloproteinase (MMP) family. We used the high-throughput proteomic technique PICS (Proteomic Identification of protease Cleavage Sites) to comprehensively assay 9 different MMPs. We identified more than 4300 peptide cleavage sites, spanning both the prime and non-prime sides of the scissile peptide bond, allowing detailed subsite cooperativity analysis. The proteomic cleavage data were expanded by kinetic analysis using a set of 6 quenched-fluorescent peptide substrates designed using these results. These datasets represent one of the largest specificity profiling efforts with subsequent structural follow-up for any protease family, and put the spotlight on the specificity similarities and differences of the MMP family. A detailed analysis of these data may be found in Eckhard et al. (2015) [1]. The raw mass spectrometry data and the corresponding metadata have been deposited in PRIDE/ProteomeXchange with the accession number PXD002265.
George, Iniga S; Fennell, Anne Y; Haynes, Paul A
2015-09-01
Protein sample preparation optimisation is critical for establishing reproducible high-throughput proteomic analysis. In this study, two different fractionation sample preparation techniques (in-gel digestion and in-solution digestion) for shotgun proteomics were used to quantitatively compare proteins identified in Vitis riparia leaf samples. The total numbers of proteins and peptides identified were compared between filter-aided sample preparation (FASP) coupled with gas phase fractionation (GPF) and SDS-PAGE methods. There was a 24% increase in the total number of reproducibly identified proteins when FASP-GPF was used. FASP-GPF is more reproducible and less expensive than SDS-PAGE, and is therefore the better method for shotgun proteomics of grapevine samples, as it significantly increases protein identification across biological replicates. Total peptide and protein information from the two fractionation techniques is available in PRIDE with the identifier PXD001399 (http://proteomecentral.proteomexchange.org/dataset/PXD001399). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The proteomic landscape of triple-negative breast cancer.
Lawrence, Robert T; Perez, Elizabeth M; Hernández, Daniel; Miller, Chris P; Haas, Kelsey M; Irie, Hanna Y; Lee, Su-In; Blau, C Anthony; Villén, Judit
2015-04-28
Triple-negative breast cancer is a heterogeneous disease characterized by poor clinical outcomes and a shortage of targeted treatment options. To discover molecular features of triple-negative breast cancer, we performed quantitative proteomics analysis of twenty human-derived breast cell lines and four primary breast tumors to a depth of more than 12,000 distinct proteins. We used this data to identify breast cancer subtypes at the protein level and demonstrate the precise quantification of biomarkers, signaling proteins, and biological pathways by mass spectrometry. We integrated proteomics data with exome sequence resources to identify genomic aberrations that affect protein expression. We performed a high-throughput drug screen to identify protein markers of drug sensitivity and understand the mechanisms of drug resistance. The genome and proteome provide complementary information that, when combined, yield a powerful engine for therapeutic discovery. This resource is available to the cancer research community to catalyze further analysis and investigation. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Huang, Yu-An; You, Zhu-Hong; Chen, Xing; Yan, Gui-Ying
2016-12-23
Protein-protein interactions (PPIs) are essential to most biological processes. Since bioscience has entered the era of the genome and proteome, there is a growing demand for knowledge about the PPI network. High-throughput biological technologies can be used to identify new PPIs, but they are expensive, time-consuming, and tedious. Therefore, computational methods for predicting PPIs have an important role. In recent years, an increasing number of computational methods, such as protein structure-based approaches, have been proposed for predicting PPIs. The major limitation of these methods is that they require prior information about the proteins in order to infer PPIs. It is therefore of much significance to develop computational methods that use only the information contained in the protein amino acid sequence. Here, we report a highly efficient approach for predicting PPIs. The main improvements come from the use of a novel protein sequence representation, combining a continuous wavelet descriptor with Chou's pseudo amino acid composition (PseAAC), and from adopting a weighted sparse representation-based classifier (WSRC). This method, cross-validated on PPI datasets from Saccharomyces cerevisiae, human and H. pylori, achieves excellent results, with accuracies as high as 92.50%, 95.54% and 84.28%, respectively, significantly better than previously proposed methods. Extensive experiments were performed to compare the proposed method with a state-of-the-art Support Vector Machine (SVM) classifier. The outstanding results yielded by our model indicate that the proposed feature extraction method, combining two kinds of descriptors, has strong expressive ability and is expected to provide comprehensive and effective information for machine learning-based classification models. In addition, the prediction performance in the comparison experiments shows effective cooperation between the combined features and WSRC. Thus, the proposed method is a very efficient way to predict PPIs and may be a useful supplementary tool for future proteomics studies.
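The following simplified Python sketch is offered only to illustrate the flavor of sequence-only PPI feature extraction described above: multiscale wavelet-like features of a hydropathy profile combined with amino-acid composition, fed to an off-the-shelf SVM standing in for the weighted sparse representation classifier. The Kyte-Doolittle scale is standard; the sequences, labels, and feature choices are made up.

```python
# Simplified sequence-only PPI feature sketch: wavelet-like multiscale features
# of a hydropathy profile plus amino-acid composition, fed to an SVM standing in
# for the weighted sparse classifier. Sequences and labels are illustrative only.
import numpy as np
from sklearn.svm import SVC

KD = {  # Kyte-Doolittle hydropathy scale
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5, 'E': -3.5,
    'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8,
    'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2}
AAS = sorted(KD)

def ricker(points, width):
    t = np.arange(points) - (points - 1) / 2.0
    return (1 - (t / width) ** 2) * np.exp(-t ** 2 / (2 * width ** 2))

def features(seq):
    signal = np.array([KD[a] for a in seq], dtype=float)
    comp = [seq.count(a) / len(seq) for a in AAS]          # amino-acid composition
    multiscale = []
    for width in (2, 4, 8):                                # crude multiscale transform
        coeffs = np.convolve(signal, ricker(8 * width, width), mode='same')
        multiscale += [coeffs.mean(), coeffs.std(), np.abs(coeffs).max()]
    return np.array(comp + multiscale)

def pair_features(seq_a, seq_b):
    return np.concatenate([features(seq_a), features(seq_b)])

# Tiny made-up training set: (sequence A, sequence B, interacts?)
pairs = [("MKTAYIAKQR", "GDVEKGKKIF", 1), ("LLFVVAAAGG", "MKTAYIAKQR", 0),
         ("GDVEKGKKIF", "LLFVVAAAGG", 1), ("AAAAKKKKRR", "IVLIVLIVLA", 0)]
X = np.array([pair_features(a, b) for a, b, _ in pairs])
y = np.array([label for _, _, label in pairs])
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(clf.predict([pair_features("MKTAYIAKQR", "GDVEKGKKIF")]))
```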
MixGF: spectral probabilities for mixture spectra from more than one peptide.
Wang, Jian; Bourne, Philip E; Bandeira, Nuno
2014-12-01
In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications, but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by ≈30-390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.
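MixGF itself models mixture spectra, which is beyond a short example, but the underlying generating-function idea can be illustrated with a toy single-peptide spectral probability computed by dynamic programming over integer prefix masses, as sketched below in Python. The amino-acid masses are rounded, every residue is taken as equally likely, and the peak list is hypothetical.

```python
# Toy single-peptide generating function for a spectral probability, in the
# spirit of the generating-function scoring that MixGF extends to mixtures.
# Integer residue masses, uniform residue probabilities, and a "spectrum"
# reduced to a set of annotated prefix masses. Illustration only.
from collections import defaultdict

AA_MASS = {'G': 57, 'A': 71, 'S': 87, 'P': 97, 'V': 99, 'T': 101, 'C': 103,
           'L': 113, 'N': 114, 'D': 115, 'Q': 128, 'K': 128, 'E': 129,
           'M': 131, 'H': 137, 'F': 147, 'R': 156, 'Y': 163, 'W': 186}
P_AA = 1.0 / len(AA_MASS)

def spectral_probability(peak_masses, precursor_mass, min_score):
    """Probability that a random peptide of total mass `precursor_mass`
    matches at least `min_score` of the given prefix-mass peaks."""
    peaks = set(peak_masses)
    # table[mass][score] = total probability of peptide prefixes with that
    # cumulative mass and that many matched peaks.
    table = defaultdict(lambda: defaultdict(float))
    table[0][0] = 1.0
    for mass in range(1, precursor_mass + 1):
        hit = 1 if mass in peaks else 0
        for aa_mass in AA_MASS.values():
            prev = mass - aa_mass
            if prev < 0:
                continue
            for score, prob in table[prev].items():
                table[mass][score + hit] += prob * P_AA
    return sum(p for s, p in table[precursor_mass].items() if s >= min_score)

peaks = [71, 170, 299]          # hypothetical annotated prefix masses
print(spectral_probability(peaks, precursor_mass=402, min_score=3))
```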
Advances in targeted proteomics and applications to biomedical research
Shi, Tujin; Song, Ehwang; Nie, Song; Rodland, Karin D.; Liu, Tao; Qian, Wei-Jun; Smith, Richard D.
2016-01-01
Targeted proteomics has emerged as a powerful protein quantification tool in systems biology and biomedical research, and increasingly for clinical applications. The most widely used targeted proteomics approach, selected reaction monitoring (SRM), also known as multiple reaction monitoring (MRM), can be used for quantification of cellular signaling networks and preclinical verification of candidate protein biomarkers. As an extension to our previous review on advances in SRM sensitivity, herein we review recent advances in methods and technology for further enhancing SRM sensitivity (from 2012 to the present), and highlight its broad biomedical applications in human bodily fluids, tissues and cell lines. Furthermore, we also review two recently introduced targeted proteomics approaches, parallel reaction monitoring (PRM) and data-independent acquisition (DIA) with targeted data extraction, on fast-scanning high-resolution accurate-mass (HR/AM) instruments. Such HR/AM targeted quantification, with monitoring of all target product ions, effectively addresses the limitations of SRM in specificity and multiplexing; however, compared to SRM, PRM and DIA are still in their infancy with a limited number of applications. Thus, for HR/AM targeted quantification we focus our discussion on method development, data processing and analysis, and its advantages and limitations in targeted proteomics. Finally, general perspectives on the potential of achieving both high sensitivity and high sample throughput for large-scale quantification of hundreds of target proteins are discussed. PMID:27302376
Less is More: Membrane Protein Digestion Beyond Urea–Trypsin Solution for Next-level Proteomics*
Zhang, Xi
2015-01-01
The goal of next-level bottom-up membrane proteomics is protein function investigation, via high-coverage high-throughput peptide-centric quantitation of expression, modifications and dynamic structures at systems scale. Yet efficient digestion of mammalian membrane proteins presents a daunting barrier, and prevalent day-long urea–trypsin in-solution digestion proved insufficient to reach this goal. Many efforts contributed incremental advances over past years, but involved protein denaturation that disconnected measurement from functional states. Beyond denaturation, the recent discovery of structure/proteomics omni-compatible detergent n-dodecyl-β-d-maltopyranoside, combined with pepsin and PNGase F columns, enabled breakthroughs in membrane protein digestion: a 2010 DDM-low-TCEP (DLT) method for H/D-exchange (HDX) using human G protein-coupled receptor, and a 2015 flow/detergent-facilitated protease and de-PTM digestions (FDD) for integrative deep sequencing and quantitation using full-length human ion channel complex. Distinguishing protein solubilization from denaturation, protease digestion reliability from theoretical specificity, and reduction from alkylation, these methods shifted day(s)-long paradigms into minutes, and afforded fully automatable (HDX)-protein-peptide-(tandem mass tag)-HPLC pipelines to instantly measure functional proteins at deep coverage, high peptide reproducibility, low artifacts and minimal leakage. Promoting—not destroying—structures and activities harnessed membrane proteins for the next-level streamlined functional proteomics. This review analyzes recent advances in membrane protein digestion methods and highlights critical discoveries for future proteomics. PMID:26081834
Proteogenomic insights into uranium tolerance of a Chernobyl's Microbacterium bacterial isolate.
Gallois, Nicolas; Alpha-Bazin, Béatrice; Ortet, Philippe; Barakat, Mohamed; Piette, Laurie; Long, Justine; Berthomieu, Catherine; Armengaud, Jean; Chapon, Virginie
2018-04-15
Microbacterium oleivorans A9 is a uranium-tolerant actinobacterium isolated from trench T22 located near the Chernobyl nuclear power plant. This site is contaminated with different radionuclides, including uranium. To observe the molecular changes occurring at the proteome level in this strain upon uranyl exposure, and to understand the molecular mechanisms explaining its uranium tolerance, we established its draft genome and used this raw information to perform an in-depth proteogenomics study. High-throughput proteomics was performed on cells exposed or not to 10μM uranyl nitrate, sampled at three previously identified phases of uranyl tolerance. We experimentally detected and annotated 1532 proteins and highlighted a total of 591 proteins whose abundances differed significantly between conditions. Notably, proteins involved in phosphate and iron metabolism show high dynamics. A large proportion of the proteins that were more abundant upon uranyl stress are distant from functionally annotated known proteins, highlighting the lack of fundamental knowledge regarding numerous key molecular players from soil bacteria. Microbacterium oleivorans A9 is an interesting environmental model for understanding the biological processes engaged in tolerance to radionuclides. Using an innovative proteogenomics approach, we explored its molecular mechanisms involved in uranium tolerance. We sequenced its genome, interpreted high-throughput proteomic data against a six-reading-frame ORF database deduced from the draft genome, annotated the identified proteins and compared protein abundances from cells exposed or not to uranyl stress after a cascade search. These data show that a complex cellular response to uranium occurs in Microbacterium oleivorans A9, where one third of the experimental proteome is modified. In particular, uranyl stress perturbed the phosphate and iron metabolic pathways. Furthermore, several transporters have been identified as specifically associated with uranyl stress, paving the way for the development of biotechnological tools for uranium decontamination. Copyright © 2017. Published by Elsevier B.V.
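The proteogenomic workflow described searches proteomic data against a six-reading-frame ORF database derived from the draft genome. The following minimal Python sketch shows the standard six-frame translation step on a made-up DNA fragment; it is a generic illustration, not the authors' pipeline.

```python
# Minimal six-reading-frame translation sketch, of the kind used to build an
# ORF database for proteogenomic searches; the DNA fragment is made up.
CODON_TABLE = {
    'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'CTT': 'L', 'CTC': 'L',
    'CTA': 'L', 'CTG': 'L', 'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M',
    'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V', 'TCT': 'S', 'TCC': 'S',
    'TCA': 'S', 'TCG': 'S', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
    'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T', 'GCT': 'A', 'GCC': 'A',
    'GCA': 'A', 'GCG': 'A', 'TAT': 'Y', 'TAC': 'Y', 'TAA': '*', 'TAG': '*',
    'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q', 'AAT': 'N', 'AAC': 'N',
    'AAA': 'K', 'AAG': 'K', 'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
    'TGT': 'C', 'TGC': 'C', 'TGA': '*', 'TGG': 'W', 'CGT': 'R', 'CGC': 'R',
    'CGA': 'R', 'CGG': 'R', 'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
    'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'}
COMPLEMENT = str.maketrans('ACGT', 'TGCA')

def translate(dna):
    return ''.join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - len(dna) % 3, 3))

def six_frame_orfs(dna, min_length=5):
    """Return putative ORFs (stop-to-stop fragments) from all six frames."""
    rc = dna.translate(COMPLEMENT)[::-1]          # reverse complement
    orfs = []
    for strand in (dna, rc):
        for offset in range(3):
            protein = translate(strand[offset:])
            orfs += [frag for frag in protein.split('*') if len(frag) >= min_length]
    return orfs

genome_fragment = "ATGGCTAAAGGTGAACTGTTCGATTAAATGCCCGGGTTTACC"
print(six_frame_orfs(genome_fragment, min_length=4))
```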
Proteomics technique opens new frontiers in mobilome research
Davidson, Andrew D.; Matthews, David A.
2017-01-01
A large proportion of the genome of most eukaryotic organisms consists of highly repetitive mobile genetic elements. The sum of these elements is called the “mobilome,” which in eukaryotes is made up mostly of transposons. Transposable elements contribute to disease, evolution, and normal physiology by mediating genetic rearrangement, and through the “domestication” of transposon proteins for cellular functions. Although ‘omics studies of mobilome genomes and transcriptomes are common, technical challenges have hampered high-throughput global proteomics analyses of transposons. In a recent paper, we overcame these technical hurdles using a technique called “proteomics informed by transcriptomics” (PIT), and thus published the first unbiased global mobilome-derived proteome for any organism (using cell lines derived from the mosquito Aedes aegypti). In this commentary, we describe our methods in more detail, and summarise our major findings. We also use new genome sequencing data to show that, in many cases, the specific genomic element expressing a given protein can be identified using PIT. This proteomic technique therefore represents an important technological advance that will open new avenues of research into the role that proteins derived from transposons and other repetitive and sequence diverse genetic elements, such as endogenous retroviruses, play in health and disease. PMID:28932623
Dr. Janie Merkel is interviewed by Ryan Blum and Janice Friend.
Merkel, Janie
2007-12-01
Dr. Janie Merkel is the director of Yale's Chemical Genomics Screening Facility, a high-throughput screening laboratory that is part of the Yale University Center for Genomics and Proteomics. The Screening Facility connects Yale researchers with industry-quality robotic machinery and a diverse group of compound libraries, which have been used successfully to link therapeutic targets with potential therapies.
Multiplex High-Throughput Targeted Proteomic Assay To Identify Induced Pluripotent Stem Cells.
Baud, Anna; Wessely, Frank; Mazzacuva, Francesca; McCormick, James; Camuzeaux, Stephane; Heywood, Wendy E; Little, Daniel; Vowles, Jane; Tuefferd, Marianne; Mosaku, Olukunbi; Lako, Majlinda; Armstrong, Lyle; Webber, Caleb; Cader, M Zameel; Peeters, Pieter; Gissen, Paul; Cowley, Sally A; Mills, Kevin
2017-02-21
Induced pluripotent stem cells have great potential as a human model system in regenerative medicine, disease modeling, and drug screening. However, their use in medical research is hampered by laborious reprogramming procedures that yield low numbers of induced pluripotent stem cells. For further applications in research, only the best, competent clones should be used. The standard assays for pluripotency are based on genomic approaches, which take up to 1 week to perform and incur significant cost. Therefore, there is a need for a rapid and cost-effective assay able to distinguish between pluripotent and nonpluripotent cells. Here, we describe a novel multiplexed, high-throughput, and sensitive peptide-based multiple reaction monitoring mass spectrometry assay, allowing for the identification and absolute quantitation of multiple core transcription factors and pluripotency markers. This assay provides simpler, high-throughput classification of cells as either pluripotent or nonpluripotent in a 7 min analysis, while being more cost-effective than conventional genomic tests.
Silva, Wanderson M; Carvalho, Rodrigo D; Soares, Siomar C; Bastos, Isabela Fs; Folador, Edson L; Souza, Gustavo Hmf; Le Loir, Yves; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco
2014-12-04
Corynebacterium pseudotuberculosis biovar ovis is a facultative intracellular pathogen, and the etiological agent of caseous lymphadenitis in small ruminants. During the infection process, the bacterium is subjected to several stress conditions, including nitrosative stress, which is caused by nitric oxide (NO). In silico analysis of the genome of C. pseudotuberculosis ovis 1002 predicted several genes that could influence the resistance of this pathogen to nitrosative stress. Here, we applied high-throughput proteomics using high definition mass spectrometry to characterize the functional genome of C. pseudotuberculosis ovis 1002 in the presence of NO-donor Diethylenetriamine/nitric oxide adduct (DETA/NO), with the aim of identifying proteins involved in nitrosative stress resistance. We characterized 835 proteins, representing approximately 41% of the predicted proteome of C. pseudotuberculosis ovis 1002, following exposure to nitrosative stress. In total, 102 proteins were exclusive to the proteome of DETA/NO-induced cells, and a further 58 proteins were differentially regulated between the DETA/NO and control conditions. An interactomic analysis of the differential proteome of C. pseudotuberculosis in response to nitrosative stress was also performed. Our proteomic data set suggested the activation of both a general stress response and a specific nitrosative stress response, as well as changes in proteins involved in cellular metabolism, detoxification, transcriptional regulation, and DNA synthesis and repair. Our proteomic analysis validated previously-determined in silico data for C. pseudotuberculosis ovis 1002. In addition, proteomic screening performed in the presence of NO enabled the identification of a set of factors that can influence the resistance and survival of C. pseudotuberculosis during exposure to nitrosative stress.
Definitive screening design enables optimization of LC-ESI-MS/MS parameters in proteomics.
Aburaya, Shunsuke; Aoki, Wataru; Minakuchi, Hiroyoshi; Ueda, Mitsuyoshi
2017-12-01
In proteomics, more than 100,000 peptides are generated from the digestion of human cell lysates. Proteome samples have a broad dynamic range in protein abundance; therefore, it is critical to optimize the various parameters of LC-ESI-MS/MS to comprehensively identify these peptides. However, there are many parameters to consider in LC-ESI-MS/MS analysis. In this study, we applied a definitive screening design to simultaneously optimize 14 parameters in the operation of monolithic capillary LC-ESI-MS/MS to increase the number of identified proteins and/or the average peak area of MS1. The simultaneous optimization enabled the determination of two-factor interactions between LC and MS. Finally, we found two parameter sets for monolithic capillary LC-ESI-MS/MS that increased the number of identified proteins by 8.1% or the average peak area of MS1 by 67%. The definitive screening design should be highly useful for high-throughput determination of the best parameter set in LC-ESI-MS/MS systems.
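As a rough illustration of how a screening design of this kind is analyzed, the Python sketch below fits main effects and quadratic terms by least squares to a small, made-up three-factor layout and ranks the coefficients; it is not the authors' 14-factor design or data.

```python
# Toy analysis of a small, made-up three-level screening layout: fit main
# effects and quadratic terms by least squares to rank which LC/MS parameters
# drive the response (e.g. number of identified proteins). Illustration only.
import numpy as np

factors = ["spray_voltage", "column_temp", "gradient_length"]
# Coded levels: -1 = low, 0 = centre, +1 = high (foldover pairs plus a centre run).
X = np.array([
    [ 0,  1,  1], [ 0, -1, -1],
    [ 1,  0, -1], [-1,  0,  1],
    [ 1,  1,  0], [-1, -1,  0],
    [ 0,  0,  0],
], dtype=float)
y = np.array([4100, 3300, 3500, 4050, 4200, 3250, 3900], dtype=float)  # made-up responses

# Model matrix: intercept + main effects + pure quadratic terms.
model = np.column_stack([np.ones(len(X)), X, X ** 2])
coef, *_ = np.linalg.lstsq(model, y, rcond=None)

names = ["intercept"] + factors + [f + "^2" for f in factors]
for name, c in sorted(zip(names, coef), key=lambda t: -abs(t[1])):
    print(f"{name:>18}: {c:8.1f}")
```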
Gore, Brooklin
2018-02-01
This presentation includes a brief background on High Throughput Computing, correlating gene transcription factors, optical mapping, genotype to phenotype mapping via QTL analysis, and current work on next gen sequencing.
Kortz, Linda; Helmschrodt, Christin; Ceglarek, Uta
2011-03-01
In the last decade, various analytical strategies have been established to enhance separation speed and efficiency in high-performance liquid chromatography applications. Chromatographic supports based on monolithic material, small porous particles, and porous layer beads have been developed and commercialized to improve throughput and separation efficiency. This paper provides an overview of current developments in fast chromatography combined with mass spectrometry for the analysis of metabolites and proteins in clinical applications. Advances and limitations of fast chromatography in combination with mass spectrometry are discussed. Practical aspects of, recent developments in, and the present status of high-throughput analysis of human body fluids for therapeutic drug monitoring, toxicology, clinical metabolomics, and proteomics are presented.
AOPs & Biomarkers: Bridging High Throughput Screening and Regulatory Decision Making.
As high throughput screening (HTS) approaches play a larger role in toxicity testing, computational toxicology has emerged as a critical component in interpreting the large volume of data produced. Computational models for this purpose are becoming increasingly more sophisticated...
COMPASS: a suite of pre- and post-search proteomics software tools for OMSSA
Wenger, Craig D.; Phanstiel, Douglas H.; Lee, M. Violet; Bailey, Derek J.; Coon, Joshua J.
2011-01-01
Here we present the Coon OMSSA Proteomic Analysis Software Suite (COMPASS): a free and open-source software pipeline for high-throughput analysis of proteomics data, designed around the Open Mass Spectrometry Search Algorithm. We detail a synergistic set of tools for protein database generation, spectral reduction, peptide false discovery rate analysis, peptide quantitation via isobaric labeling, protein parsimony and protein false discovery rate analysis, and protein quantitation. We strive for maximum ease of use, utilizing graphical user interfaces and working with data files in the original instrument vendor format. Results are stored in plain text comma-separated values files, which are easy to view and manipulate with a text editor or spreadsheet program. We illustrate the operation and efficacy of COMPASS through the use of two LC–MS/MS datasets. The first is a dataset of a highly annotated mixture of standard proteins and manually validated contaminants that exhibits the identification workflow. The second is a dataset of yeast peptides, labeled with isobaric stable isotope tags and mixed in known ratios, to demonstrate the quantitative workflow. For these two datasets, COMPASS performs equivalently or better than the current de facto standard, the Trans-Proteomic Pipeline. PMID:21298793
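COMPASS includes peptide false discovery rate analysis; the following generic target-decoy FDR filtering sketch in Python illustrates the concept with made-up scores and is not COMPASS's exact implementation.

```python
# Generic target-decoy FDR filtering for peptide-spectrum matches (PSMs);
# scores are made up and this is not COMPASS's exact implementation.
def filter_psms(psms, fdr_threshold=0.01):
    """psms: list of (score, is_decoy). Returns accepted target PSMs at the
    largest score cutoff whose estimated FDR is below the threshold."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets, decoys = 0, 0
    best_cutoff_index = -1
    for i, (score, is_decoy) in enumerate(ranked):
        decoys += is_decoy
        targets += not is_decoy
        fdr = decoys / max(targets, 1)      # decoy-based FDR estimate
        if fdr <= fdr_threshold:
            best_cutoff_index = i
    return [p for p in ranked[:best_cutoff_index + 1] if not p[1]]

psms = [(98.2, False), (95.0, False), (91.3, True), (88.7, False),
        (85.1, False), (77.4, True), (60.0, True)]
print(len(filter_psms(psms, fdr_threshold=0.34)))   # -> 4 accepted targets
```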
Schokraie, Elham; Warnken, Uwe; Hotz-Wagenblatt, Agnes; Grohme, Markus A; Hengherr, Steffen; Förster, Frank; Schill, Ralph O; Frohme, Marcus; Dandekar, Thomas; Schnölzer, Martina
2012-01-01
Tardigrades have fascinated researchers for more than 300 years because of their extraordinary capability to undergo cryptobiosis and survive extreme environmental conditions. However, the survival mechanisms of tardigrades are still poorly understood, mainly due to the absence of detailed knowledge about the proteome and genome of these organisms. Our study was intended to provide a basis for the functional characterization of expressed proteins in different states of tardigrades. High-throughput, high-accuracy proteomics in combination with a newly developed tardigrade-specific protein database resulted in the identification of more than 3000 proteins in three different states: the early embryonic state and adult animals in the active and anhydrobiotic states. This comprehensive proteome resource includes protein families such as chaperones, antioxidants, ribosomal proteins, cytoskeletal proteins, transporters, protein channels, nutrient reservoirs, and developmental proteins. A comparative analysis of protein families in the different states was performed by calculating the exponentially modified protein abundance index, which classifies proteins into major and minor components. This is the first step towards analyzing the proteins involved in early embryonic development, and furthermore the proteins which might play an important role in the transition into the anhydrobiotic state.
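The exponentially modified protein abundance index (emPAI) used in this comparison follows the published formula emPAI = 10^(N_observed/N_observable) - 1, with relative protein content expressed as emPAI divided by the summed emPAI values. The Python sketch below applies that formula to made-up peptide counts.

```python
# Minimal emPAI calculation using the standard formula
# emPAI = 10**(N_observed / N_observable) - 1; peptide counts are made up.
def empai(n_observed_peptides, n_observable_peptides):
    return 10 ** (n_observed_peptides / n_observable_peptides) - 1

proteins = {          # protein: (observed peptides, theoretically observable peptides)
    "HSP70_homolog": (12, 30),
    "LEA_protein":   (4, 18),
    "actin":         (9, 22),
}
total = sum(empai(obs, theo) for obs, theo in proteins.values())
for name, (obs, theo) in proteins.items():
    value = empai(obs, theo)
    print(f"{name:>14}: emPAI={value:5.2f}  mol%={100 * value / total:5.1f}")
```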
High-Throughput Thermodynamic Modeling and Uncertainty Quantification for ICME
NASA Astrophysics Data System (ADS)
Otis, Richard A.; Liu, Zi-Kui
2017-05-01
One foundational component of integrated computational materials engineering (ICME) and the Materials Genome Initiative is computational thermodynamics based on the calculation of phase diagrams (CALPHAD) method. The CALPHAD method pioneered by Kaufman has enabled the development of thermodynamic, atomic mobility, and molar volume databases of individual phases in the full space of temperature, composition, and sometimes pressure for technologically important multicomponent engineering materials, along with sophisticated computational tools for using the databases. In this article, we present our recent efforts to develop new computational tools for high-throughput modeling and uncertainty quantification based on high-throughput first-principles calculations and the CALPHAD method, along with their potential propagation to downstream ICME modeling and simulations.
Li, Tie-Mei; Zhang, Ju-en; Lin, Rui; Chen, She; Luo, Minmin; Dong, Meng-Qiu
2016-01-01
Sleep is a ubiquitous, tightly regulated, and evolutionarily conserved behavior observed in almost all animals. Prolonged sleep deprivation can be fatal, indicating that sleep is a physiological necessity. However, little is known about its core function. To gain insight into this mystery, we used advanced quantitative proteomics technology to survey the global changes in brain protein abundance. Aiming to gain a comprehensive profile, our proteomics workflow included filter-aided sample preparation (FASP), which increased the coverage of membrane proteins; tandem mass tag (TMT) labeling, for relative quantitation; and high resolution, high mass accuracy, high throughput mass spectrometry (MS). In total, we obtained the relative abundance ratios of 9888 proteins encoded by 6070 genes. Interestingly, we observed significant enrichment for mitochondrial proteins among the differentially expressed proteins. This finding suggests that sleep deprivation strongly affects signaling pathways that govern either energy metabolism or responses to mitochondrial stress. Additionally, the differentially-expressed proteins are enriched in pathways implicated in age-dependent neurodegenerative diseases, including Parkinson’s, Huntington’s, and Alzheimer’s, hinting at possible connections between sleep loss, mitochondrial stress, and neurodegeneration. PMID:27684481
Multiscale peak detection in wavelet space.
Zhang, Zhi-Min; Tong, Xia; Peng, Ying; Ma, Pan; Zhang, Ming-Jin; Lu, Hong-Mei; Chen, Xiao-Qing; Liang, Yi-Zeng
2015-12-07
Accurate peak detection is essential for analyzing high-throughput datasets generated by analytical instruments. Derivatives with noise reduction and matched filtration are frequently used, but they are sensitive to baseline variations, random noise and deviations in the peak shape. A continuous wavelet transform (CWT)-based method is more practical and popular in this situation; it can increase accuracy and reliability by identifying peaks across scales in wavelet space and implicitly removing noise as well as the baseline. However, its computational load is relatively high and the estimated features of peaks may not be accurate in the case of peaks that are overlapping, dense or weak. In this study, we present multi-scale peak detection (MSPD), which takes full advantage of additional information in wavelet space, including ridges, valleys, and zero-crossings. It achieves high accuracy by thresholding each detected peak with the maximum of its ridge. It has been comprehensively evaluated with MALDI-TOF spectra in proteomics, the CAMDA 2006 SELDI dataset, as well as the Romanian database of Raman spectra, showing that it is particularly suitable for detecting peaks in high-throughput analytical signals. Receiver operating characteristic (ROC) curves show that MSPD can detect more true peaks while keeping the false discovery rate lower than the MassSpecWavelet and MALDIquant methods. Superior results on Raman spectra suggest that MSPD is a more universal method for peak detection. MSPD has been designed and implemented efficiently in Python and Cython. It is available as an open source package at .
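A heavily simplified illustration of wavelet-space peak picking is sketched below in Python: Ricker (Mexican-hat) responses are computed at several scales on a synthetic noisy signal, and only positions that are local maxima at every scale are kept. This is not the MSPD implementation, which additionally uses ridges, valleys, and zero-crossings.

```python
# Minimal wavelet-space peak picking on a synthetic noisy signal: Ricker
# responses at several scales, keeping positions that are thresholded local
# maxima at every scale. A simplification for illustration, not MSPD itself.
import numpy as np

def ricker(points, width):
    t = np.arange(points) - (points - 1) / 2.0
    return (1 - (t / width) ** 2) * np.exp(-t ** 2 / (2 * width ** 2))

def cwt_peaks(signal, widths=(2, 4, 8), snr=3.0):
    candidates = None
    for w in widths:
        coeffs = np.convolve(signal, ricker(10 * w, w), mode="same")
        threshold = snr * np.median(np.abs(coeffs))      # crude noise estimate
        local_max = {i for i in range(1, len(coeffs) - 1)
                     if coeffs[i] > threshold
                     and coeffs[i] >= coeffs[i - 1] and coeffs[i] >= coeffs[i + 1]}
        # Allow +/-1 index jitter between scales when intersecting.
        expanded = {j for i in local_max for j in (i - 1, i, i + 1)}
        candidates = expanded if candidates is None else candidates & expanded
    return sorted(candidates & local_max)   # report positions from the coarsest scale

rng = np.random.default_rng(1)
x = np.arange(500)
signal = (np.exp(-(x - 120) ** 2 / 18) + 0.6 * np.exp(-(x - 310) ** 2 / 30)
          + 0.05 * rng.standard_normal(500))
print(cwt_peaks(signal))   # should report indices near 120 and 310
```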
Orton, Dennis J.; Doucette, Alan A.
2013-01-01
Identification of biomarkers capable of differentiating between pathophysiological states of an individual is a laudable goal in the field of proteomics. Protein biomarker discovery generally employs high throughput sample characterization by mass spectrometry (MS), being capable of identifying and quantifying thousands of proteins per sample. While MS-based technologies have rapidly matured, the identification of truly informative biomarkers remains elusive, with only a handful of clinically applicable tests stemming from proteomic workflows. This underlying lack of progress is attributed in large part to erroneous experimental design, biased sample handling, as well as improper statistical analysis of the resulting data. This review will discuss in detail the importance of experimental design and provide some insight into the overall workflow required for biomarker identification experiments. Proper balance between the degree of biological vs. technical replication is required for confident biomarker identification. PMID:28250400
Sperm Proteome: What Is on the Horizon?
Mohanty, Gayatri; Swain, Nirlipta; Samanta, Luna
2015-06-01
As mammalian spermatozoa transit from the testis to the end of the epididymal tubule, the functionally incompetent spermatozoa acquire their fertilizing capability. Molecular changes in the spermatozoa at the posttesticular level involve qualitative and quantitative modifications of proteins, along with their sugar moieties and membranous lipids, mostly associated with motility, egg binding, and penetration processes. Proteomic studies have identified numerous sperm-specific proteins, and recent reports have provided a further understanding of their function with respect to male fertility. High-throughput techniques such as mass spectrometry have shown great potential for the identification and study of sperm proteins. In fact, compelling evidence has shown that proteins are critically important in cellular remodeling events and that aberrant expression is associated with pronounced defects in sperm function. This review highlights the posttesticular functional transformation in the epididymis and female reproductive tract, with due emphasis on proteomics. © The Author(s) 2014.
Approaches for Defining the Hsp90-dependent Proteome
Hartson, Steven D.; Matts, Robert L.
2011-01-01
Hsp90 is the target of ongoing drug discovery studies seeking new compounds to treat cancer, neurodegenerative diseases, and protein folding disorders. To better understand Hsp90’s roles in cellular pathologies and in normal cells, numerous studies have utilized proteomics assays and related high-throughput tools to characterize its physical and functional protein partnerships. This review surveys these studies and summarizes the strengths and limitations of the individual approaches. We also include downloadable spreadsheets compiling all of the Hsp90-interacting proteins identified in more than 23 studies. These tools include cross-references among gene aliases, human homologues of yeast Hsp90-interacting proteins, hyperlinks to database entries, summaries of canonical pathways that are enriched in the Hsp90 interactome, and additional bioinformatic annotations. In addition to summarizing the Hsp90 proteomics studies performed to date and the insights they have provided, we identify gaps in our current understanding of Hsp90-mediated proteostasis. PMID:21906632
Methods, Tools and Current Perspectives in Proteogenomics *
Ruggles, Kelly V.; Krug, Karsten; Wang, Xiaojing; Clauser, Karl R.; Wang, Jing; Payne, Samuel H.; Fenyö, David; Zhang, Bing; Mani, D. R.
2017-01-01
With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies have identified novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches and in this article, we systematically classify published methods and tools into four major categories, (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications. PMID:28456751
Computational toxicology is the application of mathematical and computer models to help assess chemical hazards and risks to human health and the environment. Supported by advances in informatics, high-throughput screening (HTS) technologies, and systems biology, the U.S. Environ...
Systems Approaches to Biology and Disease Enable Translational Systems Medicine
Hood, Leroy; Tian, Qiang
2012-01-01
The development and application of systems strategies to biology and disease are transforming medical research and clinical practice at an unprecedented rate. In the foreseeable future, clinicians, medical researchers, and ultimately consumers and patients will be increasingly equipped with a deluge of personal health information, e.g., whole genome sequences, molecular profiling of diseased tissues, and periodic multi-analyte blood testing of biomarker panels for disease and wellness. The convergence of these practices will enable accurate prediction of disease susceptibility and early diagnosis for actionable preventive schemes and personalized treatment regimens tailored to each individual. It will also entail proactive participation from all major stakeholders in the health care system. We are at the dawn of predictive, preventive, personalized, and participatory (P4) medicine, the full implementation of which requires marrying basic and clinical research through advanced systems thinking and the employment of high-throughput technologies in genomics, proteomics, nanofluidics, single-cell analysis, and computational strategies in a highly orchestrated discipline we have termed translational systems medicine. PMID:23084773
Curated protein information in the Saccharomyces genome database.
Hellerstedt, Sage T; Nash, Robert S; Weng, Shuai; Paskov, Kelley M; Wong, Edith D; Karra, Kalpana; Engel, Stacia R; Cherry, J Michael
2017-01-01
Due to recent advancements in the production of experimental proteomic data, the Saccharomyces genome database (SGD; www.yeastgenome.org ) has been expanding our protein curation activities to make new data types available to our users. Because of broad interest in post-translational modifications (PTM) and their importance to protein function and regulation, we have recently started incorporating expertly curated PTM information on individual protein pages. Here we also present the inclusion of new abundance and protein half-life data obtained from high-throughput proteome studies. These new data types have been included with the aim of facilitating cellular biology research. Database URL: www.yeastgenome.org. © The Author(s) 2017. Published by Oxford University Press.
File Formats Commonly Used in Mass Spectrometry Proteomics*
Deutsch, Eric W.
2012-01-01
The application of mass spectrometry (MS) to the analysis of proteomes has enabled the high-throughput identification and abundance measurement of hundreds to thousands of proteins per experiment. However, the formidable informatics challenge associated with analyzing MS data has required a wide variety of data file formats to encode the complex data types associated with MS workflows. These formats encompass the encoding of input instructions for instruments, the output products of the instruments, and several levels of information and results used by and produced by the informatics analysis tools. A brief overview of the most common file formats in use today is presented here, along with a discussion of related topics. PMID:22956731
Vivek-Ananth, R P; Mohanraj, Karthikeyan; Vandanashree, Muralidharan; Jhingran, Anupam; Craig, James P; Samal, Areejit
2018-04-26
Aspergillus fumigatus and multiple other Aspergillus species cause a wide range of lung infections, collectively termed aspergillosis. Aspergilli are ubiquitous in the environment, and healthy immune systems routinely eliminate inhaled conidia; however, Aspergilli can become opportunistic pathogens in immunocompromised patients. The aspergillosis mortality rate and the emergence of drug resistance reveal an urgent need to identify novel targets. Secreted and cell membrane proteins play a critical role in fungal-host interactions and pathogenesis. Using a computational pipeline integrating data from high-throughput experiments and bioinformatic predictions, we have identified secreted and cell membrane proteins in ten Aspergillus species known to cause aspergillosis. Small secreted and effector-like proteins similar to agents of fungal-plant pathogenesis were also identified within each secretome. A comparison with humans revealed that at least 70% of Aspergillus secretomes have no sequence similarity with the human proteome. An analysis of antigenic qualities of Aspergillus proteins revealed that the secretome is significantly more antigenic than cell membrane proteins or the complete proteome. Finally, overlaying an expression dataset, four A. fumigatus proteins upregulated during infection and with available structures were found to be structurally similar to known drug target proteins in other organisms and were able to dock in silico with the respective drug.
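To make the kind of prioritization described above concrete, the following is a minimal sketch of filtering candidate target proteins from hypothetical pipeline outputs; the column names, thresholds, and identifiers are illustrative assumptions, not the authors' actual criteria or data.

```python
# Minimal sketch (not the authors' pipeline): prioritizing candidate secreted
# proteins from hypothetical prediction results.
import pandas as pd

candidates = pd.DataFrame({
    "protein_id":         ["Afu1g001", "Afu2g002", "Afu3g003"],
    "signal_peptide":     [True, True, False],       # e.g. a SignalP-style call
    "human_blast_evalue": [1e-2, 1e-40, 5e-1],       # best hit against the human proteome
    "antigenicity_score": [0.72, 0.55, 0.81],        # arbitrary 0-1 scale
})

prioritized = candidates[
    candidates["signal_peptide"]                     # predicted to be secreted
    & (candidates["human_blast_evalue"] > 1e-5)      # no close human homologue
    & (candidates["antigenicity_score"] > 0.6)       # sufficiently antigenic
]
print(prioritized["protein_id"].tolist())            # ['Afu1g001']
```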
Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L
2010-07-01
Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.
Wang, Youwei; Zhang, Wenqing; Chen, Lidong; Shi, Siqi; Liu, Jianjun
2017-12-01
Li-ion batteries are a key technology for addressing the global challenges of clean renewable energy and environmental pollution. Their contemporary applications, for portable electronic devices, electric vehicles, and large-scale power grids, stimulate the development of high-performance battery materials with high energy density, high power, good safety, and long lifetime. High-throughput calculations provide a practical strategy to discover new battery materials and to optimize the performance of currently known materials. Most cathode materials screened by previous high-throughput calculations cannot meet the requirements of practical applications because only the capacity, voltage and volume change of the bulk were considered. It is important to include more structure-property relationships, such as point defects, surface and interface effects, doping and metal mixing, and nanosize effects, in high-throughput calculations. In this review, we establish a quantitative description of structure-property relationships in Li-ion battery materials in terms of intrinsic bulk parameters, which can be applied in future high-throughput calculations to screen Li-ion battery materials. Based on these parameterized structure-property relationships, a possible high-throughput computational screening workflow is proposed to obtain high-performance battery materials.
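As an illustration of the screening idea, here is a minimal sketch that filters hypothetical cathode candidates by simple bulk descriptors; the property names and cutoffs are assumptions and do not come from the review.

```python
# Minimal sketch (illustrative only): screening hypothetical cathode candidates
# by the kind of bulk descriptors discussed above. All values are invented.
candidates = [
    {"formula": "LiMO2-a",  "capacity_mAh_g": 210, "voltage_V": 3.9, "volume_change_pct": 2.5},
    {"formula": "LiMO2-b",  "capacity_mAh_g": 150, "voltage_V": 4.8, "volume_change_pct": 9.0},
    {"formula": "LiMPO4-c", "capacity_mAh_g": 170, "voltage_V": 3.4, "volume_change_pct": 6.5},
]

def passes_screen(m, min_capacity=160, min_voltage=3.0, max_voltage=4.5, max_volume_change=7.0):
    """Keep materials meeting simple bulk-property thresholds (assumed cutoffs)."""
    return (m["capacity_mAh_g"] >= min_capacity
            and min_voltage <= m["voltage_V"] <= max_voltage
            and m["volume_change_pct"] <= max_volume_change)

print([m["formula"] for m in candidates if passes_screen(m)])  # ['LiMO2-a', 'LiMPO4-c']
```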
Detection of co-eluted peptides using database search methods
Alves, Gelio; Ogurtsov, Aleksey Y; Kwok, Siwei; Wu, Wells W; Wang, Guanghui; Shen, Rong-Fong; Yu, Yi-Kuo
2008-01-01
Background: Current experimental techniques, especially those applying liquid chromatography mass spectrometry, have made high-throughput proteomic studies possible. The increase in throughput, however, also raises concerns about the accuracy of identification and quantification. Most experimental procedures select, in a given MS scan, only a few of the most intense parent ions, each to be fragmented (MS2) separately; most other minor co-eluted peptides with similar chromatographic retention times are ignored and their information lost. Results: We have computationally investigated the possibility of enhancing information retrieval during a given LC/MS experiment by selecting the two or three most intense parent ions for simultaneous fragmentation. A set of spectra was created by superimposing MS2 spectra, each of which could be identified with high confidence by all search methods tested, to mimic the spectra of co-eluted peptides. The generated convoluted spectra were used to evaluate the capability of several database search methods (SEQUEST, Mascot, X!Tandem, OMSSA, and RAId_DbS) to identify true peptides from superimposed spectra of co-eluted peptides. We show that, using these simulated spectra, all of the database search methods eventually gain in the number of true peptides identified when compound spectra of co-eluted peptides are used. Open peer review: Reviewed by Vlad Petyuk (nominated by Arcady Mushegian), King Jordan and Shamil Sunyaev. For the full reviews, please go to the Reviewers' comments section. PMID:18597684
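A minimal sketch of the central data manipulation, superimposing MS2 peak lists to mimic a spectrum of co-eluted peptides, is shown below; the peak-list representation and m/z binning are assumptions, and real spectra would be read from MGF or mzML files.

```python
# Minimal sketch (assumed representation): superimposing two MS2 peak lists.
# Peaks are (m/z, intensity) pairs; the binning by rounded m/z is an assumption.
from collections import defaultdict

def superimpose(spectra, mz_decimals=2):
    """Merge peak lists, summing intensities of peaks that share an m/z bin."""
    merged = defaultdict(float)
    for peaks in spectra:
        for mz, intensity in peaks:
            merged[round(mz, mz_decimals)] += intensity
    return sorted(merged.items())

spec_a = [(175.12, 800.0), (262.14, 1500.0), (375.22, 600.0)]
spec_b = [(175.12, 300.0), (304.16, 900.0)]
print(superimpose([spec_a, spec_b]))
# [(175.12, 1100.0), (262.14, 1500.0), (304.16, 900.0), (375.22, 600.0)]
```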
P2P proteomics -- data sharing for enhanced protein identification
2012-01-01
Background: In order to tackle the important and challenging problem in proteomics of identifying known and new protein sequences using high-throughput methods, we propose a data-sharing platform that uses fully distributed P2P technologies to share specifications of peer-interaction protocols and service components. By using such a platform, information to be searched is no longer centralised in a few repositories but gathered from experiments in peer proteomics laboratories, which can subsequently be searched by fellow researchers. Methods: The system distributively runs a data-sharing protocol specified in the Lightweight Communication Calculus underlying the system, through which researchers interact via message passing. For this, researchers interact with the system through particular components that link to database querying systems based on BLAST and/or OMSSA and to GUI-based visualisation environments. We have tested the proposed platform with data drawn from preexisting MS/MS data reservoirs from the 2006 ABRF (Association of Biomolecular Resource Facilities) test sample, which was extensively tested during the ABRF Proteomics Standards Research Group 2006 worldwide survey. In particular, we have taken the data available from a subset of proteomics laboratories of Spain's National Institute for Proteomics, ProteoRed, a network for the coordination, integration and development of the Spanish proteomics facilities. Results and Discussion: We performed queries against nine databases, including seven ProteoRed proteomics laboratories, the NCBI Swiss-Prot database and the local database of the CSIC/UAB Proteomics Laboratory. A detailed analysis of the results indicated the presence of a protein that was supported by other NCBI matches and highly scored matches in several proteomics labs. The analysis clearly indicated that the protein was a relatively highly concentrated contaminant that could be present in the ABRF sample. This fact is evident from the information that could be derived from the proposed P2P proteomics system; however, it is not straightforward to arrive at the same conclusion by conventional means, as it is difficult to rule out organic contamination of samples. The actual presence of this contaminant was only established after the ABRF study of all the identifications reported by the laboratories. PMID:22293032
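The following is a minimal local stand-in, not the actual P2P protocol or Lightweight Communication Calculus, showing how identifications gathered from several peer laboratories could be aggregated so that a shared contaminant stands out; the per-lab accession sets are invented.

```python
# Minimal sketch (local stand-in for the P2P protocol): count how many peer
# labs report each protein accession, so a shared contaminant is easy to spot.
from collections import Counter

peer_results = {            # hypothetical accessions reported by each lab
    "lab_A": {"P02769", "P00761"},
    "lab_B": {"P02769"},
    "lab_C": {"P02769", "Q9XYZ1"},
}

support = Counter(acc for hits in peer_results.values() for acc in hits)
for acc, n_labs in support.most_common():
    print(f"{acc}: reported by {n_labs} of {len(peer_results)} labs")
# P02769 (serum albumin, a common contaminant) is flagged by all three labs.
```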
Korkut, Anil; Wang, Weiqing; Demir, Emek; Aksoy, Bülent Arman; Jing, Xiaohong; Molinelli, Evan J; Babur, Özgün; Bemis, Debra L; Onur Sumer, Selcuk; Solit, David B; Pratilas, Christine A; Sander, Chris
2015-01-01
Resistance to targeted cancer therapies is an important clinical problem. The discovery of anti-resistance drug combinations is challenging because resistance can arise by diverse escape mechanisms. To address this challenge, we improved and applied the experimental-computational perturbation biology method. Using statistical inference, we built network models from high-throughput measurements of molecular and phenotypic responses to combinatorial targeted perturbations. The models are computationally executed to predict the effects of thousands of untested perturbations. In RAF-inhibitor-resistant melanoma cells, we measured 143 proteomic/phenotypic entities under 89 perturbation conditions and predicted c-Myc as an effective therapeutic co-target with BRAF or MEK. Experiments using the BET bromodomain inhibitor JQ1, which affects the level of c-Myc protein, and protein kinase inhibitors targeting the ERK pathway confirmed the prediction. In conclusion, we propose an anti-cancer strategy of co-targeting a specific upstream alteration and a general downstream point of vulnerability to prevent or overcome resistance to targeted drugs. DOI: http://dx.doi.org/10.7554/eLife.04640.001 PMID:26284497
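A toy sketch of the modeling idea, fitting a response model from a few measured perturbation conditions and then predicting an untested combination, is given below; the actual perturbation-biology models are network-based and far richer, and all numbers here are invented.

```python
# Minimal sketch (toy stand-in for the network models): fit a linear response
# model from measured conditions and predict an untested drug combination.
import numpy as np

# Rows: conditions (control, drug A, drug B); the A+B combination is "untested".
# Columns of X: [intercept, A applied, B applied].
X_train = np.array([[1, 0, 0],
                    [1, 1, 0],
                    [1, 0, 1]], dtype=float)
y_train = np.array([1.00, 0.60, 0.75])       # measured phenotype, e.g. relative growth

beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

x_combo = np.array([1, 1, 1], dtype=float)   # untested A+B combination
print(round(float(x_combo @ beta), 3))       # additive prediction: 0.35
```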
Notredame, Cedric
2018-05-02
Cedric Notredame from the Centre for Genomic Regulation gives a presentation on New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era at the JGI/Argonne HPC Workshop on January 26, 2010.
Martyniuk, Christopher J; Popesku, Jason T; Chown, Brittany; Denslow, Nancy D; Trudeau, Vance L
2012-05-01
Neuroendocrine systems integrate both extrinsic and intrinsic signals to regulate virtually all aspects of an animal's physiology. In aquatic toxicology, studies have shown that pollutants are capable of disrupting the neuroendocrine system of teleost fish, and many chemicals found in the environment can also have a neurotoxic mode of action. Omics approaches are now used to better understand cell signaling cascades underlying fish neurophysiology and the control of pituitary hormone release, in addition to identifying adverse effects of pollutants in the teleostean central nervous system. For example, both high throughput genomics and proteomic investigations of molecular signaling cascades for both neurotransmitter and nuclear receptor agonists/antagonists have been reported. This review highlights recent studies that have utilized quantitative proteomics methods such as 2D differential in-gel electrophoresis (DIGE) and isobaric tagging for relative and absolute quantitation (iTRAQ) in neuroendocrine regions and uses these examples to demonstrate the challenges of using proteomics in neuroendocrinology and neurotoxicology research. To begin to characterize the teleost neuroproteome, we functionally annotated 623 unique proteins found in the fish hypothalamus and telencephalon. These proteins have roles in biological processes that include synaptic transmission, ATP production, receptor activity, cell structure and integrity, and stress responses. The biological processes most represented by proteins detected in the teleost neuroendocrine brain included transport (8.4%), metabolic process (5.5%), and glycolysis (4.8%). We provide an example of using sub-network enrichment analysis (SNEA) to identify protein networks in the fish hypothalamus in response to dopamine receptor signaling. Dopamine signaling altered the abundance of proteins that are binding partners of microfilaments, integrins, and intermediate filaments, consistent with data suggesting dopaminergic regulation of neuronal stability and structure. Lastly, for fish neuroendocrine studies using both high-throughput genomics and proteomics, we compare gene and protein relationships in the hypothalamus and demonstrate that correlation is often poor for single time point experiments. These studies highlight the need for additional time course analyses to better understand gene-protein relationships and adverse outcome pathways. This is important if both transcriptomics and proteomics are to be used together to investigate neuroendocrine signaling pathways or as bio-monitoring tools in ecotoxicology. Copyright © 2011 Elsevier Inc. All rights reserved.
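As a minimal sketch of the gene-protein comparison mentioned above, the following computes a rank correlation between transcript and protein fold changes; the values are invented and the analysis is far simpler than the studies discussed.

```python
# Minimal sketch (invented numbers): Spearman correlation between transcript
# and protein fold changes, the kind of comparison reported as often poor for
# single time point experiments.
from scipy.stats import spearmanr

transcript_log2fc = [1.8, -0.5, 0.2, 2.1, -1.3, 0.0, 0.9]   # per gene
protein_log2fc    = [0.4, -0.1, 0.6, 0.3, -0.2, 0.5, -0.3]  # matched proteins

rho, pval = spearmanr(transcript_log2fc, protein_log2fc)
print(f"Spearman rho = {rho:.2f}, p = {pval:.2f}")
```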
Application of Large-Scale Aptamer-Based Proteomic Profiling to Planned Myocardial Infarctions.
Jacob, Jaison; Ngo, Debby; Finkel, Nancy; Pitts, Rebecca; Gleim, Scott; Benson, Mark D; Keyes, Michelle J; Farrell, Laurie A; Morgan, Thomas; Jennings, Lori L; Gerszten, Robert E
2018-03-20
Emerging proteomic technologies using novel affinity-based reagents allow for efficient multiplexing with high-sample throughput. To identify early biomarkers of myocardial injury, we recently applied an aptamer-based proteomic profiling platform that measures 1129 proteins to samples from patients undergoing septal alcohol ablation for hypertrophic cardiomyopathy, a human model of planned myocardial injury. Here, we examined the scalability of this approach using a markedly expanded platform to study a far broader range of human proteins in the context of myocardial injury. We applied a highly multiplexed, expanded proteomic technique that uses single-stranded DNA aptamers to assay 4783 human proteins (4137 distinct human gene targets) to derivation and validation cohorts of planned myocardial injury, individuals with spontaneous myocardial infarction, and at-risk controls. We found 376 target proteins that significantly changed in the blood after planned myocardial injury in a derivation cohort (n=20; P <1.05E-05, 1-way repeated measures analysis of variance, Bonferroni threshold). Two hundred forty-seven of these proteins were validated in an independent planned myocardial injury cohort (n=15; P <1.33E-04, 1-way repeated measures analysis of variance); >90% were directionally consistent and reached nominal significance in the validation cohort. Among the validated proteins that were increased within 1 hour after planned myocardial injury, 29 were also elevated in patients with spontaneous myocardial infarction (n=63; P <6.17E-04). Many of the novel markers identified in our study are intracellular proteins not previously identified in the peripheral circulation or have functional roles relevant to myocardial injury. For example, the cardiac LIM protein, cysteine- and glycine-rich protein 3, is thought to mediate cardiac mechanotransduction and stress responses, whereas the mitochondrial ATP synthase F0 subunit component is a vasoactive peptide on its release from cells. Last, we performed aptamer-affinity enrichment coupled with mass spectrometry to technically verify aptamer specificity for a subset of the new biomarkers. Our results demonstrate the feasibility of large-scale aptamer multiplexing at a level that has not previously been reported and with sample throughput that greatly exceeds other existing proteomic methods. The expanded aptamer-based proteomic platform provides a unique opportunity for biomarker and pathway discovery after myocardial injury. © 2017 American Heart Association, Inc.
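A simplified stand-in for the statistical screen is sketched below: a paired t-test per protein with a Bonferroni threshold, rather than the one-way repeated measures ANOVA across time points used in the study; all data are simulated.

```python
# Minimal sketch (simplified stand-in, simulated data): per-protein paired test
# of post-injury change with a Bonferroni-corrected significance threshold.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
n_proteins, n_subjects = 1000, 20
baseline = rng.normal(10, 1, size=(n_proteins, n_subjects))
post = baseline + rng.normal(0, 1, size=(n_proteins, n_subjects))
post[:30] += 1.5                       # 30 proteins truly increase after injury

pvals = np.array([ttest_rel(post[i], baseline[i]).pvalue for i in range(n_proteins)])
bonferroni_alpha = 0.05 / n_proteins   # analogous in spirit to the paper's per-test threshold
print(int((pvals < bonferroni_alpha).sum()), "proteins pass the Bonferroni threshold")
```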
Less is More: Membrane Protein Digestion Beyond Urea-Trypsin Solution for Next-level Proteomics.
Zhang, Xi
2015-09-01
The goal of next-level bottom-up membrane proteomics is protein function investigation, via high-coverage, high-throughput, peptide-centric quantitation of expression, modifications and dynamic structures at the systems scale. Yet efficient digestion of mammalian membrane proteins presents a daunting barrier, and the prevalent day-long urea-trypsin in-solution digestion has proved insufficient to reach this goal. Many efforts have contributed incremental advances over past years, but they involved protein denaturation that disconnected measurement from functional states. Beyond denaturation, the recent discovery of the structure/proteomics omni-compatible detergent n-dodecyl-β-d-maltopyranoside, combined with pepsin and PNGase F columns, enabled breakthroughs in membrane protein digestion: a 2010 DDM-low-TCEP (DLT) method for H/D-exchange (HDX) using a human G protein-coupled receptor, and a 2015 flow/detergent-facilitated protease and de-PTM digestions (FDD) method for integrative deep sequencing and quantitation using a full-length human ion channel complex. Distinguishing protein solubilization from denaturation, protease digestion reliability from theoretical specificity, and reduction from alkylation, these methods shifted day(s)-long paradigms into minutes and afforded fully automatable (HDX)-protein-peptide-(tandem mass tag)-HPLC pipelines to instantly measure functional proteins at deep coverage, with high peptide reproducibility, low artifacts and minimal leakage. By promoting, rather than destroying, structures and activities, these methods harnessed membrane proteins for next-level streamlined functional proteomics. This review analyzes recent advances in membrane protein digestion methods and highlights critical discoveries for future proteomics. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Mock, Andreas; Chiblak, Sara; Herold-Mende, Christel
2014-01-01
A growing body of evidence suggests that glioma stem cells (GSCs) account for tumor initiation, therapy resistance, and the subsequent regrowth of gliomas. Thus, continuous efforts have been undertaken to further characterize this subpopulation of less differentiated tumor cells. Although we are able to enrich GSCs, we still lack a comprehensive understanding of GSC phenotypes and behavior. The advent of high-throughput technologies raised hope that incorporation of these newly developed platforms would help to tackle such questions. Since then, a number of comparative genome-, transcriptome- and proteome-wide studies on GSCs have been conducted, giving new insights into GSC biology. However, lessons had to be learned in designing high-throughput experiments, and some of the resulting conclusions fell short of expectations because the underlying studies were performed on only a few GSC lines or at a single molecular level rather than with an integrative poly-omics approach. Despite these shortcomings, our knowledge of GSC biology has markedly expanded due to a number of survival-associated biomarkers as well as glioma-relevant signaling pathways and therapeutic targets being identified. In this article we review recent findings obtained by comparative high-throughput analyses of GSCs. We further summarize fundamental concepts of systems biology as well as its applications for glioma stem cell research.
Targeted proteomics identifies liquid-biopsy signatures for extracapsular prostate cancer
Kim, Yunee; Jeon, Jouhyun; Mejia, Salvador; Yao, Cindy Q; Ignatchenko, Vladimir; Nyalwidhe, Julius O; Gramolini, Anthony O; Lance, Raymond S; Troyer, Dean A; Drake, Richard R; Boutros, Paul C; Semmes, O. John; Kislinger, Thomas
2016-01-01
Biomarkers are rapidly gaining importance in personalized medicine. Although numerous molecular signatures have been developed over the past decade, there is a lack of overlap and many biomarkers fail to validate in independent patient cohorts and hence are not useful for clinical application. For these reasons, identification of novel and robust biomarkers remains a formidable challenge. We combine targeted proteomics with computational biology to discover robust proteomic signatures for prostate cancer. Quantitative proteomics conducted in expressed prostatic secretions from men with extraprostatic and organ-confined prostate cancers identified 133 differentially expressed proteins. Using synthetic peptides, we evaluate them by targeted proteomics in a 74-patient cohort of expressed prostatic secretions in urine. We quantify a panel of 34 candidates in an independent 207-patient cohort. We apply machine-learning approaches to develop clinical predictive models for prostate cancer diagnosis and prognosis. Our results demonstrate that computationally guided proteomics can discover highly accurate non-invasive biomarkers. PMID:27350604
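A minimal sketch of the machine-learning step, training and cross-validating a classifier on a small peptide panel, follows; the panel size echoes the abstract, but the data, model choice, and metric are illustrative assumptions.

```python
# Minimal sketch (simulated data): a predictive model over a 34-feature peptide
# panel, in the spirit of the study's classifier, not its actual pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_patients, n_peptides = 200, 34
X = rng.normal(size=(n_patients, n_peptides))                      # peptide abundances
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n_patients)) > 0    # simulated outcome label

model = LogisticRegression(max_iter=1000)
auc = cross_val_score(model, X, y.astype(int), cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.2f}")
```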
Alginate Immobilization of Metabolic Enzymes (AIME) for High-Throughput Screening Assays (SOT)
Alginate Immobilization of Metabolic Enzymes (AIME) for High-Throughput Screening Assays. DE DeGroot, RS Thomas, and SO Simmons. National Center for Computational Toxicology, US EPA, Research Triangle Park, NC USA. The EPA’s ToxCast program utilizes a wide variety of high-throughput s...
Parreira, J R; Bouraada, J; Fitzpatrick, M A; Silvestre, S; Bernardes da Silva, A; Marques da Silva, J; Almeida, A M; Fevereiro, P; Altelaar, A F M; Araújo, S S
2016-06-30
Common bean (Phaseolus vulgaris L.) is one of the most consumed staple foods worldwide, yet little is known about the molecular mechanisms controlling seed development. This study aims to comprehensively describe proteome dynamics during seed development of common bean. A high-throughput gel-free proteomics approach (LC-MS/MS) was conducted on seeds at 10, 20, 30 and 40 days after anthesis, spanning from late embryogenesis until desiccation. Of the 418 differentially accumulated proteins identified, 255 were characterized, most belonging to protein metabolism. An accumulation of proteins belonging to the MapMan functional categories of "protein", "glycolysis", "TCA", "DNA", "RNA", "cell" and "stress" was found at early seed development stages, reflecting extensive metabolic activity. In the mid stages, accumulation of storage, signaling, starch synthesis and cell wall-related proteins stood out. In the later stages, an increase in proteins related to redox processes, protein degradation/modification/folding and nucleic acid metabolism reflects the activation of seed desiccation-resistance mechanisms. Our study unveils new clues for understanding the regulation of seed development mediated by post-translational modifications and maintenance of genome integrity. This knowledge enhances the understanding of the molecular mechanisms of seed development that may be used in the design and selection of common bean seeds with desired quality traits. Common bean (P. vulgaris) is an important source of proteins and carbohydrates worldwide. Despite the agronomic and economic importance of this pulse, knowledge of common bean seed development is limited. Herein, a gel-free, high-throughput methodology was used to describe the proteome changes during P. vulgaris seed development. The data obtained will enhance the knowledge of the molecular mechanisms controlling seed development in this grain legume and may be used in the design and selection of common bean seeds with desired quality traits. The results may be extrapolated to other pulses. Copyright © 2016 Elsevier B.V. All rights reserved.
Study of cellular oncometabolism via multidimensional protein identification technology.
Aukim-Hastie, Claire; Garbis, Spiros D
2014-01-01
Cellular proteomics is becoming a widespread clinical application, matching the definition of bench-to-bedside translation. Among various fields of investigation, this approach can be applied to the study of the metabolic alterations that accompany oncogenesis and tumor progression, which are globally referred to as oncometabolism. Here, we describe a multidimensional protein identification technology (MudPIT)-based strategy that can be employed to study the cellular proteome of malignant cells and tissues. This method has previously been shown to be compatible with the reproducible, in-depth analysis of up to a thousand proteins in clinical samples. The possibility of employing this technique to study clinical specimens demonstrates its robustness. MudPIT is advantageous compared with other approaches because it is direct, highly sensitive, and reproducible, it provides high resolution with ultra-high mass accuracy, it allows for relative quantification, and it is compatible with multiplexing (thus limiting costs). This method enables the direct assessment of the proteomic profile of neoplastic cells and tissues and could be employed in the near future as a high-throughput, rapid, quantitative, and cost-effective screening platform for clinical samples. © 2014 Elsevier Inc. All rights reserved.
Litichevskiy, Lev; Peckner, Ryan; Abelin, Jennifer G; Asiedu, Jacob K; Creech, Amanda L; Davis, John F; Davison, Desiree; Dunning, Caitlin M; Egertson, Jarrett D; Egri, Shawn; Gould, Joshua; Ko, Tak; Johnson, Sarah A; Lahr, David L; Lam, Daniel; Liu, Zihan; Lyons, Nicholas J; Lu, Xiaodong; MacLean, Brendan X; Mungenast, Alison E; Officer, Adam; Natoli, Ted E; Papanastasiou, Malvina; Patel, Jinal; Sharma, Vagisha; Toder, Courtney; Tubelli, Andrew A; Young, Jennie Z; Carr, Steven A; Golub, Todd R; Subramanian, Aravind; MacCoss, Michael J; Tsai, Li-Huei; Jaffe, Jacob D
2018-04-25
Although the value of proteomics has been demonstrated, cost and scale are typically prohibitive, and gene expression profiling remains dominant for characterizing cellular responses to perturbations. However, high-throughput sentinel assays provide an opportunity for proteomics to contribute at a meaningful scale. We present a systematic library resource (90 drugs × 6 cell lines) of proteomic signatures that measure changes in the reduced-representation phosphoproteome (P100) and changes in epigenetic marks on histones (GCP). A majority of these drugs elicited reproducible signatures, but notable cell line- and assay-specific differences were observed. Using the "connectivity" framework, we compared signatures across cell types and integrated data across assays, including a transcriptional assay (L1000). Consistent connectivity among cell types revealed cellular responses that transcended lineage, and consistent connectivity among assays revealed unexpected associations between drugs. We further leveraged the resource against public data to formulate hypotheses for treatment of multiple myeloma and acute lymphocytic leukemia. This resource is publicly available at https://clue.io/proteomics. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
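A minimal sketch of a "connectivity"-style comparison, computed here as a rank correlation between two drug signatures, is shown below; the analyte values are invented and the actual connectivity framework is more involved.

```python
# Minimal sketch (toy numbers): "connectivity" approximated as the rank
# correlation between two drug signatures measured on the same analytes.
from scipy.stats import spearmanr

sig_cell_line_a = [1.9, -0.7, 0.3, 2.2, -1.1, 0.4]   # z-scored analyte changes for a drug
sig_cell_line_b = [1.5, -0.9, 0.1, 1.8, -0.8, 0.6]   # same drug, same analytes, other cell line

rho, _ = spearmanr(sig_cell_line_a, sig_cell_line_b)
print(f"connectivity (Spearman rho) = {rho:.2f}")     # high rho suggests a shared response
```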
Durbin, Kenneth R.; Tran, John C.; Zamdborg, Leonid; Sweet, Steve M. M.; Catherman, Adam D.; Lee, Ji Eun; Li, Mingxi; Kellie, John F.; Kelleher, Neil L.
2011-01-01
Applying high-throughput Top-Down MS to an entire proteome requires a yet-to-be-established model for data processing. Since Top-Down is becoming possible on a large scale, we report our latest software pipeline dedicated to capturing the full value of intact protein data in automated fashion. For intact mass detection, we combine algorithms for processing MS1 data from both isotopically resolved (FT) and charge-state resolved (ion trap) LC-MS data, which are then linked to their fragment ions for database searching using ProSight. Automated determination of human keratin and tubulin isoforms is one result. Optimized for the intricacies of whole proteins, new software modules visualize proteome-scale data based on the LC retention time and intensity of intact masses and enable selective detection of PTMs to automatically screen for acetylation, phosphorylation, and methylation. Software functionality was demonstrated using comparative LC-MS data from yeast strains in addition to human cells undergoing chemical stress. We further these advances as a key aspect of realizing Top-Down MS on a proteomic scale. PMID:20848673
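The PTM screening idea can be illustrated with a minimal sketch that looks for intact-mass pairs differing by common modification masses; the tolerance and the observed values are assumptions, not the pipeline's algorithm.

```python
# Minimal sketch (not the authors' algorithm): screening intact-mass pairs for
# common PTM mass shifts, in the spirit of the automated acetylation,
# phosphorylation, and methylation screen described above.
PTM_SHIFTS = {             # monoisotopic mass differences in Da
    "acetylation":     42.0106,
    "phosphorylation": 79.9663,
    "methylation":     14.0157,
}

def find_ptm_pairs(masses, tol_da=0.01):
    """Report intact-mass pairs whose difference matches a PTM shift."""
    hits = []
    for i, m1 in enumerate(masses):
        for m2 in masses[i + 1:]:
            delta = abs(m2 - m1)
            for ptm, shift in PTM_SHIFTS.items():
                if abs(delta - shift) <= tol_da:
                    hits.append((m1, m2, ptm))
    return hits

observed = [11310.62, 11352.63, 11390.59, 15126.44]   # hypothetical intact masses
print(find_ptm_pairs(observed))
# [(11310.62, 11352.63, 'acetylation'), (11310.62, 11390.59, 'phosphorylation')]
```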
Proteomics: a new approach to the study of disease.
Chambers, G; Lawrie, L; Cash, P; Murray, G I
2000-11-01
The global analysis of cellular proteins has recently been termed proteomics and is a key area of research that is developing in the post-genome era. Proteomics uses a combination of sophisticated techniques including two-dimensional (2D) gel electrophoresis, image analysis, mass spectrometry, amino acid sequencing, and bio-informatics to resolve comprehensively, to quantify, and to characterize proteins. The application of proteomics provides major opportunities to elucidate disease mechanisms and to identify new diagnostic markers and therapeutic targets. This review aims to explain briefly the background to proteomics and then to outline proteomic techniques. Applications to the study of human disease conditions ranging from cancer to infectious diseases are reviewed. Finally, possible future advances are briefly considered, especially those which may lead to faster sample throughput and increased sensitivity for the detection of individual proteins. Copyright 2000 John Wiley & Sons, Ltd.
Li, Xiao-jun; Yi, Eugene C; Kemp, Christopher J; Zhang, Hui; Aebersold, Ruedi
2005-09-01
There is an increasing interest in the quantitative proteomic measurement of the protein contents of substantially similar biological samples, e.g. for the analysis of cellular response to perturbations over time or for the discovery of protein biomarkers from clinical samples. Technical limitations of current proteomic platforms such as limited reproducibility and low throughput make this a challenging task. A new LC-MS-based platform is able to generate complex peptide patterns from the analysis of proteolyzed protein samples at high throughput and represents a promising approach for quantitative proteomics. A crucial component of the LC-MS approach is the accurate evaluation of the abundance of detected peptides over many samples and the identification of peptide features that can stratify samples with respect to their genetic, physiological, or environmental origins. We present here a new software suite, SpecArray, that generates a peptide versus sample array from a set of LC-MS data. A peptide array stores the relative abundance of thousands of peptide features in many samples and is in a format identical to that of a gene expression microarray. A peptide array can be subjected to an unsupervised clustering analysis to stratify samples or to a discriminant analysis to identify discriminatory peptide features. We applied the SpecArray to analyze two sets of LC-MS data: one was from four repeat LC-MS analyses of the same glycopeptide sample, and another was from LC-MS analysis of serum samples of five male and five female mice. We demonstrate through these two study cases that the SpecArray software suite can serve as an effective software platform in the LC-MS approach for quantitative proteomics.
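A minimal sketch of the peptide-versus-sample array concept, followed by an unsupervised clustering of the samples, is given below; the abundances are simulated and the clustering choices are assumptions rather than SpecArray's implementation.

```python
# Minimal sketch (simulated data): a peptide-versus-sample array clustered to
# stratify samples, analogous in spirit to the SpecArray workflow.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
# rows = peptide features, columns = samples (first 5 from one group, last 5 from another)
group_a = rng.normal(5.0, 0.3, size=(50, 5))
group_b = rng.normal(8.0, 0.3, size=(50, 5))
peptide_array = np.hstack([group_a, group_b])

# cluster the samples (columns) into two groups as an unsupervised stratification
labels = fcluster(linkage(peptide_array.T, method="average"), t=2, criterion="maxclust")
print(labels)   # e.g. [1 1 1 1 1 2 2 2 2 2]
```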
MannDB: A microbial annotation database for protein characterization
Zhou, C; Lam, M; Smith, J
2006-05-19
MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from Genbank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the Genbank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high-priority agents on the websites of several governmental organizations concerned with bio-terrorism. MannDB provides the user with a BLAST interface for comparison of native and non-native sequences and a query tool for conveniently selecting proteins of interest. In addition, the user has access to a web-based browser that compiles comprehensive and extensive reports.
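To illustrate the kind of structured query such a relational design supports, here is a minimal sketch against an assumed, much-simplified schema; the table layout, tool names, and accessions are hypothetical and not MannDB's actual schema.

```python
# Minimal sketch (assumed schema, not MannDB's actual tables): a structured
# query listing proteins with a predicted signal peptide for one proteome.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE protein  (id INTEGER PRIMARY KEY, accession TEXT, organism TEXT);
CREATE TABLE analysis (protein_id INTEGER, tool TEXT, prediction TEXT);
INSERT INTO protein  VALUES (1, 'YP_0001', 'Bacillus anthracis'), (2, 'YP_0002', 'Bacillus anthracis');
INSERT INTO analysis VALUES (1, 'SignalP', 'signal peptide'), (2, 'TMHMM', '3 transmembrane helices');
""")

rows = con.execute("""
    SELECT p.accession, a.tool, a.prediction
    FROM protein p JOIN analysis a ON a.protein_id = p.id
    WHERE p.organism = ? AND a.prediction LIKE '%signal peptide%'
""", ("Bacillus anthracis",)).fetchall()
print(rows)   # [('YP_0001', 'SignalP', 'signal peptide')]
```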
Analysis of Protein Expression in Cell Microarrays: A Tool for Antibody-based Proteomics
Andersson, Ann-Catrin; Strömberg, Sara; Bäckvall, Helena; Kampf, Caroline; Uhlen, Mathias; Wester, Kenneth; Pontén, Fredrik
2006-01-01
Tissue microarray (TMA) technology provides a possibility to explore protein expression patterns in a multitude of normal and disease tissues in a high-throughput setting. Although TMAs have been used for analysis of tissue samples, robust methods for studying in vitro cultured cell lines and cell aspirates in a TMA format have been lacking. We have adopted a technique to homogeneously distribute cells in an agarose gel matrix, creating an artificial tissue. This enables simultaneous profiling of protein expression in suspension- and adherent-grown cell samples assembled in a microarray. In addition, the present study provides an optimized strategy for the basic laboratory steps to efficiently produce TMAs. Presented modifications resulted in an improved quality of specimens and a higher section yield compared with standard TMA production protocols. Sections from the generated cell TMAs were tested for immunohistochemical staining properties using 20 well-characterized antibodies. Comparison of immunoreactivity in cultured dispersed cells and corresponding cells in tissue samples showed congruent results for all tested antibodies. We conclude that a modified TMA technique, including cell samples, provides a valuable tool for high-throughput analysis of protein expression, and that this technique can be used for global approaches to explore the human proteome. PMID:16957166
Säll, Anna; Walle, Maria; Wingren, Christer; Müller, Susanne; Nyman, Tomas; Vala, Andrea; Ohlin, Mats; Borrebaeck, Carl A K; Persson, Helena
2016-10-01
Antibody-based proteomics offers distinct advantages in the analysis of complex samples for discovery and validation of biomarkers associated with disease. However, its large-scale implementation requires tools and technologies that allow development of suitable antibody or antibody fragments in a high-throughput manner. To address this we designed and constructed two human synthetic antibody fragment (scFv) libraries denoted HelL-11 and HelL-13. By the use of phage display technology, in total 466 unique scFv antibodies specific for 114 different antigens were generated. The specificities of these antibodies were analyzed in a variety of immunochemical assays and a subset was further evaluated for functionality in protein microarray applications. This high-throughput approach demonstrates the ability to rapidly generate a wealth of reagents not only for proteome research, but potentially also for diagnostics and therapeutics. In addition, this work provides a great example on how a synthetic approach can be used to optimize library designs. By having precise control of the diversity introduced into the antigen-binding sites, synthetic libraries offer increased understanding of how different diversity contributes to antibody binding reactivity and stability, thereby providing the key to future library optimization. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Tuncbag, Nurcan; McCallum, Scott; Huang, Shao-shan Carol; Fraenkel, Ernest
2012-01-01
High-throughput technologies including transcriptional profiling, proteomics and reverse genetics screens provide detailed molecular descriptions of cellular responses to perturbations. However, it is difficult to integrate these diverse data to reconstruct biologically meaningful signaling networks. Previously, we have established a framework for integrating transcriptional, proteomic and interactome data by searching for the solution to the prize-collecting Steiner tree problem. Here, we present a web server, SteinerNet, to make this method available in a user-friendly format for a broad range of users with data from any species. At a minimum, a user only needs to provide a set of experimentally detected proteins and/or genes and the server will search for connections among these data from the provided interactomes for yeast, human, mouse, Drosophila melanogaster and Caenorhabditis elegans. More advanced users can upload their own interactome data as well. The server provides interactive visualization of the resulting optimal network and downloadable files detailing the analysis and results. We believe that SteinerNet will be useful for researchers who would like to integrate their high-throughput data for a specific condition or cellular response and to find biologically meaningful pathways. SteinerNet is accessible at http://fraenkel.mit.edu/steinernet. PMID:22638579
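A minimal sketch of the underlying idea, connecting experimentally detected proteins through a weighted interactome, is shown below using an approximate Steiner tree; SteinerNet solves the richer prize-collecting variant, so this toy example only illustrates the concept.

```python
# Minimal sketch (toy interactome): connect a set of "hit" proteins through a
# weighted network with an approximate Steiner tree. This ignores the prizes
# used in the prize-collecting formulation behind SteinerNet.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.Graph()
G.add_weighted_edges_from([
    ("HitA", "X", 1.0), ("X", "HitB", 1.0),
    ("X", "Y", 0.5),    ("Y", "HitC", 0.5),
    ("HitA", "HitC", 5.0),                 # a direct but expensive connection
])

terminals = ["HitA", "HitB", "HitC"]       # experimentally detected proteins
T = steiner_tree(G, terminals, weight="weight")
print(sorted(T.edges()))                   # routes through X and Y rather than the costly edge
```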
Advanced Mass Spectrometric Methods for the Rapid and Quantitative Characterization of Proteomes
Smith, Richard D.
2002-01-01
Progress is reviewed towards the development of a global strategy that aims to extend the sensitivity, dynamic range, comprehensiveness and throughput of proteomic measurements based upon the use of high performance separations and mass spectrometry. The approach uses high accuracy mass measurements from Fourier transform ion cyclotron resonance mass spectrometry (FTICR) to validate peptide ‘accurate mass tags’ (AMTs) produced by global protein enzymatic digestions for a specific organism, tissue or cell type from ‘potential mass tags’ tentatively identified using conventional tandem mass spectrometry (MS/MS). This provides the basis for subsequent measurements without the need for MS/MS. High resolution capillary liquid chromatography separations combined with high sensitivity, high resolution, accurate FTICR measurements are shown to be capable of characterizing peptide mixtures of more than 10⁵ components. The strategy has been initially demonstrated using the microorganisms Saccharomyces cerevisiae and Deinococcus radiodurans. Advantages of the approach include the high confidence of protein identification, its broad proteome coverage, high sensitivity, and the capability for stable-isotope labeling methods for precise relative protein abundance measurements. Abbreviations: LC, liquid chromatography; FTICR, Fourier transform ion cyclotron resonance; AMT, accurate mass tag; PMT, potential mass tag; MMA, mass measurement accuracy; MS, mass spectrometry; MS/MS, tandem mass spectrometry; ppm, parts per million.
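A minimal sketch of the AMT matching step, pairing an observed accurate mass and normalized elution time against a small tag database, follows; the peptide entries, tolerances, and values are invented and the real pipeline is considerably more sophisticated.

```python
# Minimal sketch (invented entries): match an observed LC-FTICR feature to an
# accurate mass tag (AMT) database using a ppm mass tolerance and a normalized
# elution time window.
AMT_DB = [   # (peptide id, monoisotopic mass in Da, normalized elution time 0-1)
    ("pep_001", 1148.6142, 0.42),
    ("pep_002",  926.4876, 0.35),
]

def match_feature(obs_mass, obs_net, ppm_tol=5.0, net_tol=0.02):
    """Return AMT database peptides consistent with an observed feature."""
    hits = []
    for peptide, mass, net in AMT_DB:
        ppm_error = abs(obs_mass - mass) / mass * 1e6
        if ppm_error <= ppm_tol and abs(obs_net - net) <= net_tol:
            hits.append((peptide, round(ppm_error, 2)))
    return hits

print(match_feature(1148.6150, 0.43))   # [('pep_001', 0.7)]
```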
Screening the Molecular Framework Underlying Local Dendritic mRNA Translation
Namjoshi, Sanjeev V.; Raab-Graham, Kimberly F.
2017-01-01
In the last decade, bioinformatic analyses of high-throughput proteomics and transcriptomics data have enabled researchers to gain insight into the molecular networks that may underlie lasting changes in synaptic efficacy. Development and utilization of these techniques have advanced the field of learning and memory significantly. It is now possible to move from the study of activity-dependent changes of a single protein to modeling entire network changes that require local protein synthesis. This data revolution has necessitated the development of alternative computational and statistical techniques to analyze and understand the patterns contained within. Thus, the focus of this review is to provide a synopsis of the journey and evolution toward big data techniques to address still unanswered questions regarding how synapses are modified to strengthen neuronal circuits. We first review the seminal studies that demonstrated the pivotal role played by local mRNA translation as the mechanism underlying the enhancement of enduring synaptic activity. In the interest of those who are new to the field, we provide a brief overview of molecular biology and biochemical techniques utilized for sample preparation to identify locally translated proteins using RNA sequencing and proteomics, as well as the computational approaches used to analyze these data. While many mRNAs have been identified, few have been shown to be locally synthesized. To this end, we review techniques currently being utilized to visualize new protein synthesis, a task that has proven to be the most difficult aspect of the field. Finally, we provide examples of future applications to test the physiological relevance of locally synthesized proteins identified by big data approaches. PMID:28286470
A high-throughput semi-automated preparation for filtered synaptoneurosomes.
Murphy, Kathryn M; Balsor, Justin; Beshara, Simon; Siu, Caitlin; Pinto, Joshua G A
2014-09-30
Synaptoneurosomes have become an important tool for studying synaptic proteins. The filtered synaptoneurosomes preparation originally developed by Hollingsworth et al. (1985) is widely used and is an easy method to prepare synaptoneurosomes. The hand processing steps in that preparation, however, are labor intensive and have become a bottleneck for current proteomic studies using synaptoneurosomes. For this reason, we developed new steps for tissue homogenization and filtration that transform the preparation of synaptoneurosomes to a high-throughput, semi-automated process. We implemented a standardized protocol with easy to follow steps for homogenizing multiple samples simultaneously using a FastPrep tissue homogenizer (MP Biomedicals, LLC) and then filtering all of the samples in centrifugal filter units (EMD Millipore, Corp). The new steps dramatically reduce the time to prepare synaptoneurosomes from hours to minutes, increase sample recovery, and nearly double enrichment for synaptic proteins. These steps are also compatible with biosafety requirements for working with pathogen infected brain tissue. The new high-throughput semi-automated steps to prepare synaptoneurosomes are timely technical advances for studies of low abundance synaptic proteins in valuable tissue samples. Copyright © 2014 Elsevier B.V. All rights reserved.
Heat-Responsive Photosynthetic and Signaling Pathways in Plants: Insight from Proteomics.
Wang, Xiaoli; Xu, Chenxi; Cai, Xiaofeng; Wang, Quanhua; Dai, Shaojun
2017-10-20
Heat stress is a major abiotic stress posing a serious threat to plants. Heat-responsive mechanisms in plants are complicated and fine-tuned, and heat signaling transduction and photosynthesis are highly sensitive to heat stress. Therefore, a thorough understanding of the molecular mechanisms of heat-stress signaling transduction and photosynthesis is necessary to protect crop yield. Current high-throughput proteomics investigations provide increasingly useful information on the heat-responsive signaling pathways and the modulation of photosynthesis in plants. Several signaling components, such as guanosine triphosphate (GTP)-binding protein, nucleoside diphosphate kinase, annexin, and brassinosteroid-insensitive I-kinase domain interacting protein 114, have been proposed to be important in heat signaling transduction. Moreover, diverse patterns of photosynthetic proteins imply that the modulation of stomatal CO₂ exchange, photosystem II, the Calvin cycle, ATP synthesis, and chlorophyll biosynthesis is crucial for plant heat tolerance.
The Scottish Structural Proteomics Facility: targets, methods and outputs.
Oke, Muse; Carter, Lester G; Johnson, Kenneth A; Liu, Huanting; McMahon, Stephen A; Yan, Xuan; Kerou, Melina; Weikart, Nadine D; Kadi, Nadia; Sheikh, Md Arif; Schmelz, Stefan; Dorward, Mark; Zawadzki, Michal; Cozens, Christopher; Falconer, Helen; Powers, Helen; Overton, Ian M; van Niekerk, C A Johannes; Peng, Xu; Patel, Prakash; Garrett, Roger A; Prangishvili, David; Botting, Catherine H; Coote, Peter J; Dryden, David T F; Barton, Geoffrey J; Schwarz-Linek, Ulrich; Challis, Gregory L; Taylor, Garry L; White, Malcolm F; Naismith, James H
2010-06-01
The Scottish Structural Proteomics Facility was funded to develop a laboratory scale approach to high throughput structure determination. The effort was successful in that over 40 structures were determined. These structures and the methods harnessed to obtain them are reported here. This report reflects on the value of automation but also on the continued requirement for a high degree of scientific and technical expertise. The efficiency of the process poses challenges to the current paradigm of structural analysis and publication. In the 5 year period we published ten peer-reviewed papers reporting structural data arising from the pipeline. Nevertheless, the number of structures solved exceeded our ability to analyse and publish each new finding. By reporting the experimental details and depositing the structures we hope to maximize the impact of the project by allowing others to follow up the relevant biology.
Jaimes-Becerra, Adrian; Chung, Ray; Morandini, André C; Weston, Andrew J; Padilla, Gabriel; Gacesa, Ranko; Ward, Malcolm; Long, Paul F; Marques, Antonio C
2017-10-01
Cnidarians are probably the oldest group of animals to be venomous, yet our current picture of cnidarian venom evolution is highly imbalanced due to limited taxon sampling. High-throughput tandem mass spectrometry was used to determine venom composition of the scyphozoan Chrysaora lactea and two cubozoans Tamoya haplonema and Chiropsalmus quadrumanus. Protein recruitment patterns were then compared against 5 other cnidarian venom proteomes taken from the literature. A total of 28 putative toxin protein families were identified, many for the first time in Cnidaria. Character mapping analysis revealed that 17 toxin protein families with predominantly cytolytic biological activities were likely recruited into the cnidarian venom proteome before the lineage split between Anthozoa and Medusozoa. Thereafter, venoms of Medusozoa and Anthozoa differed during subsequent divergence of cnidarian classes. Recruitment and loss of toxin protein families did not correlate with accepted phylogenetic patterns of Cnidaria. Selective pressures that drive toxin diversification independent of taxonomic positioning have yet to be identified in Cnidaria and now warrant experimental consideration. Copyright © 2017 Elsevier Ltd. All rights reserved.
2017-01-01
Mass-spectrometry-based, high-throughput proteomics experiments produce large amounts of data. While typically acquired to answer specific biological questions, these data can also be reused in orthogonal ways to reveal new biological knowledge. We here present a novel method for such orthogonal data reuse of public proteomics data. Our method elucidates biological relationships between proteins based on the co-occurrence of these proteins across human experiments in the PRIDE database. The majority of the significantly co-occurring protein pairs that were detected by our method have been successfully mapped to existing biological knowledge. The validity of our novel method is substantiated by the extremely few pairs that can be mapped to existing knowledge based on random associations between the same set of proteins. Moreover, using literature searches and the STRING database, we were able to derive meaningful biological associations for unannotated protein pairs that were detected using our method, further illustrating that as-yet unknown associations present highly interesting targets for follow-up analysis. PMID:28480704
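A minimal sketch of a co-occurrence test for one protein pair is shown below, using a Fisher exact test on a 2x2 table of experiment counts; the counts are invented and the published method's statistics may differ.

```python
# Minimal sketch (invented counts): is one protein pair detected together across
# experiments more often than expected by chance?
from scipy.stats import fisher_exact

n_experiments = 1000
both, only_a, only_b = 120, 80, 60
neither = n_experiments - both - only_a - only_b

table = [[both,   only_a],
         [only_b, neither]]
odds_ratio, p = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.1f}, p = {p:.2e}")
```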
Azpiazu, Rubén; Amaral, Alexandra; Castillo, Judit; Estanyol, Josep Maria; Guimerà, Marta; Ballescà, Josep Lluís; Balasch, Juan; Oliva, Rafael
2014-06-01
Are there quantitative alterations in the proteome of normozoospermic sperm samples that are able to complete IVF but whose female partner does not achieve pregnancy? Normozoospermic sperm samples with different IVF outcomes (pregnancy versus no pregnancy) differed in the levels of at least 66 proteins. The analysis of the proteome of sperm samples with distinct fertilization capacity using low-throughput proteomic techniques resulted in the detection of a few differential proteins. Current high-throughput mass spectrometry approaches allow the identification and quantification of a substantially higher number of proteins. This was a case-control study including 31 men with normozoospermic sperm and their partners who underwent IVF with successful fertilization recruited between 2007 and 2008. Normozoospermic sperm samples from 15 men whose female partners did not achieve pregnancy after IVF (no pregnancy) and 16 men from couples that did achieve pregnancy after IVF (pregnancy) were included in this study. To perform the differential proteomic experiments, 10 no pregnancy samples and 10 pregnancy samples were separately pooled and subsequently used for tandem mass tags (TMT) protein labelling, sodium dodecyl sulphate-polyacrylamide gel electrophoresis, liquid chromatography tandem mass spectrometry (LC-MS/MS) identification and peak intensity relative protein quantification. Bioinformatic analyses were performed using UniProt Knowledgebase, DAVID and Reactome. Individual samples (n = 5 no pregnancy samples; n = 6 pregnancy samples) and aliquots from the above TMT pools were used for western blotting. By using TMT labelling and LC-MS/MS, we have detected 31 proteins present at lower abundance (ratio no pregnancy/pregnancy < 0.67) and 35 at higher abundance (ratio no pregnancy/pregnancy > 1.5) in the no pregnancy group. Bioinformatic analyses showed that the proteins with differing abundance are involved in chromatin assembly and lipoprotein metabolism (P values < 0.05). In addition, the differential abundance of one of the proteins (SRSF protein kinase 1) was further validated by western blotting using independent samples (P value < 0.01). For individual samples the amount of recovered sperm not used for IVF was low and in most of the cases insufficient for MS analysis, therefore pools of samples had to be used to this end. Alterations in the proteins involved in chromatin assembly and metabolism may result in epigenetic errors during spermatogenesis, leading to inaccurate sperm epigenetic signatures, which could ultimately prevent embryonic development. These sperm proteins may thus possibly have clinical relevance. This work was supported by the Spanish Ministry of Economy and Competitiveness (Ministerio de Economia y Competividad; FEDER BFU 2009-07118 and PI13/00699) and Fundación Salud 2000 SERONO13-015. There are no competing interests to declare.
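The abundance-ratio cutoffs quoted above can be illustrated with a minimal sketch; the protein names and ratios are hypothetical.

```python
# Minimal sketch (hypothetical ratios): applying the cutoffs from the abstract
# (no pregnancy / pregnancy < 0.67 or > 1.5) to flag differentially abundant proteins.
ratios = {                  # protein -> TMT ratio (no pregnancy / pregnancy)
    "protein_A": 1.8,
    "protein_B": 0.55,
    "protein_C": 1.05,
}

lower, higher = [], []
for protein, r in ratios.items():
    if r < 0.67:
        lower.append(protein)
    elif r > 1.5:
        higher.append(protein)

print("lower abundance:", lower)    # ['protein_B']
print("higher abundance:", higher)  # ['protein_A']
```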
Wang, Youwei; Zhang, Wenqing; Chen, Lidong; Shi, Siqi; Liu, Jianjun
2017-01-01
Li-ion batteries are a key technology for addressing the global challenges of clean renewable energy and environmental pollution. Their contemporary applications, in portable electronic devices, electric vehicles, and large-scale power grids, stimulate the development of high-performance battery materials with high energy density, high power, good safety, and long lifetime. High-throughput calculations provide a practical strategy to discover new battery materials and to optimize the performance of currently known materials. Most cathode materials screened by previous high-throughput calculations cannot meet the requirements of practical applications because only the capacity, voltage and volume change of the bulk were considered. It is therefore important to include further structure–property relationships, such as point defects, surfaces and interfaces, doping and metal mixing, and nanosize effects, in high-throughput calculations. In this review, we establish a quantitative description of structure–property relationships in Li-ion battery materials in terms of intrinsic bulk parameters, which can be applied in future high-throughput calculations to screen Li-ion battery materials. Based on these parameterized structure–property relationships, a possible high-throughput computational screening flow path is proposed for obtaining high-performance battery materials. PMID:28458737
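The screening flow path described above can be pictured as a cascade of property filters. The sketch below is a hedged illustration of such a step; the property names and thresholds are placeholder assumptions rather than criteria from the review.

```python
# Illustrative screening step: keep candidate cathode materials that pass
# bulk criteria plus extra structure-property descriptors (thresholds are
# placeholder assumptions, not values from the review).
candidates = [
    {"id": "mat-001", "capacity_mAh_g": 180, "voltage_V": 3.9,
     "volume_change_pct": 3.2, "li_migration_barrier_eV": 0.45},
    {"id": "mat-002", "capacity_mAh_g": 140, "voltage_V": 4.6,
     "volume_change_pct": 9.0, "li_migration_barrier_eV": 0.80},
]

def passes(m):
    return (m["capacity_mAh_g"] >= 150
            and 3.0 <= m["voltage_V"] <= 4.5
            and m["volume_change_pct"] <= 5.0
            and m["li_migration_barrier_eV"] <= 0.6)

print([m["id"] for m in candidates if passes(m)])
```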
Processing Shotgun Proteomics Data on the Amazon Cloud with the Trans-Proteomic Pipeline*
Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W.; Moritz, Robert L.
2015-01-01
Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large-scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of the cloud-enabled Trans-Proteomic Pipeline by running over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h at very low cost. PMID:25418363
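As a rough illustration of the kind of resource and cost planning mentioned above (not the Trans-Proteomic Pipeline's own estimator), the following sketch scales a job by file count, per-instance throughput, and an assumed hourly instance price.

```python
# Back-of-the-envelope estimator for a cloud search job. Throughput and price
# are placeholder assumptions, not measured TPP or EC2 figures.
def estimate(n_files, files_per_instance_hour, n_instances, price_per_instance_hour):
    hours = n_files / (files_per_instance_hour * n_instances)
    cost = hours * n_instances * price_per_instance_hour
    return hours, cost

hours, cost = estimate(n_files=1100, files_per_instance_hour=8,
                       n_instances=16, price_per_instance_hour=0.40)
print(f"~{hours:.1f} h wall time, ~${cost:.2f} total")
```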
Zhang, Aihua; Zhou, Xiaohang; Zhao, Hongwei; Zou, Shiyu; Ma, Chung Wah; Liu, Qi; Sun, Hui; Liu, Liang; Wang, Xijun
2017-01-31
An integrative metabolomics and proteomics approach can provide novel insights into biological systems. We integrated proteome and metabolome data sets for a holistic view of the molecular mechanisms of disease. Using quantitative iTRAQ-LC-MS/MS proteomics coupled with UPLC-Q-TOF-HDMS-based metabolomics, we determined the protein and metabolite expression changes in the kidney-yang deficiency syndrome (KYDS) rat model and further investigated the intervention effects of the Jinkui Shenqi Pill (JSP). The VIP plot of the orthogonal PLS-DA (OPLS-DA) model was used to discover potential biomarkers and to clarify the therapeutic mechanisms of JSP in treating KYDS. The results showed that JSP can alleviate the kidney impairment induced by KYDS. Sixty potential biomarkers, including 5-L-glutamyl-taurine, phenylacetaldehyde, 4,6-dihydroxyquinoline and xanthurenic acid, were markedly up- or down-regulated. The regulatory effect of JSP on the disturbed metabolic pathways was demonstrated with the established metabonomic method. Using pathway analyses, we identified disturbed metabolic pathways such as taurine and hypotaurine metabolism, pyrimidine metabolism, tyrosine metabolism, tryptophan metabolism, histidine metabolism and steroid hormone biosynthesis. Furthermore, using iTRAQ-based quantitative proteomics analysis, seventeen differential proteins were identified as significantly altered by the JSP treatment. These proteins appear to be involved in the Wnt, chemokine, PPAR and MAPK signaling pathways, among others. Functional pathway analysis revealed that most of these proteins play a key role in the regulation of metabolic pathways. Bioinformatics analysis with the IPA software showed that the differentially expressed molecules correlated strongly with α-adrenergic and FGF signaling, among other pathways. Our data indicate that high-throughput metabolomics and proteomics based on high-resolution mass spectrometry can provide insight into how herbal preparations affect metabolic disorders.
Comparison of Normal and Breast Cancer Cell lines using Proteome, Genome and Interactome data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patwardhan, Anil J.; Strittmatter, Eric F.; Camp, David G.
2005-12-01
Normal and cancer cell line proteomes were profiled using high throughput mass spectrometry techniques. Application of both protein-level and peptide-level sample fractionation combined with LC-MS/MS analysis enabled the confident identification of 2,235 unmodified proteins representing a broad range of functional and compartmental classes. An iterative multi-step search strategy was used to identify post-translational modifications and detected several proteins that are preferentially modified in cancer cells. Information regarding both unmodified and modified protein forms was combined with publicly available gene expression and protein-protein interaction data. The resulting integrated dataset revealed several functionally related proteins that are differentially regulated between normal and cancer cell lines.
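A minimal sketch of the integration step, the kind of join of protein identifications with public gene expression and protein-protein interaction tables described above; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical inputs: MS protein identifications, expression data, and
# protein-protein interactions, all keyed on gene symbol.
proteins = pd.DataFrame({"gene": ["TP53", "EGFR"], "modified_form": [True, False]})
expression = pd.DataFrame({"gene": ["TP53", "EGFR"], "log2_fc": [0.4, 2.1]})
ppi = pd.DataFrame({"gene_a": ["TP53", "EGFR"], "gene_b": ["MDM2", "GRB2"]})

# Integrate: proteome + transcriptome, then attach interaction-partner counts.
merged = proteins.merge(expression, on="gene", how="left")
partners = (ppi.melt(value_name="gene")
               .groupby("gene").size().rename("n_partners"))
merged = merged.join(partners, on="gene")
print(merged)
```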
Akeroyd, Michiel; Olsthoorn, Maurien; Gerritsma, Jort; Gutker-Vermaas, Diana; Ekkelkamp, Laurens; van Rij, Tjeerd; Klaassen, Paul; Plugge, Wim; Smit, Ed; Strupat, Kerstin; Wenzel, Thibaut; van Tilborg, Marcel; van der Hoeven, Rob
2013-03-10
In the discovery of new enzymes, genomic and cDNA expression libraries containing thousands of differential clones are generated to capture biodiversity. These libraries need to be screened for the activity of interest. Removing so-called empty and redundant clones significantly reduces the size of these expression libraries and therefore speeds up new enzyme discovery. Here, we present a sensitive, generic, mass-spectrometry-based workflow for high-throughput screening of successful microbial protein over-expression in microtiter plates containing a complex matrix. MALDI-LTQ-Orbitrap screening followed by principal component analysis and peptide mass fingerprinting was developed to obtain a throughput of ∼12,000 samples per week. Alternatively, a UHPLC-MS2 approach including MS2-based protein identification was developed for microorganisms with a complex protein secretome, with a throughput of ∼2000 samples per week. TCA-induced protein precipitation, enhanced by the addition of bovine serum albumin, is used for protein purification prior to MS detection. We show that this generic workflow can effectively reduce large expression libraries from fungi and bacteria to their minimal size by detecting successful protein over-expression using MS. Copyright © 2012 Elsevier B.V. All rights reserved.
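The triage idea, projecting spectra with principal component analysis and flagging wells that look like empty clones, can be sketched as follows; the spectra, control wells and distance cutoff are invented for illustration and do not reproduce the published workflow.

```python
import numpy as np
from sklearn.decomposition import PCA

# Sketch: project binned MALDI spectra with PCA and flag wells whose scores
# sit close to empty-vector control wells. Data and cutoff are placeholders.
rng = np.random.default_rng(0)
spectra = rng.random((96, 500))            # 96 wells x 500 m/z bins (placeholder)
control_idx = [0, 1, 2]                    # wells known to carry the empty vector

scores = PCA(n_components=5).fit_transform(spectra)
control_centroid = scores[control_idx].mean(axis=0)
dist = np.linalg.norm(scores - control_centroid, axis=1)
cutoff = np.percentile(dist[control_idx], 95) * 1.5   # arbitrary margin
empty_like = np.where(dist <= cutoff)[0]
print("wells flagged as likely empty:", empty_like)
```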
Guo, Jinju; Wang, Peng; Cheng, Qing; Sun, Limin; Wang, Hongyu; Wang, Yutong; Kao, Lina; Li, Yanan; Qiu, Tuoyu; Yang, Wencai; Shen, Huolin
2017-09-25
Although cytoplasmic male sterility (CMS) is widely used for developing pepper hybrids, its molecular mechanism remains unclear. In this study, we used a high-throughput proteomics method called label-free to compare protein abundance across a pepper CMS line (A-line) and its isogenic maintainer line (B-line). Data are available via ProteomeXchange with identifier PXD006104. Approximately 324 differentially abundant protein species were identified and quantified, of which 47 were up-accumulated and 140 were down-accumulated in the A-line; additionally, 75 and 62 protein species were specifically accumulated in the A-line and B-line, respectively. Protein species involved in pollen exine formation, pyruvate metabolic processes, the tricarboxylic acid cycle, the mitochondrial electron transport chain, and oxidative stress response were observed to be differentially accumulated between the A-line and B-line, suggesting their potential roles in the regulation of pepper pollen abortion. Based on our data, we proposed a potential regulatory network for pepper CMS that unifies these processes. Artificial emasculation is a major obstacle in pepper hybrid breeding owing to its high labor cost and poor seed purity. Meanwhile, the application of cytoplasmic male sterility (CMS) in hybrid breeding systems is hampered by the long time needed to develop a male-sterile line and its isogenic restorer line. Transgenic technology is an effective and rapid way to obtain male-sterile lines, and its wide application would be of great significance in speeding up the breeding process in pepper. Although numerous studies have been conducted to identify genes related to male sterility, the molecular mechanism of cytoplasmic male sterility in pepper remains unknown. In this study, we used the high-throughput proteomic method called "label-free", coupled with liquid chromatography-quadrupole mass spectrometry (LC-MS/MS), to perform a novel comparison of expression profiles in a CMS pepper line and its maintainer line. Based on our results, we proposed a potential regulatory protein network involved in pollen development as a novel mechanism of pepper CMS. Copyright © 2017. Published by Elsevier B.V.
Das, Abhiram; Schneider, Hannah; Burridge, James; Ascanio, Ana Karine Martinez; Wojciechowski, Tobias; Topp, Christopher N; Lynch, Jonathan P; Weitz, Joshua S; Bucksch, Alexander
2015-01-01
Plant root systems are key drivers of plant function and yield. They are also under-explored targets to meet global food and energy demands. Many new technologies have been developed to characterize crop root system architecture (CRSA). These technologies have the potential to accelerate the progress in understanding the genetic control and environmental response of CRSA. Putting this potential into practice requires new methods and algorithms to analyze CRSA in digital images. Most prior approaches have solely focused on the estimation of root traits from images, yet no integrated platform exists that allows easy and intuitive access to trait extraction and analysis methods from images combined with storage solutions linked to metadata. Automated high-throughput phenotyping methods are increasingly used in laboratory-based efforts to link plant genotype with phenotype, whereas similar field-based studies remain predominantly manual and low-throughput. Here, we present an open-source phenomics platform, "DIRT", as a means to integrate scalable supercomputing architectures into field experiments and analysis pipelines. DIRT is an online platform that enables researchers to store images of plant roots, measure dicot and monocot root traits under field conditions, and share data and results within collaborative teams and the broader community. The DIRT platform seamlessly connects end-users with large-scale compute "commons", enabling the estimation and analysis of root phenotypes from field experiments of unprecedented size. DIRT is an automated high-throughput computing and collaboration platform for field-based crop root phenomics. The platform is accessible at http://www.dirt.iplantcollaborative.org/ and hosted on the iPlant cyber-infrastructure using high-throughput grid computing resources of the Texas Advanced Computing Center (TACC). DIRT is a high-volume central depository and high-throughput RSA trait computation platform for plant scientists working on crop roots. It enables scientists to store, manage and share crop root images with metadata and compute RSA traits from thousands of images in parallel. It makes high-throughput RSA trait computation available to the community with just a few button clicks. As such it enables plant scientists to spend more time on science rather than on technology. All stored and computed data is easily accessible to the public and broader scientific community. We hope that easy data accessibility will attract new tool developers and spur creative data usage that may even be applied to other fields of science.
Colangelo, Christopher M.; Shifman, Mark; Cheung, Kei-Hoi; Stone, Kathryn L.; Carriero, Nicholas J.; Gulcicek, Erol E.; Lam, TuKiet T.; Wu, Terence; Bjornson, Robert D.; Bruce, Can; Nairn, Angus C.; Rinehart, Jesse; Miller, Perry L.; Williams, Kenneth R.
2015-01-01
We report a significantly enhanced bioinformatics suite and database for proteomics research called the Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of high-throughput mass spectrometry-based proteomics research, ranging from a single laboratory, to a group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography–tandem mass spectrometry (LC–MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED’s database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results. PMID:25712262
Runau, Franscois; Arshad, Ali; Isherwood, John; Norris, Leonie; Howells, Lynne; Metcalfe, Matthew; Dennison, Ashley
2015-06-01
Pancreatic cancer is a disease with a particularly poor prognosis. Despite modern advances in medical, surgical, and oncologic therapy, the outcome from pancreatic cancer has improved little over the last 40 years. To improve the management of this difficult disease, trials investigating the use of dietary and parenteral fish oils rich in omega-3 (ω-3) fatty acids, which exhibit proven anti-inflammatory and anticarcinogenic properties, have revealed favorable results in pancreatic cancers. Proteomics is the large-scale study of proteins that attempts to characterize the complete set of proteins encoded by the genome of an organism and that, with the use of sensitive mass spectrometry-based techniques, has allowed high-throughput analysis of the proteome to aid identification of putative biomarkers pertinent to given disease states. These biomarkers offer potential for discovering new markers for early detection or for elucidating the efficacy of treatment in pancreatic cancer. Here, our review identifies potential proteomic-based biomarkers in pancreatic cancer relating to apoptosis, cell proliferation, angiogenesis, and metabolic regulation in clinical studies. We also reviewed proteomic biomarkers from the administration of ω-3 fatty acids that act on similar anticarcinogenic pathways to those above, and we anticipate that proteomic studies of the effect of ω-3 fatty acids in pancreatic cancer will yield favorable results. © 2015 American Society for Parenteral and Enteral Nutrition.
Quantitative Trait Loci Mapping of the Mouse Plasma Proteome (pQTL)
Holdt, Lesca M.; von Delft, Annette; Nicolaou, Alexandros; Baumann, Sven; Kostrzewa, Markus; Thiery, Joachim; Teupser, Daniel
2013-01-01
A current challenge in the era of genome-wide studies is to determine the responsible genes and mechanisms underlying newly identified loci. Screening of the plasma proteome by high-throughput mass spectrometry (MALDI-TOF MS) is considered a promising approach for identification of metabolic and disease processes. Therefore, plasma proteome screening might be particularly useful for identifying responsible genes when combined with analysis of variation in the genome. Here, we describe a proteomic quantitative trait locus (pQTL) study of plasma proteome screens in an F2 intercross of 455 mice mapped with 177 genetic markers across the genome. A total of 69 of 176 peptides revealed significant LOD scores (≥5.35) demonstrating strong genetic regulation of distinct components of the plasma proteome. Analyses were confirmed by mechanistic studies and MALDI-TOF/TOF, liquid chromatography-tandem mass spectrometry (LC-MS/MS) analyses of the two strongest pQTLs: A pQTL for mass-to-charge ratio (m/z) 3494 (LOD 24.9, D11Mit151) was identified as the N-terminal 35 amino acids of hemoglobin subunit A (Hba) and caused by genetic variation in Hba. Another pQTL for m/z 8713 (LOD 36.4; D1Mit111) was caused by variation in apolipoprotein A2 (Apoa2) and cosegregated with HDL cholesterol. Taken together, we show that genome-wide plasma proteome profiling in combination with genome-wide genetic screening aids in the identification of causal genetic variants affecting abundance of plasma proteins. PMID:23172855
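For readers unfamiliar with LOD scores, the sketch below shows a simplified regression-based single-marker version, LOD = (n/2)·log10(RSS0/RSS1); it is a stand-in for the interval-mapping statistics used in such pQTL scans, with simulated genotypes and peak intensities rather than the study's data.

```python
import numpy as np

# Regression-based single-marker LOD score, a simplified stand-in for the
# mapping statistics used in pQTL scans (not the authors' exact code).
def lod_score(genotype, peak_intensity):
    """genotype: 0/1/2 allele counts at one marker; peak_intensity: m/z peak values."""
    n = len(peak_intensity)
    rss0 = np.sum((peak_intensity - peak_intensity.mean()) ** 2)   # null: no QTL
    X = np.column_stack([np.ones(n), genotype])                    # additive model
    beta, *_ = np.linalg.lstsq(X, peak_intensity, rcond=None)
    rss1 = np.sum((peak_intensity - X @ beta) ** 2)
    return (n / 2) * np.log10(rss0 / rss1)

rng = np.random.default_rng(1)
g = rng.integers(0, 3, size=455)                 # 455 F2 mice, one marker
y = 0.8 * g + rng.normal(size=455)               # simulated peak intensity
print(f"LOD = {lod_score(g, y):.1f}")
```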
Székely, Andrea; Szekrényes, Akos; Kerékgyártó, Márta; Balogh, Attila; Kádas, János; Lázár, József; Guttman, András; Kurucz, István; Takács, László
2014-08-01
Molecular heterogeneity of mAb preparations is the result of various co- and post-translational modifications and of contaminants related to the production process. Changes in molecular composition result in alterations of functional performance; therefore, quality control and validation of therapeutic or diagnostic protein products are essential. A special case is the consistent production of mAb libraries (QuantiPlasma™ and PlasmaScan™) for proteome profiling, the quality control of which represents a challenge because of the high number of mAbs (>1000). Here, we devise a generally applicable multicapillary SDS-gel electrophoresis process for the analysis of fluorescently labeled mAb preparations for the high-throughput quality control of mAbs of the QuantiPlasma™ and PlasmaScan™ libraries. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Common bean proteomics: Present status and future strategies.
Zargar, Sajad Majeed; Mahajan, Reetika; Nazir, Muslima; Nagar, Preeti; Kim, Sun Tae; Rai, Vandna; Masi, Antonio; Ahmad, Syed Mudasir; Shah, Riaz Ahmad; Ganai, Nazir Ahmad; Agrawal, Ganesh K; Rakwal, Randeep
2017-10-03
Common bean (Phaseolus vulgaris L.) is a legume of appreciable importance and usefulness worldwide, providing food and feed for the human population. It is rich in high-quality protein, energy, fiber and micronutrients, especially iron, zinc, and pro-vitamin A, and possesses potentially disease-preventing and health-promoting compounds. The recently published genome sequence of common bean is an important landmark in common bean research, opening new avenues for understanding its genetics in depth. This legume crop is affected by diverse biotic and abiotic stresses that severely limit its productivity. Given the trend of increasing world population and the need for food crops best suited to the health of humankind, legumes, including the common bean with its high nutritive value, will be in great demand. Hence the need for new research into the biology of this crop brings us to utilize and apply high-throughput omics approaches. In this mini-review our focus will be on the need for proteomics studies in common bean, the potential of proteomics for understanding genetic regulation under abiotic and biotic stresses, and how proteogenomics will lead to nutritional improvement. We will also discuss future proteomics-based strategies that must be adopted to mine new genomic resources by identifying molecular switches regulating various biological processes. Common bean is regarded as a "grain of hope" for the poor, being rich in high-quality protein, energy, fiber and micronutrients (iron, zinc, pro-vitamin A), and possessing potentially disease-preventing and health-promoting compounds. The increasing world population and the need for food crops best suited to the health of humankind put legumes, most notably the common bean, into great demand. An important landmark in common bean research was the recent publication of its genome sequence, opening new avenues for understanding its genetics in depth. This legume crop is affected by diverse biotic and abiotic stresses that severely limit its productivity. Therefore, the need for new research into the biology of this crop brings us to utilize and apply high-throughput omics approaches. Proteomics can be used to track all the candidate proteins/genes responsible for a biological process under specific conditions in a particular tissue. The potential of proteomics will not only help in determining the functions of a large number of genes in a single experiment but will also be a useful tool to mine new genes that can provide solutions to various problems (abiotic stress, biotic stress, nutritional improvement, etc.). We believe that a combined approach including breeding along with omics tools will lead towards attaining sustainability in legumes, including common bean. Copyright © 2017 Elsevier B.V. All rights reserved.
Chen, Yunjia; Qiu, Shihong; Luan, Chi-Hao; Luo, Ming
2007-01-01
Background: Expression of higher eukaryotic genes as soluble, stable recombinant proteins is still a bottleneck step in biochemical and structural studies of novel proteins today. Correct identification of stable domains/fragments within the open reading frame (ORF), combined with proper cloning strategies, can greatly enhance the success rate when higher eukaryotic proteins are expressed as these domains/fragments. Furthermore, an HTP cloning pipeline incorporating bioinformatics domain/fragment selection methods will benefit studies of structural and functional genomics/proteomics. Results: Using bioinformatics tools, we developed a domain/domain boundary prediction (DDBP) method, which was trained on available experimental data. Combined with an improved cloning strategy, DDBP was applied to 57 proteins from C. elegans. Expression and purification results showed a 10-fold improvement in obtaining purified proteins. Based on the DDBP method, the improved GATEWAY cloning strategy and a robotic platform, we constructed a high throughput (HTP) cloning pipeline, including PCR primer design, PCR, BP reaction, transformation, plating, colony picking and entry clone extraction, which has been successfully applied to 90 C. elegans genes, 88 Brucella genes, and 188 human genes. More than 97% of the targeted genes were obtained as entry clones. This pipeline has a modular design and can adopt different operations for a variety of cloning/expression strategies. Conclusion: The DDBP method and improved cloning strategy were satisfactory. The cloning pipeline, combined with our recombinant protein HTP expression pipeline and the crystal screening robots, constitutes a complete platform for structural genomics/proteomics. This platform will increase the success rate of purification and crystallization dramatically and promote the further advancement of structural genomics/proteomics. PMID:17663785
Condor-COPASI: high-throughput computing for biochemical networks
2012-01-01
Background: Mathematical modelling has become a standard technique to improve our understanding of complex biological systems. As models become larger and more complex, simulations and analyses require increasing amounts of computational power. Clusters of computers in a high-throughput computing environment can help to provide the resources required for computationally expensive model analysis. However, exploiting such a system can be difficult for users without the necessary expertise. Results: We present Condor-COPASI, a server-based software tool that integrates COPASI, a biological pathway simulation tool, with Condor, a high-throughput computing environment. Condor-COPASI provides a web-based interface, which makes it extremely easy for a user to run a number of model simulation and analysis tasks in parallel. Tasks are transparently split into smaller parts, and submitted for execution on a Condor pool. Result output is presented to the user in a number of formats, including tables and interactive graphical displays. Conclusions: Condor-COPASI can effectively use a Condor high-throughput computing environment to provide significant gains in performance for a number of model simulation and analysis tasks. Condor-COPASI is free, open source software, released under the Artistic License 2.0, and is suitable for use by any institution with access to a Condor pool. Source code is freely available for download at http://code.google.com/p/condor-copasi/, along with full instructions on deployment and usage. PMID:22834945
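The split-and-submit pattern described above can be illustrated with a short sketch that chunks a parameter scan and writes one HTCondor submit description per chunk; the wrapper script and file names are hypothetical and this is not Condor-COPASI's own code.

```python
# Illustration of the split-and-submit idea: divide a parameter scan into
# chunks and emit one Condor submit description per chunk. File names and the
# executable are placeholders, not Condor-COPASI internals.
def chunk(values, size):
    for i in range(0, len(values), size):
        yield values[i:i + size]

scan_values = [round(0.1 * i, 1) for i in range(1, 101)]   # 100 parameter values

for job_id, part in enumerate(chunk(scan_values, 10)):
    with open(f"scan_{job_id}.args", "w") as fh:
        fh.write("\n".join(map(str, part)))
    with open(f"scan_{job_id}.submit", "w") as fh:
        fh.write(
            "universe   = vanilla\n"
            "executable = run_copasi_chunk.sh\n"   # hypothetical wrapper script
            f"arguments  = scan_{job_id}.args\n"
            f"output     = scan_{job_id}.out\n"
            f"error      = scan_{job_id}.err\n"
            f"log        = scan_{job_id}.log\n"
            "queue\n"
        )
```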
Accelerating the design of solar thermal fuel materials through high throughput simulations.
Liu, Yun; Grossman, Jeffrey C
2014-12-10
Solar thermal fuels (STF) store the energy of sunlight, which can then be released later in the form of heat, offering an emission-free and renewable solution for both solar energy conversion and storage. However, this approach is currently limited by the lack of low-cost materials with high energy density and high stability. In this Letter, we present an ab initio high-throughput computational approach to accelerate the design process and allow for searches over a broad class of materials. The high-throughput screening platform we have developed can run through large numbers of molecules composed of earth-abundant elements and identifies possible metastable structures of a given material. Corresponding isomerization enthalpies associated with the metastable structures are then computed. Using this high-throughput simulation approach, we have discovered molecular structures with high isomerization enthalpies that have the potential to be new candidates for high-energy density STF. We have also discovered physical principles to guide further STF materials design through structural analysis. More broadly, our results illustrate the potential of using high-throughput ab initio simulations to design materials that undergo targeted structural transitions.
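A minimal sketch of the screening criterion: compute each candidate's isomerization enthalpy as the energy difference between its metastable and ground-state structures and keep those above a target value; the energies, molecule names and threshold are illustrative assumptions.

```python
# Sketch of the screening criterion: isomerization enthalpy per molecule is the
# energy difference between the metastable (charged) and ground (discharged)
# structures; candidates with large dH are retained. Energies are placeholders.
HARTREE_TO_KJ_PER_MOL = 2625.5

molecules = {  # hypothetical DFT total energies in hartree
    "azobenzene-deriv-1": {"ground": -572.431, "metastable": -572.396},
    "candidate-2":        {"ground": -830.120, "metastable": -830.111},
}

def isomerization_enthalpy_kj(e_ground, e_metastable):
    return (e_metastable - e_ground) * HARTREE_TO_KJ_PER_MOL

hits = {name: isomerization_enthalpy_kj(e["ground"], e["metastable"])
        for name, e in molecules.items()}
threshold = 80.0  # kJ/mol, illustrative cutoff
print({k: round(v, 1) for k, v in hits.items() if v >= threshold})
```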
Proteomic Analyses of the Unexplored Sea Anemone Bunodactis verrucosa.
Domínguez-Pérez, Dany; Campos, Alexandre; Alexei Rodríguez, Armando; Turkina, Maria V; Ribeiro, Tiago; Osorio, Hugo; Vasconcelos, Vítor; Antunes, Agostinho
2018-01-24
Cnidarian toxic products, particularly peptide toxins, constitute a promising target for biomedicine research. Indeed, cnidarians are considered the largest phylum of generally toxic animals. However, research on peptides and toxins of sea anemones is still limited. Moreover, most of the toxins from sea anemones have been discovered by classical purification approaches. Recently, high-throughput methodologies have been used for this purpose, but in other phyla. Hence, the present work was focused on the proteomic analyses of whole-body extract from the unexplored sea anemone Bunodactis verrucosa. The proteomic analyses applied were based on two methods: two-dimensional gel electrophoresis combined with MALDI-TOF/TOF, and a shotgun proteomic approach. In total, 413 proteins were identified, but only eight proteins were identified from gel-based analyses. Such proteins are mainly involved in basal metabolism and biosynthesis of antibiotics as the most relevant pathways. In addition, some putative toxins including metalloproteinases and neurotoxins were also identified. These findings reinforce the significance of the production of antimicrobial compounds and toxins by sea anemones, which play a significant role in defense and feeding. In general, the present study provides the first proteome map of the sea anemone B. verrucosa, establishing a reference for future studies in the discovery of new compounds.
Sharma, Mukut; Halligan, Brian D; Wakim, Bassam T; Savin, Virginia J; Cohen, Eric P; Moulder, John E
2008-06-18
Terrorist attacks or nuclear accidents could expose large numbers of people to ionizing radiation, and early biomarkers of radiation injury would be critical for triage, treatment and follow-up of such individuals. However, no such biomarkers have yet been proven to exist. We tested the potential of high-throughput proteomics to identify protein biomarkers of radiation injury after total body X-ray irradiation (TBI) in a rat model. Subtle functional changes in the kidney are suggested by an increased glomerular permeability for macromolecules measured within 24 hours after TBI. Ultrastructural changes in glomerular podocytes include partial loss of the interdigitating organization of foot processes. Analysis of urine by LC-MS/MS and 2D-GE showed significant changes in the urine proteome within 24 hours after TBI. Tissue kallikrein 1-related peptidase, the cysteine proteinase inhibitor cystatin C and oxidized histidine were found to be increased, while a number of proteinase inhibitors including kallikrein-binding protein and albumin were found to be decreased post-irradiation. Thus, TBI causes immediately detectable changes in renal structure and function and in the urinary protein profile. This suggests that both systemic and renal changes are induced by radiation and that it may be possible to identify a set of biomarkers unique to radiation injury.
Huang, Shao-shan Carol; Clarke, David C.; Gosline, Sara J. C.; Labadorf, Adam; Chouinard, Candace R.; Gordon, William; Lauffenburger, Douglas A.; Fraenkel, Ernest
2013-01-01
Cellular signal transduction generally involves cascades of post-translational protein modifications that rapidly catalyze changes in protein-DNA interactions and gene expression. High-throughput measurements are improving our ability to study each of these stages individually, but do not capture the connections between them. Here we present an approach for building a network of physical links among these data that can be used to prioritize targets for pharmacological intervention. Our method recovers the critical missing links between proteomic and transcriptional data by relating changes in chromatin accessibility to changes in expression and then uses these links to connect proteomic and transcriptome data. We applied our approach to integrate epigenomic, phosphoproteomic and transcriptome changes induced by the variant III mutation of the epidermal growth factor receptor (EGFRvIII) in a cell line model of glioblastoma multiforme (GBM). To test the relevance of the network, we used small molecules to target highly connected nodes implicated by the network model that were not detected by the experimental data in isolation and we found that a large fraction of these agents alter cell viability. Among these are two compounds, ICG-001, targeting CREB binding protein (CREBBP), and PKF118–310, targeting β-catenin (CTNNB1), which have not been tested previously for effectiveness against GBM. At the level of transcriptional regulation, we used chromatin immunoprecipitation sequencing (ChIP-Seq) to experimentally determine the genome-wide binding locations of p300, a transcriptional co-regulator highly connected in the network. Analysis of p300 target genes suggested its role in tumorigenesis. We propose that this general method, in which experimental measurements are used as constraints for building regulatory networks from the interactome while taking into account noise and missing data, should be applicable to a wide range of high-throughput datasets. PMID:23408876
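The published approach solves a constrained network-optimization problem; as a much cruder illustration of prioritizing highly connected nodes in an integrated network, the following sketch ranks nodes of a toy interaction graph by degree.

```python
import networkx as nx

# Toy integrated network: edges from protein-protein and protein-DNA links.
# Ranking nodes by connectivity is only a crude proxy for the constrained
# network-optimization approach described above.
edges = [("EGFR", "GRB2"), ("GRB2", "SOS1"), ("EP300", "CTNNB1"),
         ("CTNNB1", "CREBBP"), ("EP300", "MYC"), ("EP300", "TP53")]
g = nx.Graph(edges)

hubs = sorted(g.degree, key=lambda kv: kv[1], reverse=True)
print("most connected nodes:", hubs[:3])
```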
Diagnostic Peptide Discovery: Prioritization of Pathogen Diagnostic Markers Using Multiple Features
Carmona, Santiago J.; Sartor, Paula A.; Leguizamón, María S.; Campetella, Oscar E.; Agüero, Fernán
2012-01-01
The availability of complete pathogen genomes has renewed interest in the development of diagnostics for infectious diseases. Synthetic peptide microarrays provide a rapid, high-throughput platform for immunological testing of potential B-cell epitopes. However, their current capacity prevents the experimental screening of complete “peptidomes”. Therefore, computational approaches for prediction and/or prioritization of diagnostically relevant peptides are required. In this work we describe a computational method to assess a defined set of molecular properties for each potential diagnostic target in a reference genome. Properties such as sub-cellular localization or expression level were evaluated for the whole protein. At a higher resolution (short peptides), we assessed a set of local properties, such as repetitive motifs, disorder (structured vs natively unstructured regions), trans-membrane spans, genetic polymorphisms (conserved vs. divergent regions), predicted B-cell epitopes, and sequence similarity against human proteins and other potential cross-reacting species (e.g. other pathogens endemic in overlapping geographical locations). A scoring function based on these different features was developed and used to rank all peptides from a large eukaryotic pathogen proteome. We applied this method to the identification of candidate diagnostic peptides in the protozoan Trypanosoma cruzi, the causative agent of Chagas disease. We measured the performance of the method by analyzing the enrichment of validated antigens at the high-scoring top of the ranking. Based on this measure, our integrative method outperformed alternative prioritizations based on individual properties (such as B-cell epitope predictors alone). Using this method we ranked 10 million 12-mer overlapping peptides derived from the complete T. cruzi proteome. Experimental screening of 190 high-scoring peptides allowed the identification of 37 novel epitopes with diagnostic potential, while none of the low-scoring peptides showed significant reactivity. Many of the metrics employed are dependent on standard bioinformatic tools and data, so the method can be easily extended to other pathogen genomes. PMID:23272069
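A stripped-down version of such a multi-feature scoring function might look like the sketch below; the feature names, weights and peptide sequences are invented and do not correspond to the published scoring scheme.

```python
# Minimal version of a multi-feature scoring function for candidate diagnostic
# peptides. Feature names and weights are illustrative, not the published ones.
WEIGHTS = {"b_cell_epitope": 2.0, "surface_exposed": 1.5, "expression": 1.0,
           "disorder": 0.5, "human_similarity": -2.0, "polymorphic": -1.0}

def score(peptide_features):
    return sum(WEIGHTS[name] * value for name, value in peptide_features.items())

peptides = {  # hypothetical 12-mers with feature values scaled to 0..1
    "KLSDTGELLQAV": {"b_cell_epitope": 0.9, "surface_exposed": 1.0, "expression": 0.7,
                     "disorder": 0.8, "human_similarity": 0.1, "polymorphic": 0.0},
    "AVGGTAPLSMRE": {"b_cell_epitope": 0.2, "surface_exposed": 0.3, "expression": 0.4,
                     "disorder": 0.1, "human_similarity": 0.8, "polymorphic": 1.0},
}
ranked = sorted(peptides, key=lambda p: score(peptides[p]), reverse=True)
print(ranked)
```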
Proteomic analysis of formalin-fixed paraffin embedded tissue by MALDI imaging mass spectrometry
Casadonte, Rita; Caprioli, Richard M
2012-01-01
Archived formalin-fixed paraffin-embedded (FFPE) tissue collections represent a valuable informational resource for proteomic studies. Multiple FFPE core biopsies can be assembled in a single block to form tissue microarrays (TMAs). We describe a protocol for analyzing proteins in FFPE-TMAs using matrix-assisted laser desorption/ionization (MALDI) imaging mass spectrometry (IMS). The workflow incorporates an antigen retrieval step following deparaffinization, in situ trypsin digestion, matrix application and then mass spectrometry signal acquisition. Direct analysis of FFPE-TMA tissue by IMS allows multiple tissue samples to be analyzed in a single experiment without extraction and purification of proteins. The advantages of high speed and throughput, easy sample handling and excellent reproducibility make this technology a favorable approach for the proteomic analysis of clinical research cohorts with large sample numbers. For example, TMA analysis of 300 FFPE cores would typically require 6 h of total time through data acquisition, not including data analysis. PMID:22011652
Berger, Sebastian T; Ahmed, Saima; Muntel, Jan; Cuevas Polo, Nerea; Bachur, Richard; Kentsis, Alex; Steen, Judith; Steen, Hanno
2015-10-01
We describe a 96-well plate compatible membrane-based proteomic sample processing method, which enables the complete processing of 96 samples (or multiples thereof) within a single workday. This method uses a large-pore hydrophobic PVDF membrane that efficiently adsorbs proteins, resulting in fast liquid transfer through the membrane and significantly reduced sample processing times. Low liquid transfer speeds have prevented the useful 96-well plate implementation of FASP as a widely used membrane-based proteomic sample processing method. We validated our approach on whole-cell lysate and urine and cerebrospinal fluid as clinically relevant body fluids. Without compromising peptide and protein identification, our method uses a vacuum manifold and circumvents the need for digest desalting, making our processing method compatible with standard liquid handling robots. In summary, our new method maintains the strengths of FASP and simultaneously overcomes one of the major limitations of FASP without compromising protein identification and quantification. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Proteomic evaluation of genetically modified crops: current status and challenges
Gong, Chun Yan; Wang, Tai
2013-01-01
Hectares of genetically modified (GM) crops have increased exponentially since 1996, when such crops began to be commercialized. GM biotechnology, together with conventional breeding, has become the main approach to improving agronomic traits of crops. However, people are concerned about the safety of GM crops, especially GM-derived food and feed. Many efforts have been made to evaluate the unintended effects caused by the introduction of exogenous genes. “Omics” techniques have advantages over targeted analysis in evaluating such crops because of their use of high-throughput screening. Proteins are key players in gene function and are directly involved in metabolism and cellular development or have roles as toxins, antinutrients, or allergens, which are essential for human health. Thus, proteomics can be expected to become one of the most useful tools in safety assessment. This review assesses the potential of proteomics in evaluating various GM crops. We further describe the challenges in ensuring homogeneity and sensitivity in detection techniques. PMID:23471542
DOE Office of Scientific and Technical Information (OSTI.GOV)
Springer, David L.; Ahram, Mamoun; Adkins, Joshua N.
Shedding, the release of cell surface proteins by regulated proteolysis, is a general cellular response to injury and is responsible for generating numerous bioactive molecules including growth factors and cytokines. The purpose of our work is to determine whether low doses of low-linear energy transfer (LET) radiation induce shedding of bioactive molecules. Using a mass spectrometry-based global proteomics method, we tested this hypothesis by analyzing for shed proteins in medium from irradiated human mammary epithelial cells (HMEC). Several hundred proteins were identified, including transforming growth factor beta (TGFB); however, no changes in protein abundances attributable to radiation exposure, based on immunoblotting methods, were observed. These results demonstrate that our proteomic-based approach has the sensitivity to identify the kinds of proteins believed to be released after low-dose radiation exposure but that improvements in mass spectrometry-based protein quantification will be required to detect the small changes in abundance associated with this type of insult.
DOGMA: domain-based transcriptome and proteome quality assessment.
Dohmen, Elias; Kremer, Lukas P M; Bornberg-Bauer, Erich; Kemena, Carsten
2016-09-01
Genome studies have become cheaper and easier than ever before, due to the decreased costs of high-throughput sequencing and the free availability of analysis software. However, the quality of genome or transcriptome assemblies can vary considerably. Therefore, quality assessment of assemblies and annotations is a crucial aspect of genome analysis pipelines. We developed DOGMA, a program for fast and easy quality assessment of transcriptome and proteome data based on conserved protein domains. DOGMA measures the completeness of a given transcriptome or proteome and provides information about domain content for further analysis. DOGMA provides a very fast way to do quality assessment within seconds. DOGMA is implemented in Python and published under the GNU GPL v.3 license. The source code is available at https://ebbgit.uni-muenster.de/domainWorld/DOGMA/. Contact: e.dohmen@wwu.de or c.kemena@wwu.de. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
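Conceptually, a domain-based completeness score is the fraction of a reference set of conserved domain arrangements recovered in the proteome under test, as in this sketch; the Pfam-style identifiers are placeholders and DOGMA's own core sets and scoring differ in detail.

```python
# Conceptual sketch of a domain-based completeness score: the fraction of a
# reference set of conserved domain arrangements recovered in the proteome
# under test. The domain sets are invented; DOGMA's own core sets differ.
reference_arrangements = {("PF00069",), ("PF00076", "PF00076"), ("PF00400",)}
observed_arrangements = {("PF00069",), ("PF00400",), ("PF07714",)}

found = reference_arrangements & observed_arrangements
completeness = 100.0 * len(found) / len(reference_arrangements)
print(f"domain-based completeness: {completeness:.1f}%")
```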
Broadband ion mobility deconvolution for rapid analysis of complex mixtures.
Pettit, Michael E; Brantley, Matthew R; Donnarumma, Fabrizio; Murray, Kermit K; Solouki, Touradj
2018-05-04
High resolving power ion mobility (IM) allows for accurate characterization of complex mixtures in high-throughput IM mass spectrometry (IM-MS) experiments. We previously demonstrated that pure component IM-MS data can be extracted from IM unresolved post-IM/collision-induced dissociation (CID) MS data using automated ion mobility deconvolution (AIMD) software [Matthew Brantley, Behrooz Zekavat, Brett Harper, Rachel Mason, and Touradj Solouki, J. Am. Soc. Mass Spectrom., 2014, 25, 1810-1819]. In our previous reports, we utilized a quadrupole ion filter for m/z-isolation of IM unresolved monoisotopic species prior to post-IM/CID MS. Here, we utilize a broadband IM-MS deconvolution strategy to remove the m/z-isolation requirement for successful deconvolution of IM unresolved peaks. Broadband data collection has throughput and multiplexing advantages; hence, elimination of the ion isolation step reduces experimental run times and thus expands the applicability of AIMD to high-throughput bottom-up proteomics. We demonstrate broadband IM-MS deconvolution of two separate and unrelated pairs of IM unresolved isomers (viz., a pair of isomeric hexapeptides and a pair of isomeric trisaccharides) in a simulated complex mixture. Moreover, we show that broadband IM-MS deconvolution improves high-throughput bottom-up characterization of a proteolytic digest of rat brain tissue. To our knowledge, this manuscript is the first to report successful deconvolution of pure component IM and MS data from an IM-assisted data-independent analysis (DIA) or HDMSE dataset.
Scheltema, Richard A; Mann, Matthias
2012-06-01
With the advent of high-throughput mass spectrometry (MS)-based proteomics, the magnitude and complexity of the performed experiments has increased dramatically. Likewise, investments in chromatographic and MS instrumentation are a large proportion of the budget of proteomics laboratories. Guarding measurement quality and maximizing uptime of the LC-MS/MS systems therefore requires constant care despite automated workflows. We describe a real-time surveillance system, called SprayQc, that continuously monitors the status of the peripheral equipment to ensure that operational parameters are within an acceptable range. SprayQc is composed of multiple plug-in software components that use computer vision to analyze electrospray conditions, monitor the chromatographic device for stable backpressure, interact with a column oven to control pressure by temperature, and ensure that the mass spectrometer is still acquiring data. Action is taken when a failure condition has been detected, such as stopping the column oven and the LC flow, as well as automatically notifying the appropriate operator. Additionally, all defined metrics can be recorded synchronized on retention time with the MS acquisition file, allowing for later inspection and providing valuable information for optimization. SprayQc has been extensively tested in our laboratory, supports third-party plug-in development, and is freely available for download from http://sourceforge.org/projects/sprayqc .
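A generic watchdog loop in the spirit of the plug-in design described above might look like this; the metric reader, limits and notification hook are hypothetical stand-ins, not SprayQc components.

```python
import time

# Generic watchdog loop: poll metrics, compare against acceptable ranges, and
# trigger an action on failure. read_metrics/notify_operator are placeholders.
LIMITS = {"backpressure_bar": (50, 450), "spray_current_nA": (5, 500)}

def read_metrics():
    return {"backpressure_bar": 260, "spray_current_nA": 120}   # placeholder reader

def notify_operator(metric, value):
    print(f"ALERT: {metric} out of range ({value}) - stopping LC flow")

def watchdog(poll_seconds=10, cycles=3):
    for _ in range(cycles):
        for metric, value in read_metrics().items():
            lo, hi = LIMITS[metric]
            if not lo <= value <= hi:
                notify_operator(metric, value)
        time.sleep(poll_seconds)

watchdog(poll_seconds=0.1)
```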
hEIDI: An Intuitive Application Tool To Organize and Treat Large-Scale Proteomics Data.
Hesse, Anne-Marie; Dupierris, Véronique; Adam, Claire; Court, Magali; Barthe, Damien; Emadali, Anouk; Masselon, Christophe; Ferro, Myriam; Bruley, Christophe
2016-10-07
Advances in high-throughput proteomics have led to a rapid increase in the number, size, and complexity of the associated data sets. Managing and extracting reliable information from such large series of data sets require the use of dedicated software organized in a consistent pipeline to reduce, validate, exploit, and ultimately export data. The compilation of multiple mass-spectrometry-based identification and quantification results obtained in the context of a large-scale project represents a real challenge for developers of bioinformatics solutions. In response to this challenge, we developed a dedicated software suite called hEIDI to manage and combine both identifications and semiquantitative data related to multiple LC-MS/MS analyses. This paper describes how, through a user-friendly interface, hEIDI can be used to compile analyses and retrieve lists of nonredundant protein groups. Moreover, hEIDI allows direct comparison of series of analyses, on the basis of protein groups, while ensuring consistent protein inference and also computing spectral counts. hEIDI ensures that validated results are compliant with MIAPE guidelines as all information related to samples and results is stored in appropriate databases. Thanks to the database structure, validated results generated within hEIDI can be easily exported in the PRIDE XML format for subsequent publication. hEIDI can be downloaded from http://biodev.extra.cea.fr/docs/heidi .
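As a minimal illustration of spectral counting across analyses (one of the operations described above), the sketch below tallies validated peptide-spectrum matches per protein group and analysis; accessions and run names are hypothetical.

```python
import pandas as pd

# Minimal spectral-count comparison across analyses, one row per validated PSM.
psms = pd.DataFrame({
    "analysis": ["run1", "run1", "run1", "run2", "run2"],
    "protein_group": ["P68871", "P68871", "P02768", "P68871", "P02768"],
})
counts = (psms.groupby(["protein_group", "analysis"]).size()
               .unstack(fill_value=0)
               .rename_axis(columns=None))
print(counts)
```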
Qiu, Ji; LaBaer, Joshua
2011-01-01
Systematic study of proteins requires the availability of thousands of proteins in functional format. However, traditional recombinant protein expression and purification methods have many drawbacks for such study at the proteome level. We have developed an innovative in situ protein expression and capture system, namely NAPPA (nucleic acid programmable protein array), in which C-terminally tagged proteins are expressed using an in vitro expression system and efficiently captured/purified by anti-tag antibodies coprinted at each spot. The NAPPA technology presented in this chapter enables researchers to produce and display fresh proteins just in time, in a multiplexed high-throughput fashion, and to utilize them for various downstream biochemical research of interest. This platform could revolutionize the field of functional proteomics with its ability to produce thousands of spatially separated proteins at high density, with a narrow dynamic range of protein concentrations, reproducibly and functionally. Copyright © 2011 Elsevier Inc. All rights reserved.
pyQms enables universal and accurate quantification of mass spectrometry data.
Leufken, Johannes; Niehues, Anna; Sarin, L Peter; Wessel, Florian; Hippler, Michael; Leidel, Sebastian A; Fufezan, Christian
2017-10-01
Quantitative mass spectrometry (MS) is a key technique in many research areas (1), including proteomics, metabolomics, glycomics, and lipidomics. Because all of the corresponding molecules can be described by chemical formulas, universal quantification tools are highly desirable. Here, we present pyQms, an open-source software for accurate quantification of all types of molecules measurable by MS. pyQms uses isotope pattern matching that offers an accurate quality assessment of all quantifications and the ability to directly incorporate mass spectrometer accuracy. pyQms is, due to its universal design, applicable to every research field, labeling strategy, and acquisition technique. This opens ultimate flexibility for researchers to design experiments employing innovative and hitherto unexplored labeling strategies. Importantly, pyQms performs very well to accurately quantify partially labeled proteomes in large scale and high throughput, the most challenging task for a quantification algorithm. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
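To make the idea of isotope pattern matching concrete, the sketch below computes a deliberately simplified theoretical isotope pattern (carbon-13 only, rather than the full isotopologue calculation pyQms performs from the molecular formula) and scores measured peak intensities against it with a cosine similarity. The peptide size, measured intensities and scoring function are illustrative assumptions, not pyQms's implementation.

```python
import math

P_C13 = 0.0107  # natural abundance of carbon-13

def carbon_isotope_pattern(n_carbons, n_peaks=4):
    """Approximate isotope pattern considering only 13C substitutions."""
    pattern = [math.comb(n_carbons, k) * P_C13**k * (1 - P_C13)**(n_carbons - k)
               for k in range(n_peaks)]
    total = sum(pattern)
    return [p / total for p in pattern]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Measured intensities of the first four isotope peaks of a peptide with ~50 carbons.
measured = [1.00, 0.56, 0.17, 0.04]
theoretical = carbon_isotope_pattern(n_carbons=50)
score = cosine([m / measured[0] for m in measured],
               [t / theoretical[0] for t in theoretical])
print(f"isotope pattern match score: {score:.3f}")
```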
Monleón, Daniel; Colson, Kimberly; Moseley, Hunter N B; Anklin, Clemens; Oswald, Robert; Szyperski, Thomas; Montelione, Gaetano T
2002-01-01
Rapid data collection, spectral referencing, processing by time domain deconvolution, peak picking and editing, and assignment of NMR spectra are necessary components of any efficient integrated system for protein NMR structure analysis. We have developed a set of software tools designated AutoProc, AutoPeak, and AutoAssign, which function together with the data processing and peak-picking programs NMRPipe and Sparky, to provide an integrated software system for rapid analysis of protein backbone resonance assignments. In this paper we demonstrate that these tools, together with high-sensitivity triple resonance NMR cryoprobes for data collection and a Linux-based computer cluster architecture, can be combined to provide nearly complete backbone resonance assignments and secondary structures (based on chemical shift data) for a 59-residue protein in less than 30 hours of data collection and processing time. In this optimum case of a small protein providing excellent spectra, extensive backbone resonance assignments could also be obtained using less than 6 hours of data collection and processing time. These results demonstrate the feasibility of high throughput triple resonance NMR for determining resonance assignments and secondary structures of small proteins, and the potential for applying NMR in large scale structural proteomics projects.
Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter
2015-01-01
Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
Rea, Giuseppina; Cristofaro, Francesco; Pani, Giuseppe; Pascucci, Barbara; Ghuge, Sandip A; Corsetto, Paola Antonia; Imbriani, Marcello; Visai, Livia; Rizzo, Angela M
2016-03-30
Space is a hostile environment characterized by high vacuum, extreme temperatures, meteoroids, space debris, ionospheric plasma, microgravity and space radiation, which all represent risks for human health. A deep understanding of the biological consequences of exposure to the space environment is required to design efficient countermeasures to minimize their negative impact on human health. Recently, proteomic approaches have received a significant amount of attention in the effort to further study microgravity-induced physiological changes. In this review, we summarize the current knowledge about the effects of microgravity on microorganisms (in particular Cupriavidus metallidurans CH34, Bacillus cereus and Rhodospirillum rubrum S1H), plants (whole plants, organs, and cell cultures), mammalian cells (endothelial cells, bone cells, chondrocytes, muscle cells, thyroid cancer cells, immune system cells) and animals (invertebrates, vertebrates and mammals). Herein, we describe their proteome's response to microgravity, focusing on proteomic discoveries and their future potential applications in space research. Space experiments and operational flight experience have identified detrimental effects on human health and performance because of exposure to weightlessness, even when currently available countermeasures are implemented. Many experimental tools and methods have been developed to study microgravity-induced physiological changes. Recently, genomic and proteomic approaches have received a significant amount of attention. This review summarizes the recent research studies of the proteome response to microgravity in microorganisms, plants, mammalian cells and animals. Current proteomic tools allow large-scale, high-throughput analyses for the detection, identification, and functional investigation of all proteomes. Understanding gene and/or protein expression is the key to unlocking the mechanisms behind microgravity-induced problems and to finding effective countermeasures to spaceflight-induced alterations, but also for the study of diseases on earth. Future perspectives are also highlighted. Copyright © 2015 Elsevier B.V. All rights reserved.
Proteomics Analysis of the Nucleolus in Adenovirus-infected Cells
Lam, Yun W.; Evans, Vanessa C.; Heesom, Kate J.; Lamond, Angus I.; Matthews, David A.
2010-01-01
Adenoviruses replicate primarily in the host cell nucleus, and it is well established that adenovirus infection affects the structure and function of host cell nucleoli in addition to coding for a number of nucleolar targeted viral proteins. Here we used unbiased proteomics methods, including high throughput mass spectrometry coupled with stable isotope labeling by amino acids in cell culture (SILAC) and traditional two-dimensional gel electrophoresis, to identify quantitative changes in the protein composition of the nucleolus during adenovirus infection. Two-dimensional gel analysis revealed changes in six proteins. By contrast, SILAC-based approaches identified 351 proteins with 24 proteins showing at least a 2-fold change after infection. Of those, four were previously reported to have aberrant localization and/or functional relevance during adenovirus infection. In total, 15 proteins identified as changing in amount by proteomics methods were examined in infected cells using confocal microscopy. Eleven of these proteins showed altered patterns of localization in adenovirus-infected cells. Comparing our data with the effects of actinomycin D on the nucleolar proteome revealed that adenovirus infection apparently specifically targets a relatively small subset of nucleolar antigens at the time point examined. PMID:19812395
Contribution of proteomics to the study of plant pathogenic fungi.
Gonzalez-Fernandez, Raquel; Jorrin-Novo, Jesus V
2012-01-01
Phytopathogenic fungi are among the most damaging plant parasitic organisms, and can cause serious diseases and important yield losses in crops. The study of the biology of these microorganisms and the interaction with their hosts has experienced great advances in recent years due to the development of modern, holistic and high-throughput -omic techniques, together with the increasing number of genome sequencing projects and the development of mutants and reverse genetics tools. We highlight among these -omic techniques the importance of proteomics, which has become a relevant tool in plant-fungus pathosystem research. Proteomics intends to identify gene products with a key role in pathogenicity and virulence. These studies would help in the search for key protein targets and in the development of agrochemicals, which may open new ways for crop disease diagnosis and protection. In this review, we provide an overview of the contribution of proteomics to the knowledge of the life cycle, infection mechanisms, and virulence of plant pathogenic fungi. Data from current, innovative literature, according to both methodological and experimental systems, were summarized and discussed. Specific sections were devoted to the most studied fungal phytopathogens: Botrytis cinerea, Sclerotinia sclerotiorum, and Fusarium graminearum.
Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.
2009-01-01
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step by step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578
Gaudreau, Pierre-Olivier; Stagg, John; Soulières, Denis; Saad, Fred
2016-01-01
Prostate cancer (PC) is the second most common form of cancer in men worldwide. Biomarkers have emerged as essential tools for treatment and assessment since the variability of disease behavior, the cost and diversity of treatments, and the related impairment of quality of life have given rise to a need for a personalized approach. High-throughput technology platforms in proteomics and genomics have accelerated the development of biomarkers. Furthermore, recent successes of several new agents in PC, including immunotherapy, have stimulated the search for predictors of response and resistance and have improved the understanding of the biological mechanisms at work. This review provides an overview of currently established biomarkers in PC, as well as a selection of the most promising biomarkers within these particular fields of development. PMID:27168728
The diverse and expanding role of mass spectrometry in structural and molecular biology.
Lössl, Philip; van de Waterbeemd, Michiel; Heck, Albert Jr
2016-12-15
The emergence of proteomics has led to major technological advances in mass spectrometry (MS). These advancements not only benefitted MS-based high-throughput proteomics but also increased the impact of mass spectrometry on the field of structural and molecular biology. Here, we review how state-of-the-art MS methods, including native MS, top-down protein sequencing, cross-linking-MS, and hydrogen-deuterium exchange-MS, nowadays enable the characterization of biomolecular structures, functions, and interactions. In particular, we focus on the role of mass spectrometry in integrated structural and molecular biology investigations of biological macromolecular complexes and cellular machineries, highlighting work on CRISPR-Cas systems and eukaryotic transcription complexes. © 2016 The Authors. Published under the terms of the CC BY NC ND 4.0 license.
The clinical impact of recent advances in LC-MS for cancer biomarker discovery and verification.
Wang, Hui; Shi, Tujin; Qian, Wei-Jun; Liu, Tao; Kagan, Jacob; Srivastava, Sudhir; Smith, Richard D; Rodland, Karin D; Camp, David G
2016-01-01
Mass spectrometry (MS) -based proteomics has become an indispensable tool with broad applications in systems biology and biomedical research. With recent advances in liquid chromatography (LC) and MS instrumentation, LC-MS is making increasingly significant contributions to clinical applications, especially in the area of cancer biomarker discovery and verification. To overcome challenges associated with analyses of clinical samples (for example, a wide dynamic range of protein concentrations in bodily fluids and the need to perform high throughput and accurate quantification of candidate biomarker proteins), significant efforts have been devoted to improve the overall performance of LC-MS-based clinical proteomics platforms. Reviewed here are the recent advances in LC-MS and its applications in cancer biomarker discovery and quantification, along with the potentials, limitations and future perspectives.
Next-Generation Technologies for Multiomics Approaches Including Interactome Sequencing
Ohashi, Hiroyuki; Miyamoto-Sato, Etsuko
2015-01-01
The development of high-speed analytical techniques such as next-generation sequencing and microarrays allows high-throughput analysis of biological information at a low cost. These techniques contribute to medical and bioscience advancements and provide new avenues for scientific research. Here, we outline a variety of new innovative techniques and discuss their use in omics research (e.g., genomics, transcriptomics, metabolomics, proteomics, and interactomics). We also discuss the possible applications of these methods, including an interactome sequencing technology that we developed, in future medical and life science research. PMID:25649523
Biomarker Discovery by Novel Sensors Based on Nanoproteomics Approaches
Dasilva, Noelia; Díez, Paula; Matarraz, Sergio; González-González, María; Paradinas, Sara; Orfao, Alberto; Fuentes, Manuel
2012-01-01
During the last years, proteomics has facilitated biomarker discovery by coupling high-throughput techniques with novel nanosensors. In the present review, we focus on the study of label-based and label-free detection systems, as well as nanotechnology approaches, indicating their advantages and applications in biomarker discovery. In addition, several disease biomarkers are shown in order to display the clinical importance of the improvement of sensitivity and selectivity by using nanoproteomics approaches as novel sensors. PMID:22438764
SubCellProt: predicting protein subcellular localization using machine learning approaches.
Garg, Prabha; Sharma, Virag; Chaudhari, Pradeep; Roy, Nilanjan
2009-01-01
High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. Among the various approaches to decipher the function of a protein, one is to determine its localization. Experimental approaches for proteome annotation including determination of a protein's subcellular localizations are very costly and labor intensive. Besides the available experimental methods, in silico methods present alternative approaches to accomplish this task. Here, we present two machine learning approaches for prediction of the subcellular localization of a protein from the primary sequence information. Two machine learning algorithms, k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) were used to classify an unknown protein into one of the 11 subcellular localizations. The final prediction is made on the basis of a consensus of the predictions made by two algorithms and a probability is assigned to it. The results indicate that the primary sequence derived features like amino acid composition, sequence order and physicochemical properties can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for proteome annotation in a high throughput manner. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
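The abstract states that amino acid composition and related sequence-derived features feed a k-NN and a PNN classifier, whose consensus gives the final prediction. Below is a minimal sketch of the k-NN half on composition features alone; the toy sequences, labels, distance metric and k are assumptions, and the real tool uses more features, eleven localization classes and a consensus with a PNN.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """20-dimensional amino acid composition feature vector."""
    counts = Counter(seq)
    return [counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS]

def knn_predict(query, training, k=3):
    """Majority vote among the k nearest training proteins (Euclidean distance)."""
    q = aa_composition(query)
    dists = []
    for seq, label in training:
        x = aa_composition(seq)
        d = sum((a - b) ** 2 for a, b in zip(q, x)) ** 0.5
        dists.append((d, label))
    nearest = sorted(dists)[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k  # predicted localization and a crude vote-based probability

# Toy training data (sequences and localizations are invented for illustration).
training = [
    ("MKKLLLALLFSA" * 3, "secreted"),
    ("MKRISTTITTTITITTGNGAG" * 2, "cytoplasmic"),
    ("MLRRSLLCLAVAA" * 3, "secreted"),
    ("MSTNPKPQRKTKRNTNRRPQDVK" * 2, "cytoplasmic"),
]
print(knn_predict("MKALLLTLLCLAASA" * 3, training, k=3))
```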
Ocak, S; Sos, M L; Thomas, R K; Massion, P P
2009-08-01
During the last decade, high-throughput technologies, including genomics, epigenomics, transcriptomics and proteomics, have been applied to further our understanding of the molecular pathogenesis of lung cancer, a heterogeneous disease, and to develop strategies that aim to improve the management of patients with this disease. Ultimately, these approaches should lead to sensitive, specific and noninvasive methods for early diagnosis, and facilitate the prediction of response to therapy and outcome, as well as the identification of potential novel therapeutic targets. Genomic studies were the first to move this field forward by providing novel insights into the molecular biology of lung cancer and by generating candidate biomarkers of disease progression. Lung carcinogenesis is driven by genetic and epigenetic alterations that cause aberrant gene function; however, the challenge remains to pinpoint the key regulatory control mechanisms and to distinguish driver from passenger alterations that may have a small but additive effect on cancer development. Epigenetic regulation by DNA methylation and histone modifications modulates chromatin structure and, in turn, either activates or silences gene expression. Proteomic approaches critically complement these molecular studies, as the phenotype of a cancer cell is determined by proteins and cannot be predicted by genomics or transcriptomics alone. The present article focuses on the technological platforms available and some proposed clinical applications. We illustrate herein how the "-omics" have revolutionised our approach to lung cancer biology and hold promise for personalised management of lung cancer.
NCBI GEO: mining millions of expression profiles--database and tools.
Barrett, Tanya; Suzek, Tugba O; Troup, Dennis B; Wilhite, Stephen E; Ngau, Wing-Chi; Ledoux, Pierre; Rudnev, Dmitry; Lash, Alex E; Fujibuchi, Wataru; Edgar, Ron
2005-01-01
The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest fully public repository for high-throughput molecular abundance data, primarily gene expression data. The database has a flexible and open design that allows the submission, storage and retrieval of many data types. These data include microarray-based experiments measuring the abundance of mRNA, genomic DNA and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. GEO currently holds over 30,000 submissions representing approximately half a billion individual molecular abundance measurements, for over 100 organisms. Here, we describe recent database developments that facilitate effective mining and visualization of these data. Features are provided to examine data from both experiment- and gene-centric perspectives using user-friendly Web-based interfaces accessible to those without computational or microarray-related analytical expertise. The GEO database is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.
Peptide Identification by Database Search of Mixture Tandem Mass Spectra*
Wang, Jian; Bourne, Philip E.; Bandeira, Nuno
2011-01-01
In high-throughput proteomics the development of computational methods and novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. Particularly, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still make the assumption that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only a 0.01% of all possible peptide pairs (four orders of magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision. PMID:21862760
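To see why the single-peptide assumption breaks down for mixture spectra, the sketch below builds a toy spectrum from b/y fragments of two co-fragmented peptides and counts how many peaks each peptide explains on its own versus together. The residue masses are standard monoisotopic values, but the peptides, tolerance and counting are illustrative assumptions, not MixDB's actual scoring function.

```python
RESIDUE_MASS = {  # monoisotopic residue masses (Da)
    "A": 71.03711, "D": 115.02694, "E": 129.04259, "G": 57.02146,
    "I": 113.08406, "K": 128.09496, "L": 113.08406, "P": 97.05276,
    "R": 156.10111, "S": 87.03203, "T": 101.04768, "V": 99.06841,
}
PROTON, WATER = 1.007276, 18.010565

def by_ions(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(masses[:i]) + PROTON)           # b_i
        ions.append(sum(masses[i:]) + WATER + PROTON)   # y_(n-i)
    return ions

def explained_peaks(spectrum_mz, peptide, tol=0.02):
    """Indices of spectrum peaks within `tol` Da of any b/y ion of `peptide`."""
    theo = by_ions(peptide)
    return {j for j, mz in enumerate(spectrum_mz)
            if any(abs(mz - t) <= tol for t in theo)}

# A toy "mixture" spectrum built from fragments of two co-fragmented peptides.
pep_a, pep_b = "PEPTIDEK", "LGSVEATR"
spectrum = sorted(by_ions(pep_a)[::2] + by_ions(pep_b)[1::2])  # a few ions of each

hits_a = explained_peaks(spectrum, pep_a)
hits_b = explained_peaks(spectrum, pep_b)
print("explained by A:", len(hits_a), "by B:", len(hits_b),
      "together:", len(hits_a | hits_b), "of", len(spectrum))
```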
PANDORA: keyword-based analysis of protein sets by integration of annotation sources.
Kaplan, Noam; Vaaknin, Avishay; Linial, Michal
2003-10-01
Recent advances in high-throughput methods and the application of computational tools for automatic classification of proteins have made it possible to carry out large-scale proteomic analyses. Biological analysis and interpretation of sets of proteins is a time-consuming undertaking carried out manually by experts. We have developed PANDORA (Protein ANnotation Diagram ORiented Analysis), a web-based tool that provides an automatic representation of the biological knowledge associated with any set of proteins. PANDORA uses a unique approach of keyword-based graphical analysis that focuses on detecting subsets of proteins that share unique biological properties and the intersections of such sets. PANDORA currently supports SwissProt keywords, NCBI Taxonomy, InterPro entries and the hierarchical classification terms from ENZYME, SCOP and GO databases. The integrated study of several annotation sources simultaneously allows a representation of biological relations of structure, function, cellular location, taxonomy, domains and motifs. PANDORA is also integrated into the ProtoNet system, thus allowing testing thousands of automatically generated clusters. We illustrate how PANDORA enhances the biological understanding of large, non-uniform sets of proteins originating from experimental and computational sources, without the need for prior biological knowledge on individual proteins.
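PANDORA's keyword-based analysis revolves around inverting protein-to-annotation mappings into keyword-defined protein subsets and examining their intersections. A toy Python sketch of that idea follows; the annotation table is invented, and the real tool integrates several annotation sources and renders the result as a graph.

```python
from itertools import combinations

# Toy annotation table: protein -> set of keywords (SwissProt-style terms assumed).
annotations = {
    "P1": {"Kinase", "Membrane", "ATP-binding"},
    "P2": {"Kinase", "ATP-binding"},
    "P3": {"Membrane", "Transport"},
    "P4": {"Transport", "Ion channel"},
    "P5": {"Kinase", "Membrane"},
}

# Invert to keyword -> set of proteins, the basic structure behind keyword graphs.
by_keyword = {}
for prot, kws in annotations.items():
    for kw in kws:
        by_keyword.setdefault(kw, set()).add(prot)

# Report keyword-defined subsets and their pairwise intersections.
for kw, prots in sorted(by_keyword.items()):
    print(f"{kw}: {sorted(prots)}")
for (k1, s1), (k2, s2) in combinations(sorted(by_keyword.items()), 2):
    shared = s1 & s2
    if shared:
        print(f"{k1} & {k2}: {sorted(shared)}")
```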
2005-01-01
proteomic gel analyses. The research group has explored the use of chemodescriptors calculated using high-level ab initio quantum chemical basis sets...descriptors that characterize the entire proteomics map, local descriptors that characterize a subset of the proteins present in the gel, and spectrum...techniques for analyzing the full set of proteins present in a proteomics map. Subject terms: topological indices.
Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay
2004-01-01
Background Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175
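The statistical step that flags the most important Gene Ontology categories for an input gene set is commonly a hypergeometric enrichment test; whether GOTree Machine uses exactly this formulation is not stated in the abstract, so the sketch below should be read as a generic illustration with invented numbers.

```python
from math import comb

def enrichment_p(N, K, n, k):
    """Hypergeometric P(X >= k): probability of drawing at least k genes from a
    GO category of size K when sampling n genes from a reference set of N genes."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Toy numbers: 6000 genes in the reference, 120 annotated to a category,
# 50 genes in the input set, 8 of which fall in that category.
p = enrichment_p(N=6000, K=120, n=50, k=8)
print(f"enrichment p-value: {p:.2e}")
```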
Accelerating the Design of Solar Thermal Fuel Materials through High Throughput Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Y; Grossman, JC
2014-12-01
Solar thermal fuels (STF) store the energy of sunlight, which can then be released later in the form of heat, offering an emission-free and renewable solution for both solar energy conversion and storage. However, this approach is currently limited by the lack of low-cost materials with high energy density and high stability. In this Letter, we present an ab initio high-throughput computational approach to accelerate the design process and allow for searches over a broad class of materials. The high-throughput screening platform we have developed can run through large numbers of molecules composed of earth-abundant elements and identifies possible metastable structures of a given material. Corresponding isomerization enthalpies associated with the metastable structures are then computed. Using this high-throughput simulation approach, we have discovered molecular structures with high isomerization enthalpies that have the potential to be new candidates for high-energy density STF. We have also discovered physical principles to guide further STF materials design through structural analysis. More broadly, our results illustrate the potential of using high-throughput ab initio simulations to design materials that undergo targeted structural transitions.
Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi
NASA Astrophysics Data System (ADS)
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad
2015-05-01
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
High throughput computing: a solution for scientific analysis
O'Donnell, M.
2011-01-01
handle job failures due to hardware, software, or network interruptions (obviating the need to manually resubmit the job after each stoppage); be affordable; and most importantly, allow us to complete very large, complex analyses that otherwise would not even be possible. In short, we envisioned a job-management system that would take advantage of unused FORT CPUs within a local area network (LAN) to effectively distribute and run highly complex analytical processes. What we found was a solution that uses High Throughput Computing (HTC) and High Performance Computing (HPC) systems to do exactly that (Figure 1).
Identifying the missing proteins in human proteome by biological language model.
Dong, Qiwen; Wang, Kai; Liu, Xuan
2016-12-23
With the rapid development of high-throughput sequencing technology, proteomics research has become a trendy field in the post-genomics era. It is necessary to identify all the natively encoded protein sequences for further function and pathway analysis. Toward that end, the Human Proteome Organization launched the Human Proteome Project in 2011. However, many proteins are hard to detect by experimental methods, which has become one of the bottlenecks of the Human Proteome Project. In consideration of the difficulty of detecting these missing proteins using wet-experiment approaches, here we use a bioinformatics method to pre-filter the missing proteins. Since there are analogies between biological sequences and natural language, n-gram models from the Natural Language Processing field have been used to filter the missing proteins. The dataset used in this study contains 616 missing proteins from the "uncertain" category of the neXtProt database. There are 102 proteins deduced by the n-gram model that have a high probability of being native human proteins. We perform a detailed analysis of the predicted structure and function of these missing proteins and also compare the high-probability proteins with other mass spectrometry datasets. The evaluation shows that the results reported here are in good agreement with those obtained by other well-established databases. The analysis shows that 102 proteins may be native gene-coding proteins and that some of the missing proteins are membrane or natively disordered proteins, which are hard to detect by experimental methods.
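As a toy illustration of treating protein sequences as a "biological language", the sketch below trains a smoothed character-level trigram model on a couple of known sequences and scores a candidate by its length-normalised log-probability. The training data, smoothing and model order are assumptions and do not reproduce the model or thresholds used in the study.

```python
from collections import defaultdict
import math

def train_ngram(sequences, n=3):
    """Counts of n-grams and their (n-1)-gram prefixes over protein sequences."""
    ngrams, prefixes = defaultdict(int), defaultdict(int)
    for seq in sequences:
        padded = "^" * (n - 1) + seq
        for i in range(len(seq)):
            gram = padded[i:i + n]
            ngrams[gram] += 1
            prefixes[gram[:-1]] += 1
    return ngrams, prefixes

def log_prob(seq, ngrams, prefixes, n=3, alphabet=20):
    """Add-one smoothed log-probability of a sequence under the n-gram model."""
    padded = "^" * (n - 1) + seq
    lp = 0.0
    for i in range(len(seq)):
        gram = padded[i:i + n]
        lp += math.log((ngrams[gram] + 1) / (prefixes[gram[:-1]] + alphabet))
    return lp / len(seq)  # length-normalised so sequences of different size compare

# Toy "known proteins" as training data (real work would use Swiss-Prot entries).
known = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MSLSNKLTLDKLDVKGKRVVMRVDFNVPMKNNQ"]
ngrams, prefixes = train_ngram(known)
candidate = "MKTAYIAKSHFSRQLEERLG"
print(f"mean log-probability: {log_prob(candidate, ngrams, prefixes):.3f}")
```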
Urine Sample Preparation in 96-Well Filter Plates for Quantitative Clinical Proteomics
2015-01-01
Urine is an important, noninvasively collected body fluid source for the diagnosis and prognosis of human diseases. Liquid chromatography mass spectrometry (LC-MS) based shotgun proteomics has evolved as a sensitive and informative technique to discover candidate disease biomarkers from urine specimens. Filter-aided sample preparation (FASP) generates peptide samples from protein mixtures of cell lysate or body fluid origin. Here, we describe a FASP method adapted to 96-well filter plates, named 96FASP. Soluble urine concentrates containing ∼10 μg of total protein were processed by 96FASP and LC-MS resulting in 700–900 protein identifications at a 1% false discovery rate (FDR). The experimental repeatability, as assessed by label-free quantification and Pearson correlation analysis for shared proteins among replicates, was high (R ≥ 0.97). Application to urinary pellet lysates which is of particular interest in the context of urinary tract infection analysis was also demonstrated. On average, 1700 proteins (±398) were identified in five experiments. In a pilot study using 96FASP for analysis of eight soluble urine samples, we demonstrated that protein profiles of technical replicates invariably clustered; the protein profiles for distinct urine donors were very different from each other. Robust, highly parallel methods to generate peptide mixtures from urine and other body fluids are critical to increase cost-effectiveness in clinical proteomics projects. This 96FASP method has potential to become a gold standard for high-throughput quantitative clinical proteomics. PMID:24797144
Tiberti, Natalia; Sanchez, Jean-Charles
2015-09-01
The quantitative proteomics data here reported are part of a research article entitled "Increased acute immune response during the meningo-encephalitic stage of Trypanosoma brucei rhodesiense sleeping sickness compared to Trypanosoma brucei gambiense", published by Tiberti et al., 2015. Transl. Proteomics 6, 1-9. Sleeping sickness (human African trypanosomiasis - HAT) is a deadly neglected tropical disease affecting mainly rural communities in sub-Saharan Africa. This parasitic disease is caused by the Trypanosoma brucei (T. b.) parasite, which is transmitted to the human host through the bite of the tse-tse fly. Two parasite sub-species, T. b. rhodesiense and T. b. gambiense, are responsible for two clinically different and geographically separated forms of sleeping sickness. The objective of the present study was to characterise and compare the cerebrospinal fluid (CSF) proteome of stage 2 (meningo-encephalitic stage) HAT patients suffering from T. b. gambiense or T. b. rhodesiense disease using high-throughput quantitative proteomics and the Tandem Mass Tag (TMT(®)) isobaric labelling. In order to evaluate the CSF proteome in the context of HAT pathophysiology, the protein dataset was then submitted to gene ontology and pathway analysis. Two significantly differentially expressed proteins (C-reactive protein and orosomucoid 1) were further verified on a larger population of patients (n=185) by ELISA, confirming the mass spectrometry results. By showing a predominant involvement of the acute immune response in rhodesiense HAT, the proteomics results obtained in this work will contribute to further understand the mechanisms of pathology occurring in HAT and to propose new biomarkers of potential clinical utility. The mass spectrometry raw data are available in the Pride Archive via ProteomeXchange through the identifier PXD001082.
High-coverage quantitative proteomics using amine-specific isotopic labeling.
Melanson, Jeremy E; Avery, Steven L; Pinto, Devanand M
2006-08-01
Peptide dimethylation with isotopically coded formaldehydes was evaluated as a potential alternative to techniques such as the iTRAQ method for comparative proteomics. The isotopic labeling strategy and custom-designed protein quantitation software were tested using protein standards and then applied to measure protein levels associated with Alzheimer's disease (AD). The method provided high accuracy (10% error), precision (14% RSD) and coverage (70%) when applied to the analysis of a standard solution of BSA by LC-MS/MS. The technique was then applied to measure protein abundance levels in brain tissue afflicted with AD relative to normal brain tissue. 2-D LC-MS analysis identified 548 unique proteins (p<0.05). Of these, 349 were quantified with two or more peptides that met the statistical criteria used in this study. Several classes of proteins exhibited significant changes in abundance. For example, elevated levels of antioxidant proteins and decreased levels of mitochondrial electron transport proteins were observed. The results demonstrate the utility of the labeling method for high-throughput quantitative analysis.
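A sketch of the quantification arithmetic behind such labeling experiments: per-peptide heavy/light intensity ratios are aggregated to a protein-level ratio, and the spread across peptides gives a precision estimate such as the RSD quoted in the abstract. The protein names, intensities and median aggregation rule here are illustrative assumptions, not the custom software described.

```python
import statistics

# Toy peptide-level intensities from light (control tissue) and heavy (AD tissue)
# dimethyl channels; values and protein names are placeholders.
peptides = {
    "GPX1":   [(1.2e6, 2.6e6), (0.9e6, 1.9e6), (1.5e6, 3.4e6)],  # antioxidant protein
    "NDUFS1": [(3.0e6, 1.4e6), (2.4e6, 1.1e6), (2.8e6, 1.5e6)],  # electron transport subunit
}

for protein, pairs in peptides.items():
    ratios = [heavy / light for light, heavy in pairs]
    median_ratio = statistics.median(ratios)
    rsd = 100 * statistics.stdev(ratios) / statistics.mean(ratios)
    print(f"{protein}: AD/control = {median_ratio:.2f} "
          f"(RSD {rsd:.0f}%, {len(ratios)} peptides)")
```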
Stepping into the omics era: Opportunities and challenges for biomaterials science and engineering.
Groen, Nathalie; Guvendiren, Murat; Rabitz, Herschel; Welsh, William J; Kohn, Joachim; de Boer, Jan
2016-04-01
The research paradigm in biomaterials science and engineering is evolving from using low-throughput and iterative experimental designs towards high-throughput experimental designs for materials optimization and the evaluation of materials properties. Computational science plays an important role in this transition. With the emergence of the omics approach in the biomaterials field, referred to as materiomics, high-throughput approaches hold the promise of tackling the complexity of materials and understanding correlations between material properties and their effects on complex biological systems. The intrinsic complexity of biological systems is an important factor that is often oversimplified when characterizing biological responses to materials and establishing property-activity relationships. Indeed, in vitro tests designed to predict in vivo performance of a given biomaterial are largely lacking as we are not able to capture the biological complexity of whole tissues in an in vitro model. In this opinion paper, we explain how we reached our opinion that converging genomics and materiomics into a new field would enable a significant acceleration of the development of new and improved medical devices. The use of computational modeling to correlate high-throughput gene expression profiling with high throughput combinatorial material design strategies would add power to the analysis of biological effects induced by material properties. We believe that this extra layer of complexity on top of high-throughput material experimentation is necessary to tackle the biological complexity and further advance the biomaterials field. Copyright © 2016. Published by Elsevier Ltd.
Seok, Junhee; Kaushal, Amit; Davis, Ronald W; Xiao, Wenzhong
2010-01-18
The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.
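The abstract does not spell out how the knowledge base is integrated, so the sketch below shows only the general idea of weighting an expression-based association (here a Pearson correlation between a transcription factor and a candidate target) by a prior derived from a knowledge base. The profiles, priors and combination rule are all assumptions, not the algorithm developed in the paper.

```python
import statistics

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Toy expression profiles across six microarray conditions.
expression = {
    "TF1":   [1.0, 2.1, 3.2, 0.5, 1.8, 2.9],
    "GENEA": [1.1, 2.0, 3.0, 0.7, 1.9, 3.1],
    "GENEB": [3.0, 0.4, 1.2, 2.8, 0.9, 1.1],
}
# Toy prior: 1.0 if the knowledge base supports the pair, a small background otherwise.
prior = {("TF1", "GENEA"): 1.0, ("TF1", "GENEB"): 0.1}

for target in ("GENEA", "GENEB"):
    r = pearson(expression["TF1"], expression[target])
    score = abs(r) * prior[("TF1", target)]  # simple evidence combination
    print(f"TF1 -> {target}: |r|={abs(r):.2f}, "
          f"prior={prior[('TF1', target)]}, score={score:.2f}")
```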
Durighello, Emie; Christie-Oleza, Joseph Alexander; Armengaud, Jean
2014-01-01
Bacteria from the Roseobacter clade are abundant in surface marine ecosystems as over 10% of bacterial cells in the open ocean and 20% in coastal waters belong to this group. In order to document how these marine bacteria interact with their environment, we analyzed the exoproteome of Phaeobacter strain DSM 17395. We grew the strain in marine medium, collected the exoproteome and catalogued its content with high-throughput nanoLC-MS/MS shotgun proteomics. The major component represented 60% of the total protein content but was refractory to either classical proteomic identification or proteogenomics. We de novo sequenced this abundant protein with high-resolution tandem mass spectra, and it turned out to be the 53 kDa RTX-toxin ZP_02147451. It comprised a peptidase M10 serralysin domain. We explained its recalcitrance to trypsin proteolysis and proteomic identification by its unusually low number of basic residues. We found that this is a conserved trait in RTX-toxins from Roseobacter strains, which probably explains their persistence in the harsh conditions around bacteria. Comprehensive analysis of exoproteomes from environmental bacteria should take into account this proteolytic recalcitrance. PMID:24586966
Zhou, Li; Wen, Ji; Huang, Zhao; Nice, Edouard C; Huang, Canhua; Zhang, Haiyuan; Li, Qifu
2017-03-01
Liver cancer is a major global health problem being the sixth most common cancer and the third cause of cancer-related death, with hepatocellular carcinoma (HCC) representing more than 90% of primary liver cancers. Mounting evidence suggests that, compared with their normal counterparts, many types of cancer cell have increased levels of ROS. Therefore, cancer cells need to combat high levels of ROS, especially at early stages of tumor development. Recent studies have revealed that ROS-mediated regulation of redox-sensitive proteins (redox sensors) is involved in the pathogenesis and/or progression of many human diseases, including cancer. Unraveling the altered functions of redox sensors and the underlying mechanisms in hepatocarcinogenesis is critical for the development of novel cancer therapeutics. For this reason, redox proteomics has been developed for the high-throughput screening of redox sensors, which will benefit the development of novel therapeutic strategies for the treatment of HCC. In this review, we will briefly introduce several novel redox proteomics techniques that are currently available to study various oxidative modifications in hepatocarcinogenesis and summarize the most important discoveries in the study of redox processes related to the development and progression of HCC. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Baldrian, Petr; López-Mondéjar, Rubén
2014-02-01
Molecular methods for the analysis of biomolecules have undergone rapid technological development in the last decade. The advent of next-generation sequencing methods and improvements in instrumental resolution enabled the analysis of complex transcriptome, proteome and metabolome data, as well as a detailed annotation of microbial genomes. The mechanisms of decomposition by model fungi have been described in unprecedented detail by the combination of genome sequencing, transcriptomics and proteomics. The increasing number of available genomes for fungi and bacteria shows that the genetic potential for decomposition of organic matter is widespread among taxonomically diverse microbial taxa, while expression studies document the importance of the regulation of expression in decomposition efficiency. Importantly, high-throughput methods of nucleic acid analysis used for the analysis of metagenomes and metatranscriptomes indicate the high diversity of decomposer communities in natural habitats and their taxonomic composition. Today, the metaproteomics of natural habitats is of interest. In combination with advanced analytical techniques to explore the products of decomposition and the accumulation of information on the genomes of environmentally relevant microorganisms, advanced methods in microbial ecophysiology should increase our understanding of the complex processes of organic matter transformation.
Improved False Discovery Rate Estimation Procedure for Shotgun Proteomics.
Keich, Uri; Kertesz-Farkas, Attila; Noble, William Stafford
2015-08-07
Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence estimation methodology. In this work, we evaluate, using theoretical and empirical analysis, four previously proposed protocols for estimating the false discovery rate (FDR) associated with a set of identified tandem mass spectra: two variants of the target-decoy competition protocol (TDC) of Elias and Gygi and two variants of the separate target-decoy search protocol of Käll et al. Our analysis reveals significant biases in the two separate target-decoy search protocols. Moreover, the one TDC protocol that provides an unbiased FDR estimate among the target PSMs does so at the cost of forfeiting a random subset of high-scoring spectrum identifications. We therefore propose the mix-max procedure to provide unbiased, accurate FDR estimates in the presence of well-calibrated scores. The method avoids biases associated with the two separate target-decoy search protocols and also avoids the propensity for target-decoy competition to discard a random subset of high-scoring target identifications.
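For readers unfamiliar with the protocols being compared, the sketch below shows a basic target-decoy competition estimate: for each spectrum only the better of the target and decoy match is kept, the FDR above a score threshold is estimated from the decoy count (here with the +1 correction), and q-values are the running minimum of that estimate. This is a textbook TDC variant for illustration, not the mix-max procedure proposed in the paper.

```python
def tdc_qvalues(psms):
    """Target-decoy competition q-values.

    `psms` is a list of (score, is_decoy) pairs, one per spectrum, where each
    entry is already the better of the target and decoy match for that spectrum.
    FDR above a threshold is estimated as (#decoys + 1) / #targets, and q-values
    are the running minimum of that estimate over less stringent thresholds.
    """
    ordered = sorted(psms, key=lambda x: x[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ordered:
        decoys += is_decoy
        targets += not is_decoy
        fdrs.append((decoys + 1) / max(targets, 1))
    qvals, running_min = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):  # enforce monotonicity
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return [(s, d, q) for (s, d), q in zip(ordered, qvals)]

# Toy competition winners: higher score is better, True marks a decoy win.
# With so few PSMs the +1 correction dominates; real runs involve thousands.
example = [(9.1, False), (8.7, False), (8.2, True), (7.9, False),
           (7.5, False), (7.1, True), (6.8, False)]
for score, is_decoy, q in tdc_qvalues(example):
    print(f"score {score:.1f}  decoy={is_decoy}  q={q:.2f}")
```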
A Comprehensive, Open-source Platform for Mass Spectrometry-based Glycoproteomics Data Analysis.
Liu, Gang; Cheng, Kai; Lo, Chi Y; Li, Jun; Qu, Jun; Neelamegham, Sriram
2017-11-01
Glycosylation is among the most abundant and diverse protein post-translational modifications (PTMs) identified to date. The structural analysis of this PTM is challenging because of the diverse monosaccharides which are not conserved among organisms, the branched nature of glycans, their isomeric structures, and heterogeneity in the glycan distribution at a given site. Glycoproteomics experiments have adopted the traditional high-throughput LC-MSn proteomics workflow to analyze site-specific glycosylation. However, comprehensive computational platforms for data analyses are scarce. To address this limitation, we present a comprehensive, open-source, modular software for glycoproteomics data analysis called GlycoPAT (GlycoProteomics Analysis Toolbox; freely available from www.VirtualGlycome.org/glycopat). The program includes three major advances: (1) "SmallGlyPep," a minimal linear representation of glycopeptides for MSn data analysis. This format allows facile serial fragmentation of both the peptide backbone and PTM at one or more locations. (2) A novel scoring scheme based on calculation of the "Ensemble Score (ES)," a measure that scores and rank-orders MS/MS spectra for N- and O-linked glycopeptides using cross-correlation and probability based analyses. (3) A false discovery rate (FDR) calculation scheme where decoy glycopeptides are created by simultaneously scrambling the amino acid sequence and by introducing artificial monosaccharides by perturbing the original sugar mass. Parallel computing facilities and user-friendly GUIs (Graphical User Interfaces) are also provided. GlycoPAT is used to catalogue site-specific glycosylation on simple glycoproteins, standard protein mixtures and human plasma cryoprecipitate samples in three common MS/MS fragmentation modes: CID, HCD and ETD. It is also used to identify 960 unique glycopeptides in cell lysates from prostate cancer cells. The results show that the simultaneous consideration of peptide and glycan fragmentation is necessary for high-quality MSn spectrum annotation in CID and HCD fragmentation modes. Additionally, they confirm the suitability of GlycoPAT to analyze shotgun glycoproteomics data. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
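A naive sketch of the decoy-glycopeptide idea described in point (3): scramble the amino acid sequence and perturb the monosaccharide masses. The monosaccharide residue masses are standard monoisotopic values, but the shuffling rule and the size of the mass perturbation are assumptions rather than GlycoPAT's actual scheme.

```python
import random

MONOSACCHARIDE_MASS = {  # monoisotopic residue masses of common sugars (Da)
    "Hex": 162.0528, "HexNAc": 203.0794, "Fuc": 146.0579, "NeuAc": 291.0954,
}

def decoy_glycopeptide(peptide, glycan, max_shift=3.0, seed=None):
    """Scramble the peptide sequence and perturb each sugar mass by a random
    offset, yielding a decoy of the same length and glycan composition size."""
    rng = random.Random(seed)
    residues = list(peptide)
    rng.shuffle(residues)
    decoy_peptide = "".join(residues)
    decoy_glycan_mass = sum(
        MONOSACCHARIDE_MASS[s] + rng.uniform(-max_shift, max_shift) for s in glycan
    )
    return decoy_peptide, round(decoy_glycan_mass, 4)

target_glycan = ["HexNAc", "HexNAc", "Hex", "Hex", "Hex"]  # a trimannosyl core
print(decoy_glycopeptide("LCPDCPLLAPLNDSR", target_glycan, seed=1))
print("target glycan mass:",
      round(sum(MONOSACCHARIDE_MASS[s] for s in target_glycan), 4))
```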
Comparative and Quantitative Global Proteomics Approaches: An Overview
Deracinois, Barbara; Flahaut, Christophe; Duban-Deweer, Sophie; Karamanos, Yannis
2013-01-01
Proteomics has become a key tool for the study of biological systems. The comparison between two different physiological states allows unravelling the cellular and molecular mechanisms involved in a biological process. Proteomics can confirm the presence of proteins suggested by their mRNA content and provides a direct measure of the quantity present in a cell. Global and targeted proteomics strategies can be applied. Targeted proteomics strategies limit the number of features that will be monitored and then optimise the methods to obtain the highest sensitivity and throughput for a large number of samples. The advantage of global proteomics strategies is that no hypothesis is required, other than a measurable difference in one or more protein species between the samples. Global proteomics methods attempt to separate, quantify and identify all the proteins from a given sample. This review highlights only the different techniques of separation and quantification of proteins and peptides, in view of a comparative and quantitative global proteomics analysis. The in-gel and off-gel quantification of proteins will be discussed as well as the corresponding mass spectrometry technology. The overview is focused on the widespread techniques while keeping in mind that each approach is modular and often recovers the other. PMID:28250403
Bilić, Petra; Guillemin, Nicolas; Kovačević, Alan; Beer Ljubić, Blanka; Jović, Ines; Galan, Asier; Eckersall, Peter David; Burchmore, Richard; Mrljak, Vladimir
2018-05-15
Idiopathic dilated cardiomyopathy (iDCM) is a primary myocardial disorder with an unknown aetiology, characterized by reduced contractility and ventricular dilation of the left or both ventricles. Naturally occurring canine iDCM was used herein to identify the serum proteomic signature of the disease compared to the healthy state, providing an insight into underlying mechanisms and revealing proteins with biomarker potential. To achieve this, we used a high-throughput label-based quantitative LC-MS/MS proteomics approach and bioinformatics analysis of the in silico inferred interactome protein network created from the initial list of differential proteins. To complement the proteomic analysis, serum biochemical parameters and levels of known biomarkers of cardiac function were measured. Several proteins with biomarker potential were identified, such as inter-alpha-trypsin inhibitor heavy chain H4, microfibril-associated glycoprotein 4 and apolipoprotein A-IV, which were validated using an independent method (Western blotting) and showed high specificity and sensitivity according to receiver operating characteristic curve analysis. Bioinformatics analysis revealed the involvement of different pathways in iDCM, such as complement cascade activation, lipoprotein particle dynamics, elastic fibre formation, GPCR signalling and the respiratory electron transport chain. Idiopathic dilated cardiomyopathy is a severe primary myocardial disease of unknown cause, affecting both humans and dogs. This study contributes to canine heart disease research by means of state-of-the-art proteomic and bioinformatic analyses, following a similar approach to human iDCM research. Importantly, we used serum as a non-invasive and easily accessible biological source of information and contributed to the scarce data on biofluid proteome research on this topic. Bioinformatics analysis revealed biological pathways modulated in canine iDCM with potential for further targeted research. Also, several proteins with biomarker potential have been identified and successfully validated. Copyright © 2018 Elsevier B.V. All rights reserved.
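Biomarker validation of the kind described above ends in a receiver operating characteristic (ROC) analysis. The sketch below shows that step on simulated abundances using scikit-learn; the group sizes, values and cut-off rule are assumptions for illustration, not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical serum abundances of one candidate marker in healthy (0) vs
# iDCM (1) dogs; the numbers are simulated for illustration only.
rng = np.random.default_rng(0)
labels = np.array([0] * 10 + [1] * 10)
abundance = np.concatenate([rng.normal(1.0, 0.2, 10), rng.normal(1.8, 0.3, 10)])

fpr, tpr, thresholds = roc_curve(labels, abundance)
print("AUC:", roc_auc_score(labels, abundance))

# Youden's J statistic picks the cut-off maximizing sensitivity + specificity - 1.
best_threshold = thresholds[np.argmax(tpr - fpr)]
print("Optimal cut-off:", best_threshold)
```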
High Throughput Genotoxicity Profiling of the US EPA ToxCast Chemical Library
A key aim of the ToxCast project is to investigate modern molecular and genetic high content and high throughput screening (HTS) assays, along with various computational tools to supplement and perhaps replace traditional assays for evaluating chemical toxicity. Genotoxicity is a...
Current algorithmic solutions for peptide-based proteomics data generation and identification.
Hoopmann, Michael R; Moritz, Robert L
2013-02-01
Peptide-based proteomic data sets are ever increasing in size and complexity. These data sets provide computational challenges when attempting to quickly analyze spectra and obtain correct protein identifications. Database search and de novo algorithms must consider high-resolution MS/MS spectra and alternative fragmentation methods. Protein inference is a tricky problem when analyzing large data sets of degenerate peptide identifications. Combining multiple algorithms for improved peptide identification puts significant strain on computational systems when investigating large data sets. This review highlights some of the recent developments in peptide and protein identification algorithms for analyzing shotgun mass spectrometry data when encountering the aforementioned hurdles. Also explored are the roles that analytical pipelines, public spectral libraries, and cloud computing play in the evolution of peptide-based proteomics. Copyright © 2012 Elsevier Ltd. All rights reserved.
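One practical way to combine multiple identification algorithms, as discussed above, is simple spectrum-level voting across engines. The sketch below is a minimal, hypothetical illustration of that idea; the scan names, peptides and two-vote rule are invented, and real pipelines additionally reconcile scores and control FDR.

```python
from collections import defaultdict

# Hypothetical peptide-spectrum matches reported by three search engines.
engine_psms = {
    "engine_A": {"scan_001": "PEPTIDEK", "scan_002": "LSSEQK"},
    "engine_B": {"scan_001": "PEPTIDEK", "scan_003": "VATVSLPR"},
    "engine_C": {"scan_001": "PEPTIDEK", "scan_002": "LSSEQK"},
}

def consensus_psms(engine_psms, min_votes=2):
    """Keep a peptide-spectrum match only if at least `min_votes` engines agree;
    a simple voting scheme, one of many ways to combine search engines."""
    votes = defaultdict(lambda: defaultdict(int))
    for psms in engine_psms.values():
        for scan, peptide in psms.items():
            votes[scan][peptide] += 1
    return {scan: pep for scan, peps in votes.items()
            for pep, n in peps.items() if n >= min_votes}

print(consensus_psms(engine_psms))  # scan_001 and scan_002 survive
```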
Wang, Wenhua; Simon, Martin; Wu, Feihua; Hu, Wenjun; Chen, Juan B.; Zheng, Hailei
2014-01-01
With rapid economic development, most regions in southern China have suffered acid rain (AR) pollution. In our study, we analyzed the changes in sulfur metabolism in Arabidopsis under simulated AR stress, providing one of the first case studies in which the systematic responses in sulfur metabolism were characterized by high-throughput methods at different levels, including proteomic, genomic and physiological approaches. Generally, we found that all of the processes related to sulfur metabolism responded to AR stress, including sulfur uptake, activation and also the synthesis of sulfur-containing amino acids and other secondary metabolites. Finally, we provided a catalogue of the detected sulfur metabolic changes and reconstructed the coordinating network of their mutual influences. This study can help us to understand the mechanisms by which plants adapt to AR stress. PMID:24595051
BIG: a large-scale data integration tool for renal physiology.
Zhao, Yue; Yang, Chin-Rang; Raghuram, Viswanathan; Parulekar, Jaya; Knepper, Mark A
2016-10-01
Due to recent advances in high-throughput techniques, we and others have generated multiple proteomic and transcriptomic databases to describe and quantify gene expression, protein abundance, or cellular signaling on the scale of the whole genome/proteome in kidney cells. The existence of so much data from diverse sources raises the following question: "How can researchers find information efficiently for a given gene product over all of these data sets without searching each data set individually?" This is the type of problem that has motivated the "Big-Data" revolution in Data Science, which has driven progress in fields such as marketing. Here we present an online Big-Data tool called BIG (Biological Information Gatherer) that allows users to submit a single online query to obtain all relevant information from all indexed databases. BIG is accessible at http://big.nhlbi.nih.gov/.
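The core of a gatherer like BIG is a single query fanned out over every indexed dataset. The fragment below sketches that pattern with invented in-memory tables and field names; the actual BIG service indexes published kidney omics datasets on the server side.

```python
# Hypothetical in-memory "databases" keyed by gene symbol; table and field
# names are invented for illustration only.
databases = {
    "proteome_IMCD": {"AQP2": {"abundance": 3.2e7}},
    "transcriptome_CCD": {"AQP2": {"tpm": 512.0}},
    "phosphoproteome": {"AQP2": {"sites": ["S256", "S261", "S269"]}},
}

def gather(gene):
    """Return every record for `gene` across all indexed datasets in one query."""
    return {name: db[gene] for name, db in databases.items() if gene in db}

print(gather("AQP2"))
```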
A high-throughput screening approach for the optoelectronic properties of conjugated polymers.
Wilbraham, Liam; Berardo, Enrico; Turcani, Lukas; Jelfs, Kim E; Zwijnenburg, Martijn A
2018-06-25
We propose a general high-throughput virtual screening approach for the optical and electronic properties of conjugated polymers. This approach makes use of the recently developed xTB family of low-computational-cost density functional tight-binding methods from Grimme and co-workers, calibrated here to (TD-)DFT data computed for a representative diverse set of (co-)polymers. Parameters drawn from the resulting calibration using a linear model can then be applied to the xTB derived results for new polymers, thus generating near DFT-quality data with orders of magnitude reduction in computational cost. As a result, after an initial computational investment for calibration, this approach can be used to quickly and accurately screen on the order of thousands of polymers for target applications. We also demonstrate that the (opto)electronic properties of the conjugated polymers show only a very minor variation when considering different conformers and that the results of high-throughput screening are therefore expected to be relatively insensitive with respect to the conformer search methodology applied.
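The calibration step described above is, in essence, a linear regression of cheap xTB results against reference (TD-)DFT values. A minimal numpy sketch follows; the gap values are invented placeholders, not the paper's calibration set.

```python
import numpy as np

# Hypothetical optical gaps (eV): xTB-level values vs. reference (TD-)DFT values
# for a small calibration set of polymers. Numbers are illustrative only.
xtb_gap = np.array([2.10, 2.45, 2.80, 3.05, 3.40])
dft_gap = np.array([1.95, 2.30, 2.72, 2.96, 3.35])

# Linear calibration: dft ~ a * xtb + b, as described in the abstract.
a, b = np.polyfit(xtb_gap, dft_gap, 1)

def calibrated_gap(raw_xtb_value):
    """Apply the fitted linear correction to a new xTB result."""
    return a * raw_xtb_value + b

print(f"slope={a:.3f}, intercept={b:.3f}, predicted={calibrated_gap(2.6):.2f} eV")
```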
Mathematical and Computational Modeling in Complex Biological Systems.
Ji, Zhiwei; Yan, Ke; Li, Wenyang; Hu, Haigen; Zhu, Xiaoliang
2017-01-01
The biological processes and molecular functions involved in cancer progression remain difficult for biologists and clinicians to understand. Recent developments in high-throughput technologies push systems biology towards more precise models for complex diseases. Computational and mathematical models are gradually being used to help us understand the omics data produced by high-throughput experimental techniques. The use of computational models in systems biology allows us to explore the pathogenesis of complex diseases, improve our understanding of the latent molecular mechanisms, and promote treatment strategy optimization and new drug discovery. Currently, it is urgent to bridge the gap between the development of high-throughput technologies and the systemic modeling of biological processes in cancer research. In this review, we first survey several typical mathematical modeling approaches for biological systems at different scales and analyze their characteristics, advantages, applications, and limitations. Next, three potential research directions in systems modeling are summarized. To conclude, this review provides an update of important solutions using computational modeling approaches in systems biology.
Assembling proteomics data as a prerequisite for the analysis of large scale experiments
Schmidt, Frank; Schmid, Monika; Thiede, Bernd; Pleißner, Klaus-Peter; Böhme, Martina; Jungblut, Peter R
2009-01-01
Background Despite the complete determination of the genome sequence of a huge number of bacteria, their proteomes remain relatively poorly defined. Besides new methods to increase the number of identified proteins, new database applications are necessary to store and present results of large-scale proteomics experiments. Results In the present study, a database concept has been developed to address these issues and to offer complete information via a web interface. In our concept, the Oracle-based data repository system SQL-LIMS plays the central role in the proteomics workflow and was applied to the proteomes of Mycobacterium tuberculosis, Helicobacter pylori, Salmonella typhimurium and protein complexes such as the 20S proteasome. Technical operations of our proteomics labs were used as the standard for SQL-LIMS template creation. By means of a Java-based data parser, post-processed data of different approaches, such as LC/ESI-MS, MALDI-MS and 2-D gel electrophoresis (2-DE), were stored in SQL-LIMS. A minimum set of the proteomics data was transferred into our public 2D-PAGE database using a Java-based interface (Data Transfer Tool) in accordance with the requirements of the PEDRo standardization. Furthermore, the stored proteomics data could be exported from SQL-LIMS via XML. Conclusion The Oracle-based data repository system SQL-LIMS played the central role in the proteomics workflow concept. Technical operations of our proteomics labs were used as standards for SQL-LIMS templates. Using a Java-based parser, post-processed data of different approaches such as LC/ESI-MS, MALDI-MS, 1-DE and 2-DE were stored in SQL-LIMS. Thus, unique data formats of different instruments were unified and stored in SQL-LIMS tables. Moreover, a unique submission identifier allowed fast access to all experimental data. This was the main advantage compared to multi-software solutions, especially if personnel fluctuations are high. Moreover, large-scale and high-throughput experiments must be managed in a comprehensive repository system such as SQL-LIMS to query results in a systematic manner. On the other hand, these database systems are expensive and require at least one full-time administrator and a specialized lab manager. Moreover, the rapid technical development in proteomics may make it difficult to accommodate new data formats. To summarize, SQL-LIMS met the requirements of proteomics data handling, especially in skilled processes such as gel electrophoresis or mass spectrometry, and fulfilled the PSI standardization criteria. The data transfer into a public domain via DTT facilitated validation of the proteomics data. Additionally, evaluation of mass spectra by post-processing using MS-Screener improved the reliability of mass analysis and prevented the storage of low-quality data. PMID:19166578
Xiao, Kunhong; Sun, Jinpeng
2018-01-01
The discovery of β-arrestin-dependent GPCR signaling has led to an exciting new field in GPCR pharmacology: to develop "biased agonists" that can selectively target a specific downstream signaling pathway that elicits beneficial therapeutic effects without activating other pathways that elicit negative side effects. This new trend in GPCR drug discovery requires us to understand the structural and molecular mechanisms of β-arrestin-biased agonism, which largely remain unclear. We have used cutting-edge mass spectrometry (MS)-based proteomics, combined with systems, chemical and structural biology to study protein function, macromolecular interaction, protein expression and posttranslational modifications in the β-arrestin-dependent GPCR signaling. These high-throughput proteomic studies have provided a systems view of β-arrestin-biased agonism from several perspectives: distinct receptor phosphorylation barcode, multiple receptor conformations, distinct β-arrestin conformations, and ligand-specific signaling. The information obtained from these studies offers new insights into the molecular basis of GPCR regulation by β-arrestin and provides a potential platform for developing novel therapeutic interventions through GPCRs. Copyright © 2017 Elsevier Inc. All rights reserved.
Proteomics-based compositional analysis of complex cellulase-hemicellulase mixtures.
Chundawat, Shishir P S; Lipton, Mary S; Purvine, Samuel O; Uppugundla, Nirmal; Gao, Dahai; Balan, Venkatesh; Dale, Bruce E
2011-10-07
Efficient deconstruction of cellulosic biomass to fermentable sugars for fuel and chemical production is accomplished by a complex mixture of cellulases, hemicellulases, and accessory enzymes (e.g., >50 extracellular proteins). Cellulolytic enzyme mixtures, produced industrially mostly using fungi like Trichoderma reesei, are poorly characterized in terms of their protein composition and its correlation to hydrolytic activity on cellulosic biomass. The secretomes of commercial glycosyl hydrolase-producing microbes were explored using a proteomics approach with high-throughput quantification using liquid chromatography-tandem mass spectrometry (LC-MS/MS). Here, we show that a proteomics-based spectral counting approach is a reasonably accurate and rapid analytical technique that can be used to determine the protein composition of complex glycosyl hydrolase mixtures and that also correlates with the specific activity of individual enzymes present within the mixture. For example, a strong linear correlation was seen between Avicelase activity and total cellobiohydrolase content. Reliable, quantitative and cheaper analytical methods that provide insight into the cellulosic biomass-degrading fungal and bacterial secretomes would lead to further improvements toward commercialization of plant biomass-derived fuels and chemicals.
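Spectral counting can be summarized in a few lines. The sketch below computes the normalized spectral abundance factor (NSAF), one common spectral-counting measure (not necessarily the exact normalization used in this study), from invented counts and sequence lengths.

```python
import numpy as np

def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor: (SpC/L) / sum(SpC/L).
    A common label-free measure of relative protein abundance."""
    saf = np.asarray(spectral_counts, float) / np.asarray(lengths, float)
    return saf / saf.sum()

# Hypothetical spectral counts and sequence lengths for three secretome proteins
# (e.g. two cellobiohydrolases and an endoglucanase); values are illustrative.
counts = [320, 180, 45]
lengths = [514, 471, 418]
print(nsaf(counts, lengths))
```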
A Researcher's Guide to Mass Spectrometry-Based Proteomics
Savaryn, John P.; Toby, Timothy K.; Kelleher, Neil L.
2016-01-01
Mass spectrometry (MS) is widely recognized as a powerful analytical tool for molecular research. MS is used by researchers around the globe to identify, quantify, and characterize biomolecules like proteins from any number of biological conditions or sample types. As instrumentation has advanced, and with the coupling of liquid chromatography (LC) for high-throughput LC-MS/MS, a proteomics experiment measuring hundreds to thousands of proteins/protein groups is now commonplace. While expert practitioners who best understand the operation of LC-MS systems tend to have strong backgrounds in physics and engineering, consumers of proteomics data and technology are not exposed to the physicochemical principles underlying the information they seek. Since articles and reviews tend not to focus on bridging this divide, our goal here is to span this gap and translate MS ion physics into language intuitive to the general reader active in basic or applied biomedical research. Here, we visually describe what happens to ions as they enter and move around inside a mass spectrometer. We describe basic MS principles, including electric current, ion optics, ion traps, quadrupole mass filters, and Orbitrap FT-analyzers. PMID:27553853
Hamzeiy, Hamid; Cox, Jürgen
2017-02-01
Computational workflows for mass spectrometry-based shotgun proteomics and untargeted metabolomics share many steps. Despite the similarities, untargeted metabolomics is lagging behind in terms of reliable, fully automated quantitative data analysis. We argue that metabolomics will strongly benefit from the adaptation of successful automated proteomics workflows to metabolomics. MaxQuant is a popular platform for proteomics data analysis and is widely considered to be superior in achieving high precursor mass accuracies through advanced nonlinear recalibration, usually leading to five- to ten-fold better accuracy in complex LC-MS/MS runs. This translates to a sharp decrease in the number of peptide candidates per measured feature, thereby strongly improving the coverage of identified peptides. We argue that similar strategies can be applied to untargeted metabolomics, leading to equivalent improvements in metabolite identification. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.
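Nonlinear recalibration of the kind credited to MaxQuant can be approximated by fitting a smooth curve to the mass errors of confident identifications and subtracting the predicted error from subsequent measurements. The sketch below uses a low-order polynomial over retention time with invented error values; MaxQuant's actual recalibration model is considerably more elaborate.

```python
import numpy as np

# Hypothetical mass errors (ppm) of confident identifications vs. retention time (min).
rt = np.array([5.0, 15.0, 25.0, 35.0, 45.0, 55.0])
ppm_error = np.array([3.1, 2.2, 0.9, -0.4, -1.5, -2.8])

# Fit a low-order polynomial as a stand-in for a nonlinear recalibration function.
coeffs = np.polyfit(rt, ppm_error, 2)

def recalibrate(mz, retention_time):
    """Remove the systematic ppm error predicted at this retention time."""
    predicted_ppm = np.polyval(coeffs, retention_time)
    return mz / (1.0 + predicted_ppm * 1e-6)

print(recalibrate(524.2648, 30.0))
```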
High-throughput screening, predictive modeling and computational embryology - Abstract
High-throughput screening (HTS) studies are providing a rich source of data that can be applied to chemical profiling to address sensitivity and specificity of molecular targets, biological pathways, cellular and developmental processes. EPA’s ToxCast project is testing 960 uniq...
Bahrami-Samani, Emad; Vo, Dat T.; de Araujo, Patricia Rosa; Vogel, Christine; Smith, Andrew D.; Penalva, Luiz O. F.; Uren, Philip J.
2014-01-01
Co- and post-transcriptional regulation of gene expression is complex and multi-faceted, spanning the complete RNA lifecycle from genesis to decay. High-throughput profiling of the constituent events and processes is achieved through a range of technologies that continue to expand and evolve. Fully leveraging the resulting data is non-trivial, and requires the use of computational methods and tools carefully crafted for specific data sources and often intended to probe particular biological processes. Drawing upon databases of information pre-compiled by other researchers can further elevate analyses. Within this review, we describe the major co- and post-transcriptional events in the RNA lifecycle that are amenable to high-throughput profiling. We place specific emphasis on the analysis of the resulting data, in particular the computational tools and resources available, as well as looking towards future challenges that remain to be addressed. PMID:25515586
Wang, Chen; Zhou, Jiangrui; Wang, Shuowen; Ye, Mingliang; Jiang, Chunlei; Fan, Guorong; Zou, Hanfa
2010-06-04
This study investigated the mechanisms involved in the antinociceptive action induced by levo-tetrahydropalmatine (l-THP) in the formalin test by combined comparative and chemical proteomics. Rats were pretreated with l-THP by the oral route (40 mg/kg) 1 h before formalin injection. The antinociceptive effect of l-THP was shown in the first and second phases of the formalin test. To address the mechanisms by which l-THP inhibits formalin-induced nociception in rats, combined comparative and chemical proteomic approaches were applied. A novel high-throughput comparative proteomic approach based on 2D-nano-LC-MS/MS was applied to simultaneously evaluate the deregulated proteins involved in the response to l-THP treatment in formalin-induced pain rats. Thousands of proteins were identified, among which 17 proteins survived the stringent filter criteria and were further included for functional discussion. Two proteins (Neurabin-1 and Calcium-dependent secretion activator 1) were randomly selected, and their expression levels were further confirmed by Western blotting. The results matched well with those of proteomics. In the present study, we also described the development and application of l-THP-immobilized beads to bind the targets. Following incubation with cellular lysates, the proteome interacting with the immobilized l-THP was identified. The results of comparative and chemical proteomics were quite complementary. Although the precise roles of these identified molecules in l-THP-induced antinociception need further study, the combined results indicated that proteins associated with signal transduction, vesicular trafficking and neurotransmitter release, energy metabolism, and ion transport play important roles in l-THP-induced antinociception in the formalin test.
Kim, Young-Ha; slam, Mohammad Saiful; You, Myung-Jo
2015-01-01
Proteomic tools allow large-scale, high-throughput analyses for the detection, identification, and functional investigation of the proteome. For the detection of antigens from Haemaphysalis longicornis, a 1-dimensional electrophoresis (1-DE) quantitative immunoblotting technique combined with 2-dimensional electrophoresis (2-DE) immunoblotting was used for whole-body proteins from unfed and partially fed female ticks. Reactive bands were detected, and 2-DE immunoblotting was performed following 2-DE to identify protein spots. The proteome of the partially fed female had a larger number of lower-molecular-weight proteins than that of the unfed female tick. The total number of detected spots was 818 for unfed and 670 for partially fed female ticks. The 2-DE immunoblotting identified 10 antigenic spots from unfed females and 8 antigenic spots from partially fed females. Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF) of relevant spots identified calreticulin, a putative secreted WC salivary protein, and a conserved hypothetical protein from the National Center for Biotechnology Information and Swiss-Prot protein sequence databases. These findings indicate that most of the whole-body components of these ticks are non-immunogenic. The data reported here will provide guidance in the identification of antigenic proteins to prevent infestation and diseases transmitted by H. longicornis. PMID:25748713
Proteomic Cinderella: Customized analysis of bulky MS/MS data in one night.
Kiseleva, Olga; Poverennaya, Ekaterina; Shargunov, Alexander; Lisitsa, Andrey
2018-02-01
Proteomic challenges, stirred up by the advent of high-throughput technologies, produce large amounts of MS data. Routine manual searching no longer satisfies the pace of modern science. In our work, the need for automated, single-run analysis of bulky data emerged during interpretation of HepG2 proteome profiling results in a search for proteoforms. We compared the contribution of each of the eight search engines (X!Tandem, MS-GF+, MS Amanda, MyriMatch, Comet, Tide, Andromeda, and OMSSA) integrated in the open-source graphical user interface SearchGUI (http://searchgui.googlecode.com) to the total result of proteoform identification and optimized a set of engines working simultaneously. We also compared the results of our search combination with Mascot results using the protein kit UPS2, containing 48 human proteins. We selected the combination of X!Tandem, MS-GF+ and OMSSA as the most time-efficient and productive search combination. We added a homemade Java script to automate the pipeline from file picking to report generation. These settings raised the efficiency of our customized pipeline to a level unobtainable by manual searching: the analysis of 192 files searched against the human proteome (42,153 entries) downloaded from UniProt took 11 h.
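The authors automated their pipeline with a homemade Java script; the Python fragment below sketches the same "file picking to report" batching idea in a generic way. The run_search body, file pattern and worker count are placeholders, since the real invocation depends on the local SearchGUI/engine installation.

```python
import glob
from concurrent.futures import ProcessPoolExecutor

def run_search(mgf_path):
    """Placeholder for invoking the chosen search engines on one spectrum file;
    the actual command line depends on the local installation, so no engine
    flags are assumed here."""
    # subprocess.run([...])  # engine-specific invocation would go here
    return mgf_path, "ok"

def batch_search(pattern="spectra/*.mgf", workers=4):
    """Pick up every spectrum file matching the pattern, search the files in
    parallel, and collect per-file statuses for a summary report."""
    files = sorted(glob.glob(pattern))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_search, files))

if __name__ == "__main__":
    print(batch_search())
```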
Fagerquist, Clifton K
2017-01-01
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) is increasingly utilized as a rapid technique to identify microorganisms including pathogenic bacteria. However, little attention has been paid to the significant proteomic information encoded in the MS peaks that collectively constitute the MS 'fingerprint'. This review/perspective is intended to explore this topic in greater detail in the hopes that it may spur interest and further research in this area. Areas covered: This paper examines the recent literature on utilizing MALDI-TOF for bacterial identification. Critical works highlighting protein biomarker identification of bacteria, arguments for and against protein biomarker identification, proteomic approaches to biomarker identification, emergence of MALDI-TOF-TOF platforms and their use for top-down proteomic identification of bacterial proteins, protein denaturation and its effect on protein ion fragmentation, collision cross-sections and energy deposition during desorption/ionization are also explored. Expert commentary: MALDI-TOF and TOF-TOF mass spectrometry platforms will continue to provide chemical analyses that are rapid, cost-effective and high throughput. These instruments have proven their utility in the taxonomic identification of pathogenic bacteria at the genus and species level and are poised to more fully characterize these microorganisms to the benefit of clinical microbiology, food safety and other fields.
Fungal proteomics: from identification to function.
Doyle, Sean
2011-08-01
Some fungi cause disease in humans and plants, while others have demonstrable potential for the control of insect pests. In addition, fungi are also a rich reservoir of therapeutic metabolites and industrially useful enzymes. Detailed analysis of fungal biochemistry is now enabled by multiple technologies including protein mass spectrometry, genome and transcriptome sequencing and advances in bioinformatics. Yet, the assignment of function to fungal proteins, encoded either by in silico annotated, or unannotated genes, remains problematic. The purpose of this review is to describe the strategies used by many researchers to reveal protein function in fungi, and more importantly, to consolidate the nomenclature of 'unknown function protein' as opposed to 'hypothetical protein' - once any protein has been identified by protein mass spectrometry. A combination of approaches including comparative proteomics, pathogen-induced protein expression and immunoproteomics are outlined, which, when used in combination with a variety of other techniques (e.g. functional genomics, microarray analysis, immunochemical and infection model systems), appear to yield comprehensive and definitive information on protein function in fungi. The relative advantages of proteomic, as opposed to transcriptomic-only, analyses are also described. In the future, combined high-throughput, quantitative proteomics, allied to transcriptomic sequencing, are set to reveal much about protein function in fungi. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Quantitative proteomic analysis of microdissected oral epithelium for cancer biomarker discovery.
Xiao, Hua; Langerman, Alexander; Zhang, Yan; Khalid, Omar; Hu, Shen; Cao, Cheng-Xi; Lingen, Mark W; Wong, David T W
2015-11-01
Specific biomarkers are urgently needed for the detection and monitoring of the progression of oral cancer. The objective of this study was to discover cancer biomarkers from oral epithelium by utilizing high-throughput quantitative proteomics approaches. Morphologically malignant, epithelial dysplasia, and adjacent normal epithelial tissues were laser capture microdissected (LCM) from 19 patients and used for proteomics analysis. Total proteins from each group were extracted, digested and then labelled with the corresponding isobaric tags for relative and absolute quantitation (iTRAQ). Labelled peptides from each sample were combined and analyzed by liquid chromatography-mass spectrometry (LC-MS/MS) for protein identification and quantification. In total, 500 proteins were identified and 425 of them were quantified. When compared with adjacent normal oral epithelium, 17 and 15 proteins were consistently up-regulated or down-regulated in malignant and epithelial dysplasia tissues, respectively. Half of these candidate biomarkers were discovered for oral cancer for the first time. Cornulin was initially confirmed in tissue protein extracts and was further validated in tissue microarrays. Its presence in the saliva of oral cancer patients was also explored. Myoglobin and S100A8 were pre-validated by tissue microarray. These data demonstrate that the proteomic biomarkers discovered through this strategy are potential targets for oral cancer detection and salivary diagnostics. Copyright © 2015 Elsevier Ltd. All rights reserved.
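iTRAQ-style quantification ultimately comes down to reporter-ion ratios between channels. The sketch below computes log2 ratios and applies a two-fold change call on invented intensities; the protein names, channel values and cut-off are assumptions for illustration, not the study's numbers.

```python
import math

# Hypothetical iTRAQ reporter-ion intensities per protein for the three groups.
proteins = {
    "protein_A": {"normal": 9.0e5, "dysplasia": 4.1e5, "malignant": 2.0e5},
    "protein_B": {"normal": 2.0e5, "dysplasia": 4.5e5, "malignant": 7.8e5},
}

def log2_ratio(sample, reference):
    """log2 fold change of the sample channel over the reference channel."""
    return math.log2(sample / reference)

for name, channels in proteins.items():
    ratio = log2_ratio(channels["malignant"], channels["normal"])
    call = "up" if ratio >= 1 else "down" if ratio <= -1 else "unchanged"
    print(f"{name}: log2(malignant/normal) = {ratio:+.2f} -> {call}")
```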
Architecture Mapping of the Inner Mitochondrial Membrane Proteome by Chemical Tools in Live Cells.
Lee, Song-Yi; Kang, Myeong-Gyun; Shin, Sanghee; Kwak, Chulhwan; Kwon, Taejoon; Seo, Jeong Kon; Kim, Jong-Seo; Rhee, Hyun-Woo
2017-03-15
The inner mitochondrial membrane (IMM) proteome plays a central role in maintaining mitochondrial physiology and cellular metabolism. Various important biochemical reactions, such as oxidative phosphorylation, metabolite production, and mitochondrial biogenesis, are conducted by the IMM proteome, and mitochondria-targeted therapeutics have been developed for IMM proteins, which are deeply related to various human diseases, including metabolic diseases, cancer and neurodegenerative diseases. However, the membrane topology of the IMM proteome remains largely unclear because of the lack of methods to evaluate it in live cells in a high-throughput manner. In this article, we reveal the in vivo topological direction of 135 IMM proteins, using an in situ-generated radical probe with genetically targeted peroxidase (APEX). Owing to the short lifetime of phenoxyl radicals generated in situ by submitochondrially targeted APEX and the impermeability of the IMM to small molecules, the solvent-exposed tyrosine residues on the matrix and intermembrane space (IMS) sides of IMM proteins were exclusively labeled with the radical probe in live cells by Matrix-APEX and IMS-APEX, respectively, and identified by mass spectrometry. From this analysis, we confirmed 58 IMM protein topologies and could determine the topological direction of 77 IMM proteins whose topology at the IMM had not been fully characterized. We also found several IMM proteins (e.g., LETM1 and OXA1) whose topological information should be revised on the basis of our results. Overall, our identification of structural information on the mitochondrial inner-membrane proteome can provide valuable insights into the architecture and connectome of the IMM proteome in live cells.
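Once labeled peptides are assigned to either the Matrix-APEX or the IMS-APEX experiment, the topology call itself is a simple lookup. The sketch below illustrates that decision logic with invented protein names and labeling calls; it is not the paper's assignment for any specific protein.

```python
# Hypothetical labeling calls per protein: which submitochondrially targeted
# APEX construct labeled its exposed tyrosines (values are placeholders).
labeling = {
    "protein_A": {"matrix_apex": True, "ims_apex": False},
    "protein_B": {"matrix_apex": False, "ims_apex": True},
    "protein_C": {"matrix_apex": True, "ims_apex": True},
}

def call_topology(hits):
    """Infer which side(s) of the inner membrane a protein exposes,
    following the simplified logic of the labeling experiment."""
    if hits["matrix_apex"] and hits["ims_apex"]:
        return "domains exposed on both matrix and IMS sides"
    if hits["matrix_apex"]:
        return "matrix-facing"
    if hits["ims_apex"]:
        return "IMS-facing"
    return "not labeled / unresolved"

for protein, hits in labeling.items():
    print(protein, "->", call_topology(hits))
```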
Brooks, Brandon; Mueller, R. S.; Young, Jacque C.; ...
2015-07-01
While there has been growing interest in the gut microbiome in recent years, it remains unclear whether closely related species and strains have similar or distinct functional roles and if organisms capable of both aerobic and anaerobic growth do so simultaneously. To investigate these questions, we implemented a high-throughput mass spectrometry-based proteomics approach to identify proteins in fecal samples collected on days of life 13-21 from an infant born at 28 weeks gestation. No prior studies have coupled strain-resolved community metagenomics to proteomics for such a purpose. Sequences were manually curated to resolve the genomes of two strains of Citrobacter that were present during the later stage of colonization. Proteome extracts from fecal samples were processed via nano-2D-LC-MS/MS and peptides were identified based on information predicted from the genome sequences for the dominant organisms, Serratia and the two Citrobacter strains. These organisms are facultative anaerobes, and proteomic information indicates the utilization of both aerobic and anaerobic metabolisms throughout the time series. This may indicate growth in distinct niches within the gastrointestinal tract. We uncovered differences in the physiology of coexisting Citrobacter strains, including differences in motility and chemotaxis functions. Additionally, for both Citrobacter strains we resolved a community-essential role in vitamin metabolism and a predominant role in propionate production. Finally, in this case study we detected differences between genome abundance and activity levels for the dominant populations. This underlines the value in layering proteomic information over genetic potential.
Ansong, Charles; Wu, Si; Meng, Da
Characterization of the mature protein complement in cells is crucial for a better understanding of cellular processes on a systems-wide scale. Bottom-up proteomic approaches often lead to loss of critical information about an endogenous protein's actual state due to post-translational modifications (PTMs) and other processes. Top-down approaches that involve analysis of the intact protein can address this concern but present significant analytical challenges related to the separation quality needed, measurement sensitivity, and speed, resulting in low throughput and limited coverage. Here we used single-dimension ultra-high-pressure liquid chromatography mass spectrometry to investigate the comprehensive 'intact' proteome of the Gram-negative bacterial pathogen Salmonella Typhimurium. Top-down proteomics analysis revealed 563 unique proteins, including 1665 proteoforms generated by PTMs, representing the largest microbial top-down dataset reported to date. Our analysis not only confirmed several previously recognized aspects of Salmonella biology and bacterial PTMs in general, but also revealed several novel biological insights. Of particular interest was the differential utilization of the protein S-thiolation forms S-glutathionylation and S-cysteinylation in response to infection-like conditions versus basal conditions, which was corroborated by changes in the corresponding biosynthetic pathways. This differential utilization highlights underlying metabolic mechanisms that modulate changes in cellular signaling, and represents, to our knowledge, the first report of S-cysteinylation in Gram-negative bacteria. The demonstrated utility of our simple proteome-wide intact protein level measurement strategy for gaining biological insight should promote broader adoption and applications of top-down proteomics approaches.
WholePathwayScope: a comprehensive pathway-based analysis tool for high-throughput data
Yi, Ming; Horton, Jay D; Cohen, Jonathan C; Hobbs, Helen H; Stephens, Robert M
2006-01-01
Background Analysis of High Throughput (HTP) Data such as microarray and proteomics data has provided a powerful methodology to study patterns of gene regulation at genome scale. A major unresolved problem in the post-genomic era is to assemble the large amounts of data generated into a meaningful biological context. We have developed a comprehensive software tool, WholePathwayScope (WPS), for deriving biological insights from analysis of HTP data. Results WPS extracts gene lists with shared biological themes through color cue templates. WPS statistically evaluates global functional category enrichment of gene lists and pathway-level pattern enrichment of data. WPS incorporates well-known biological pathways from KEGG (Kyoto Encyclopedia of Genes and Genomes) and Biocarta, GO (Gene Ontology) terms as well as user-defined pathways or relevant gene clusters or groups, and explores gene-term relationships within the derived gene-term association networks (GTANs). WPS simultaneously compares multiple datasets within biological contexts, either as pathways or as association networks. WPS also integrates the Genetic Association Database and the Partial MedGene Database for disease-association information. We have used this program to analyze and compare microarray and proteomics datasets derived from a variety of biological systems. Application examples demonstrated the capacity of WPS to significantly facilitate the analysis of HTP data for integrative discovery. Conclusion This tool represents a pathway-based platform for discovery integration to maximize analysis power. The tool is freely available online. PMID:16423281
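Functional category enrichment of a gene list, as performed by WPS and similar tools, is typically an over-representation test. The sketch below shows the standard hypergeometric version with made-up counts; it illustrates the statistic generically rather than reproducing WPS's exact implementation.

```python
from scipy.stats import hypergeom

def enrichment_p(hits_in_list, list_size, hits_in_genome, genome_size):
    """P(observing >= hits_in_list pathway members in the gene list) under a
    hypergeometric null -- the standard over-representation test."""
    return hypergeom.sf(hits_in_list - 1, genome_size, hits_in_genome, list_size)

# Hypothetical numbers: 12 of 300 list genes fall in a pathway of 80 genes
# drawn from a 20,000-gene background.
print(enrichment_p(12, 300, 80, 20000))
```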
High-Throughput Quantitative Proteomic Analysis of Dengue Virus Type 2 Infected A549 Cells
Chiu, Han-Chen; Hannemann, Holger; Heesom, Kate J.; Matthews, David A.; Davidson, Andrew D.
2014-01-01
Disease caused by dengue virus is a global health concern with up to 390 million individuals infected annually worldwide. There are no vaccines or antiviral compounds available to either prevent or treat dengue disease which may be fatal. To increase our understanding of the interaction of dengue virus with the host cell, we analyzed changes in the proteome of human A549 cells in response to dengue virus type 2 infection using stable isotope labelling in cell culture (SILAC) in combination with high-throughput mass spectrometry (MS). Mock and infected A549 cells were fractionated into nuclear and cytoplasmic extracts before analysis to identify proteins that redistribute between cellular compartments during infection and reduce the complexity of the analysis. We identified and quantified 3098 and 2115 proteins in the cytoplasmic and nuclear fractions respectively. Proteins that showed a significant alteration in amount during infection were examined using gene enrichment, pathway and network analysis tools. The analyses revealed that dengue virus infection modulated the amounts of proteins involved in the interferon and unfolded protein responses, lipid metabolism and the cell cycle. The SILAC-MS results were validated for a select number of proteins over a time course of infection by Western blotting and immunofluorescence microscopy. Our study demonstrates for the first time the power of SILAC-MS for identifying and quantifying novel changes in cellular protein amounts in response to dengue virus infection. PMID:24671231
AOPs and Biomarkers: Bridging High Throughput Screening and Regulatory Decision Making
As high throughput screening (HTS) plays a larger role in toxicity testing, computational toxicology has emerged as a critical component in interpreting the large volume of data produced. Computational models designed to quantify potential adverse effects based on HTS data will b...
High-throughput screening, predictive modeling and computational embryology
High-throughput screening (HTS) studies are providing a rich source of data that can be applied to profile thousands of chemical compounds for biological activity and potential toxicity. EPA’s ToxCast™ project, and the broader Tox21 consortium, in addition to projects worldwide,...
An LC-IMS-MS Platform Providing Increased Dynamic Range for High-Throughput Proteomic Studies
Baker, Erin Shammel; Livesay, Eric A.; Orton, Daniel J.
2010-02-05
A high-throughput approach and platform using 15 minute reversed-phase capillary liquid chromatography (RPLC) separations in conjunction with ion mobility spectrometry-mass spectrometry (IMS-MS) measurements was evaluated for the rapid analysis of complex proteomics samples. To test the separation quality of the short LC gradient, a sample was prepared by spiking twenty reference peptides at varying concentrations from 1 ng/mL to 10 µg/mL into a tryptic digest of mouse blood plasma and analyzed with both an LC-linear ion trap Fourier transform (FT) MS and an LC-IMS-TOF MS. The LC-FT MS detected thirteen out of the twenty spiked peptides that had concentrations ≥100 ng/mL. In contrast, the drift time selected mass spectra from the LC-IMS-TOF MS analyses yielded identifications for nineteen of the twenty peptides across all spiking levels. The greater dynamic range of the LC-IMS-TOF MS system could be attributed to two factors. First, the LC-IMS-TOF MS system enabled drift time separation of the low-concentration spiked peptides from the high-concentration mouse peptide matrix components, reducing signal interference and background, and allowing species to be resolved that would otherwise be obscured by other components. Second, the automatic gain control (AGC) in the linear ion trap of the hybrid FT MS instrument limits the number of ions that are accumulated to reduce space charge effects, but in turn limits the achievable dynamic range compared to the TOF detector.
Emerging techniques for the discovery and validation of therapeutic targets for skeletal diseases.
Cho, Christine H; Nuttall, Mark E
2002-12-01
Advances in genomics and proteomics have revolutionised the drug discovery process and target validation. Identification of novel therapeutic targets for chronic skeletal diseases is an extremely challenging process based on the difficulty of obtaining high-quality human diseased versus normal tissue samples. The quality of tissue and genomic information obtained from the sample is critical to identifying disease-related genes. Using a genomics-based approach, novel genes or genes with similar homology to existing genes can be identified from cDNA libraries generated from normal versus diseased tissue. High-quality cDNA libraries are prepared from uncontaminated homogeneous cell populations harvested from tissue sections of interest. Localised gene expression analysis and confirmation are obtained through in situ hybridisation or immunohistochemical studies. Cells overexpressing the recombinant protein are subsequently designed for primary cell-based high-throughput assays that are capable of screening large compound banks for potential hits. Afterwards, secondary functional assays are used to test promising compounds. The same overexpressing cells are used in the secondary assay to test protein activity and functionality as well as screen for small-molecule agonists or antagonists. Once a hit is generated, a structure-activity relationship of the compound is optimised for better oral bioavailability and pharmacokinetics allowing the compound to progress into development. Parallel efforts from proteomics, as well as genetics/transgenics, bioinformatics and combinatorial chemistry, and improvements in high-throughput automation technologies, allow the drug discovery process to meet the demands of the medicinal market. This review discusses and illustrates how different approaches are incorporated into the discovery and validation of novel targets and, consequently, the development of potentially therapeutic agents in the areas of osteoporosis and osteoarthritis. While current treatments exist in the form of hormone replacement therapy, antiresorptive and anabolic agents for osteoporosis, there are no disease-modifying therapies for the treatment of the most common human joint disease, osteoarthritis. A massive market potential for improved options with better safety and efficacy still remains. Therefore, the application of genomics and proteomics for both diseases should provide much needed novel therapeutic approaches to treating these major world health problems.
The Scottish Structural Proteomics Facility: targets, methods and outputs
Oke, Muse; Carter, Lester G.; Johnson, Kenneth A.; Liu, Huanting; McMahon, Stephen A.; Yan, Xuan; Kerou, Melina; Weikart, Nadine D.; Kadi, Nadia; Sheikh, Md. Arif; Schmelz, Stefan; Dorward, Mark; Zawadzki, Michal; Cozens, Christopher; Falconer, Helen; Powers, Helen; Overton, Ian M.; van Niekerk, C. A. Johannes; Peng, Xu; Patel, Prakash; Garrett, Roger A.; Prangishvili, David; Botting, Catherine H.; Coote, Peter J.; Dryden, David T. F.; Barton, Geoffrey J.; Schwarz-Linek, Ulrich; Challis, Gregory L.; Taylor, Garry L.; White, Malcolm F.
2010-01-01
The Scottish Structural Proteomics Facility was funded to develop a laboratory scale approach to high throughput structure determination. The effort was successful in that over 40 structures were determined. These structures and the methods harnessed to obtain them are reported here. This report reflects on the value of automation but also on the continued requirement for a high degree of scientific and technical expertise. The efficiency of the process poses challenges to the current paradigm of structural analysis and publication. In the 5 year period we published ten peer-reviewed papers reporting structural data arising from the pipeline. Nevertheless, the number of structures solved exceeded our ability to analyse and publish each new finding. By reporting the experimental details and depositing the structures we hope to maximize the impact of the project by allowing others to follow up the relevant biology. Electronic supplementary material The online version of this article (doi:10.1007/s10969-010-9090-y) contains supplementary material, which is available to authorized users. PMID:20419351
Lung cancer screening beyond low-dose computed tomography: the role of novel biomarkers.
Hasan, Naveed; Kumar, Rohit; Kavuru, Mani S
2014-10-01
Lung cancer is the most common and lethal malignancy in the world. The landmark National Lung Screening Trial (NLST) showed a 20% relative reduction in mortality in high-risk individuals screened with low-dose computed tomography. However, the poor specificity and low prevalence of lung cancer in the NLST pose major limitations to its widespread use. Furthermore, a lung nodule on CT scan requires a nuanced and individualized approach towards management. In this regard, advances in high-throughput technologies (molecular diagnostics, multi-gene chips, proteomics, and bronchoscopic techniques) have led to the discovery of lung cancer biomarkers that have shown potential to complement the current screening standards. Early detection of lung cancer can be achieved by analysis of biomarkers from tissue samples within the respiratory tract, such as sputum, saliva, nasal/bronchial airway epithelial cells and exhaled breath condensate, or through peripheral biofluids such as blood, serum and urine. Autofluorescence bronchoscopy has been employed in research settings to identify pre-invasive lesions not identified on CT scan. Although these modalities are not yet commercially available in the clinical setting, they are expected to be available in the near future, and clinicians who care for patients with lung cancer should be aware of them. In this review, we present the up-to-date state of biomarker development, discuss their clinical relevance and predict their future role in lung cancer management.
Thompson, John W; Sorum, Alexander W; Hsieh-Wilson, Linda C
2018-06-23
The dynamic posttranslational modification O-linked β-N-acetylglucosamine glycosylation (O-GlcNAcylation) is present on thousands of intracellular proteins in the brain. Like phosphorylation, O-GlcNAcylation is inducible and plays important functional roles in both physiology and disease. Recent advances in mass spectrometry (MS) and bioconjugation methods are now enabling the mapping of O-GlcNAcylation events to individual sites in proteins. However, our understanding of which glycosylation events are necessary for regulating protein function and controlling specific processes, phenotypes, or diseases remains in its infancy. Given the sheer number of O-GlcNAc sites, methods are greatly needed to identify promising sites and prioritize them for time- and resource-intensive functional studies. Revealing sites that are dynamically altered by different stimuli or disease states will likely go a long way in this regard. Here, we describe advanced methods for identifying O-GlcNAc sites on individual proteins and across the proteome, and for determining their stoichiometry in vivo. We also highlight emerging technologies for quantitative, site-specific MS-based O-GlcNAc proteomics (O-GlcNAcomics), which allow proteome-wide tracking of O-GlcNAcylation dynamics at individual sites. These cutting-edge technologies are beginning to bridge the gap between the high-throughput cataloging of O-GlcNAcylated proteins and the relatively low-throughput study of individual proteins. By uncovering the O-GlcNAcylation events that change in specific physiological and disease contexts, these new approaches are providing key insights into the regulatory functions of O-GlcNAc in the brain, including their roles in neuroprotection, neuronal signaling, learning and memory, and neurodegenerative diseases.
Armstrong, Stuart D.; Xia, Dong; Bah, Germanus S.; Krishna, Ritesh; Ngangyung, Henrietta F.; LaCourse, E. James; McSorley, Henry J.; Kengne-Ouafo, Jonas A.; Chounna-Ndongmo, Patrick W.; Wanji, Samuel; Enyong, Peter A.; Taylor, David W.; Blaxter, Mark L.; Wastling, Jonathan M.; Tanya, Vincent N.; Makepeace, Benjamin L.
2016-01-01
Despite 40 years of control efforts, onchocerciasis (river blindness) remains one of the most important neglected tropical diseases, with 17 million people affected. The etiological agent, Onchocerca volvulus, is a filarial nematode with a complex lifecycle involving several distinct stages in the definitive host and blackfly vector. The challenges of obtaining sufficient material have prevented high-throughput studies and the development of novel strategies for disease control and diagnosis. Here, we utilize the closest relative of O. volvulus, the bovine parasite Onchocerca ochengi, to compare stage-specific proteomes and host-parasite interactions within the secretome. We identified a total of 4260 unique O. ochengi proteins from adult males and females, infective larvae, intrauterine microfilariae, and fluid from intradermal nodules. In addition, 135 proteins were detected from the obligate Wolbachia symbiont. Observed protein families that were enriched in all whole body extracts relative to the complete search database included immunoglobulin-domain proteins, whereas redox and detoxification enzymes and proteins involved in intracellular transport displayed stage-specific overrepresentation. Unexpectedly, the larval stages exhibited enrichment for several mitochondrial-related protein families, including members of peptidase family M16 and proteins which mediate mitochondrial fission and fusion. Quantification of proteins across the lifecycle using the Hi-3 approach supported these qualitative analyses. In nodule fluid, we identified 94 O. ochengi secreted proteins, including homologs of transforming growth factor-β and a second member of a novel 6-ShK toxin domain family, which was originally described from a model filarial nematode (Litomosoides sigmodontis). Strikingly, the 498 bovine proteins identified in nodule fluid were strongly dominated by antimicrobial proteins, especially cathelicidins. This first high-throughput analysis of an Onchocerca spp. proteome across the lifecycle highlights its profound complexity and emphasizes the extremely close relationship between O. ochengi and O. volvulus. The insights presented here provide new candidates for vaccine development, drug targeting and diagnostic biomarkers. PMID:27226403
Ultra-Structure database design methodology for managing systems biology data and analyses
Maier, Christopher W; Long, Jeffrey G; Hemminger, Bradley M; Giddings, Morgan C
2009-01-01
Background Modern, high-throughput biological experiments generate copious, heterogeneous, interconnected data sets. Research is dynamic, with frequently changing protocols, techniques, instruments, and file formats. Because of these factors, systems designed to manage and integrate modern biological data sets often end up as large, unwieldy databases that become difficult to maintain or evolve. The novel rule-based approach of the Ultra-Structure design methodology presents a potential solution to this problem. By representing both data and processes as formal rules within a database, an Ultra-Structure system constitutes a flexible framework that enables users to explicitly store domain knowledge in both a machine- and human-readable form. End users themselves can change the system's capabilities without programmer intervention, simply by altering database contents; no computer code or schemas need be modified. This provides flexibility in adapting to change, and allows integration of disparate, heterogeneous data sets within a small core set of database tables, facilitating joint analysis and visualization without becoming unwieldy. Here, we examine the application of Ultra-Structure to our ongoing research program for the integration of large proteomic and genomic data sets (proteogenomic mapping). Results We transitioned our proteogenomic mapping information system from a traditional entity-relationship design to one based on Ultra-Structure. Our system integrates tandem mass spectrum data, genomic annotation sets, and spectrum/peptide mappings, all within a small, general framework implemented within a standard relational database system. General software procedures driven by user-modifiable rules can perform tasks such as logical deduction and location-based computations. The system is not tied specifically to proteogenomic research, but is rather designed to accommodate virtually any kind of biological research. Conclusion We find Ultra-Structure offers substantial benefits for biological information systems, the largest being the integration of diverse information sources into a common framework. This facilitates systems biology research by integrating data from disparate high-throughput techniques. It also enables us to readily incorporate new data types, sources, and domain knowledge with no change to the database structure or associated computer code. Ultra-Structure may be a significant step towards solving the hard problem of data management and integration in the systems biology era. PMID:19691849
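The key idea above, domain knowledge stored as rule rows that a small generic engine interprets, can be caricatured in a few lines. The rule tuples, predicates and the single deduction below are invented for illustration and are far simpler than Ultra-Structure's published ruleforms.

```python
# Domain knowledge as data: each row is a fact/rule the generic engine can read.
# The (subject, predicate, object) layout is an invented stand-in, not the
# actual Ultra-Structure schema.
rules = [
    ("spectrum_0001", "matches_peptide", "VATVSLPR"),
    ("spectrum_0002", "matches_peptide", "LSSEQK"),
    ("VATVSLPR", "maps_to_locus", "locus_A"),
]

def deduce_spectrum_loci(rules):
    """Generic chaining: if a spectrum matches a peptide and that peptide maps
    to a genomic locus, conclude that the spectrum supports the locus."""
    peptide_of = {s: o for s, p, o in rules if p == "matches_peptide"}
    locus_of = {s: o for s, p, o in rules if p == "maps_to_locus"}
    return {spec: locus_of[pep] for spec, pep in peptide_of.items() if pep in locus_of}

print(deduce_spectrum_loci(rules))  # {'spectrum_0001': 'locus_A'}
```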
Lee, Si Hoon; Lindquist, Nathan C.; Wittenberg, Nathan J.; Jordan, Luke R.; Oh, Sang-Hyun
2012-01-01
With recent advances in high-throughput proteomics and systems biology, there is a growing demand for new instruments that can precisely quantify a wide range of receptor-ligand binding kinetics in a high-throughput fashion. Here we demonstrate a surface plasmon resonance (SPR) imaging spectroscopy instrument capable of extracting binding kinetics and affinities from 50 parallel microfluidic channels simultaneously. The instrument utilizes large-area (~cm²) metallic nanohole arrays as SPR sensing substrates and combines a broadband light source, a high-resolution imaging spectrometer and a low-noise CCD camera to extract spectral information from every channel in real time with a refractive index resolution of 7.7 × 10⁻⁶. To demonstrate the utility of our instrument for quantifying a wide range of biomolecular interactions, each parallel microfluidic channel is coated with a biomimetic supported lipid membrane containing ganglioside (GM1) receptors. The binding kinetics of cholera toxin b (CTX-b) to GM1 are then measured in a single experiment from 50 channels. By combining the highly parallel microfluidic device with large-area periodic nanohole array chips, our SPR imaging spectrometer system enables high-throughput, label-free, real-time SPR biosensing, and its full-spectral imaging capability combined with nanohole arrays could enable integration of SPR imaging with concurrent surface-enhanced Raman spectroscopy. PMID:22895607
DOE Office of Scientific and Technical Information (OSTI.GOV)
Orton, Daniel J.; Tfaily, Malak M.; Moore, Ronald J.
To better understand disease conditions and environmental perturbations, multi-omic studies (i.e. proteomic, lipidomic, metabolomic, etc. analyses) are vastly increasing in popularity. In a multi-omic study, a single sample is typically extracted in multiple ways and numerous analyses are performed using different instruments. Thus, one sample becomes many analyses, making high throughput and reproducible evaluations a necessity. One way to address the numerous samples and varying instrumental conditions is to utilize a flow injection analysis (FIA) system for rapid sample injection. While some FIA systems have been created to address these challenges, many have limitations such as high consumable costs, low pressure capabilities, limited pressure monitoring and fixed flow rates. To address these limitations, we created an automated, customizable FIA system capable of operating at diverse flow rates (~50 nL/min to 500 µL/min) to accommodate low- and high-flow instrument sources. This system can also operate at varying analytical throughputs from 24 to 1200 samples per day to enable different MS analysis approaches. Applications ranging from native protein analyses to molecular library construction were performed using the FIA system. The results from these studies showed a highly robust platform, providing consistent performance over many days without carryover as long as washing buffers specific to each molecular analysis were utilized.
Use of High-Throughput Testing and Approaches for Evaluating Chemical Risk-Relevance to Humans
ToxCast is profiling the bioactivity of thousands of chemicals based on high-throughput screening (HTS) and computational models that integrate knowledge of biological systems and in vivo toxicities. Many of these assays probe signaling pathways and cellular processes critical to...
Identifier mapping performance for integrating transcriptomics and proteomics experimental results
2011-01-01
Background Studies integrating transcriptomic data with proteomic data can illuminate the proteome more clearly than either separately. Integromic studies can deepen understanding of the dynamic complex regulatory relationship between the transcriptome and the proteome. Integrating these data dictates a reliable mapping between the identifier nomenclature resultant from the two high-throughput platforms. However, this kind of analysis is well known to be hampered by lack of standardization of identifier nomenclature among proteins, genes, and microarray probe sets. Therefore data integration may also play a role in critiquing the fallible gene identifications that both platforms emit. Results We compared three freely available internet-based identifier mapping resources for mapping UniProt accessions (ACCs) to Affymetrix probesets identifications (IDs): DAVID, EnVision, and NetAffx. Liquid chromatography-tandem mass spectrometry analyses of 91 endometrial cancer and 7 noncancer samples generated 11,879 distinct ACCs. For each ACC, we compared the retrieval sets of probeset IDs from each mapping resource. We confirmed a high level of discrepancy among the mapping resources. On the same samples, mRNA expression was available. Therefore, to evaluate the quality of each ACC-to-probeset match, we calculated proteome-transcriptome correlations, and compared the resources presuming that better mapping of identifiers should generate a higher proportion of mapped pairs with strong inter-platform correlations. A mixture model for the correlations fitted well and supported regression analysis, providing a window into the performance of the mapping resources. The resources have added and dropped matches over two years, but their overall performance has not changed. Conclusions The methods presented here serve to achieve concrete context-specific insight, to support well-informed decisions in choosing an ID mapping strategy for "omic" data merging. PMID:21619611
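The evaluation strategy described above can be pictured with a short Python sketch: each mapping resource is scored by the proteome-transcriptome correlations of the ACC-to-probeset pairs it returns. The data structures and the 0.4 correlation cut-off below are illustrative assumptions, not values from the study.

    from scipy.stats import spearmanr

    def score_resource(mapping, protein_abund, mrna_expr, min_rho=0.4):
        """mapping: {uniprot_acc: [probeset_id, ...]} returned by one resource.
        protein_abund / mrna_expr: {id: [values over the same samples]}."""
        rhos = []
        for acc, probesets in mapping.items():
            if acc not in protein_abund:
                continue
            for ps in probesets:
                if ps in mrna_expr:
                    rho, _ = spearmanr(protein_abund[acc], mrna_expr[ps])
                    rhos.append(rho)
        strong = sum(r >= min_rho for r in rhos)
        return len(rhos), (strong / len(rhos) if rhos else 0.0)

    # A resource that maps identifiers well should yield a larger proportion of
    # strongly correlated pairs than one that returns many spurious matches.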
DOE Office of Scientific and Technical Information (OSTI.GOV)
Denef, Vincent; Shah, Manesh B; Verberkmoes, Nathan C
The recent surge in microbial genomic sequencing, combined with the development of high-throughput liquid chromatography-mass-spectrometry-based (LC/LC-MS/MS) proteomics, has raised the question of the extent to which genomic information of one strain or environmental sample can be used to profile proteomes of related strains or samples. Even with decreasing sequencing costs, it remains impractical to obtain genomic sequence for every strain or sample analyzed. Here, we evaluate how shotgun proteomics is affected by amino acid divergence between the sample and the genomic database using a probability-based model and a random mutation simulation model constrained by experimental data. To assess the effects of nonrandom distribution of mutations, we also evaluated identification levels using in silico peptide data from sequenced isolates with average amino acid identities (AAI) varying between 76 and 98%. We compared the predictions to experimental protein identification levels for a sample that was evaluated using a database that included genomic information for the dominant organism and for a closely related variant (95% AAI). The range of models set the boundaries at which half of the proteins in a proteomic experiment can be identified to be 77-92% AAI between orthologs in the sample and database. Consistent with this prediction, experimental data indicated loss of half the identifiable proteins at 90% AAI. Additional analysis indicated a 6.4% reduction of the initial protein coverage per 1% amino acid divergence and total identification loss at 86% AAI. Consequently, shotgun proteomics is capable of cross-strain identifications but avoids most cross-species false positives.
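A back-of-the-envelope Python version of the random-mutation reasoning above: under independent substitutions, a tryptic peptide of length L still matches the database with probability AAI^L, so the identifiable fraction falls steeply with divergence. The peptide length range below is an assumed illustration.

    def identifiable_fraction(aai, peptide_lengths):
        """Expected fraction of peptides still matching the database when the
        sample and database share the given average amino acid identity (0-1)."""
        return sum(aai ** L for L in peptide_lengths) / len(peptide_lengths)

    lengths = list(range(7, 26))  # typical detectable tryptic peptide lengths
    for aai in (0.98, 0.95, 0.90, 0.85):
        print(f"AAI {aai:.2f}: {identifiable_fraction(aai, lengths):.2f} of peptides retained")
    # Peptide-level retention drops quickly; protein-level identification, which
    # needs only a few surviving peptides per protein, degrades more slowly.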
A draft map of the human ovarian proteome for tissue engineering and clinical applications.
Ouni, Emna; Vertommen, Didier; Chiti, Maria Costanza; Dolmans, Marie-Madeleine; Amorim, Christiani Andrade
2018-02-23
Fertility preservation research in women today is increasingly taking advantage of bioengineering techniques to develop new biomimetic materials and solutions to safeguard ovarian cell function and microenvironment in vitro and in vivo. However, available data on the human ovary are limited and fundamental differences between animal models and humans are hampering researchers in their quest for more extensive knowledge of human ovarian physiology and key reproductive proteins that need to be preserved. We therefore turned to multi-dimensional label-free mass spectrometry to analyze human ovarian cortex, as it is a high-throughput and conclusive technique providing information on the proteomic composition of complex tissues like the ovary. In-depth proteomic profiling through two-dimensional liquid chromatography-mass spectrometry, western blot, histological and immunohistochemical analyses, and data mining helped us to confidently identify 1,508 proteins. Moreover, our method allowed us to chart the most complete representation so far of the ovarian matrisome, defined as the ensemble of extracellular matrix proteins and associated factors, including more than 80 proteins. In conclusion, this study will provide a better understanding of ovarian proteomics, with a detailed characterization of the ovarian follicle microenvironment, in order to enable bioengineers to create biomimetic scaffolds for transplantation and three-dimensional in vitro culture. By publishing our proteomic data, we also hope to contribute to accelerating biomedical research into ovarian health and disease in general. Published under license by The American Society for Biochemistry and Molecular Biology, Inc.
Lohnes, Karen; Quebbemann, Neil R; Liu, Kate; Kobzeff, Fred; Loo, Joseph A; Ogorzalek Loo, Rachel R
2016-07-15
The virtual two-dimensional gel electrophoresis/mass spectrometry (virtual 2D gel/MS) technology combines the premier, high-resolution capabilities of 2D gel electrophoresis with the sensitivity and high mass accuracy of mass spectrometry (MS). Intact proteins separated by isoelectric focusing (IEF) gel electrophoresis are imaged from immobilized pH gradient (IPG) polyacrylamide gels (the first dimension of classic 2D-PAGE) by matrix-assisted laser desorption/ionization (MALDI) MS. Obtaining accurate intact masses from sub-picomole-level proteins embedded in 2D-PAGE gels or in IPG strips is desirable to elucidate how the protein of one spot identified as protein 'A' on a 2D gel differs from the protein of another spot identified as the same protein, whenever tryptic peptide maps fail to resolve the issue. This task, however, has been extremely challenging. Virtual 2D gel/MS provides access to these intact masses. Modifications to our matrix deposition procedure improve the reliability with which IPG gels can be prepared; the new procedure is described. Development of this MALDI MS imaging (MSI) method for high-throughput MS with integrated 'top-down' MS to elucidate protein isoforms from complex biological samples is described and it is demonstrated that a 4-cm IPG gel segment can now be imaged in approximately 5min. Gel-wide chemical and enzymatic methods with further interrogation by MALDI MS/MS provide identifications, sequence-related information, and post-translational/transcriptional modification information. The MSI-based virtual 2D gel/MS platform may potentially link the benefits of 'top-down' and 'bottom-up' proteomics. Copyright © 2016 Elsevier Inc. All rights reserved.
BIG: a large-scale data integration tool for renal physiology
Zhao, Yue; Yang, Chin-Rang; Raghuram, Viswanathan; Parulekar, Jaya
2016-01-01
Due to recent advances in high-throughput techniques, we and others have generated multiple proteomic and transcriptomic databases to describe and quantify gene expression, protein abundance, or cellular signaling on the scale of the whole genome/proteome in kidney cells. The existence of so much data from diverse sources raises the following question: “How can researchers find information efficiently for a given gene product over all of these data sets without searching each data set individually?” This is the type of problem that has motivated the “Big-Data” revolution in Data Science, which has driven progress in fields such as marketing. Here we present an online Big-Data tool called BIG (Biological Information Gatherer) that allows users to submit a single online query to obtain all relevant information from all indexed databases. BIG is accessible at http://big.nhlbi.nih.gov/. PMID:27279488
Systematic cloning of an ORFeome using the Gateway system.
Matsuyama, Akihisa; Yoshida, Minoru
2009-01-01
With the completion of the genome projects, there are increasing demands for experimental systems that make it possible to exploit the entire set of protein-coding open reading frames (ORFs), viz. the ORFeome, en masse. Systematic proteomic studies based on cloned ORFeomes are called "reverse proteomics," and have been launched in many organisms in recent years. Cloning an ORFeome is an attractive route to a comprehensive understanding of biological phenomena, but it is a challenging and daunting task. However, recent advances in DNA cloning techniques based on site-specific recombination, together with high-throughput experimental techniques, have made it feasible to clone an ORFeome with a minimum of exertion. The Gateway system is one such approach, employing the recombination reaction of bacteriophage lambda. By combining traditional DNA manipulation methods with this modern recombination-based cloning technique, it is possible to clone the ORFeome of an organism at the level of individual ORFs.
Scharf, Michael; Sethi, Amit
2016-09-13
Termites have specialized digestive systems that overcome the lignin barrier in wood to release fermentable simple sugars. Using the termite Reticulitermes flavipes and its gut symbionts, high-throughput titanium pyrosequencing and proteomics approaches experimentally compared the effects of lignin-containing diets on host-symbiont digestome composition. Proteomic investigations and functional digestive studies with recombinant lignocellulases conducted in parallel provided strong evidence of congruence at the transcriptional and translational levels and provide enzymatic strategies for overcoming recalcitrant lignin barriers in biofuel feedstocks. Briefly described, therefore, the disclosure provides a system for generating a fermentable product from a lignified plant material, the system comprising a cooperating series of at least two catalytically active polypeptides, where said catalytically active polypeptides are selected from the group consisting of: cellulase Cell-1, β-glu cellulase, an aldo-keto-reductase, a catalase, a laccase, and an endo-xylanase.
Choksawangkarn, Waeowalee; Kim, Sung-Kyoung; Cannon, Joe R.; Edwards, Nathan J.; Lee, Sang Bok; Fenselau, Catherine
2013-01-01
Proteomic and other characterization of plasma membrane proteins is made difficult by their low abundance, hydrophobicity, frequent carboxylation and dynamic population. We and others have proposed that underrepresentation in LC-MS/MS analysis can be partially compensated by enriching the plasma membrane and its proteins using cationic nanoparticle pellicles. The nanoparticles increase the density of plasma membrane sheets and thus enhance separation by centrifugation from other lysed cellular components. Herein we test the hypothesis that the use of nanoparticles with increased densities can provide enhanced enrichment of plasma membrane proteins for proteomic analysis. Multiple myeloma cells were grown and coated in suspension with three different pellicles of three different densities and both pellicle coated and uncoated suspensions analyzed by high-throughput LC-MS/MS. Enrichment was evaluated by the total number and the spectral counts of identified plasma membrane proteins. PMID:23289353
Acero, Francisco Javier Fernández; Carbú, María; El-Akhal, Mohamed Rabie; Garrido, Carlos; González-Rodríguez, Victoria E.; Cantoral, Jesús M.
2011-01-01
Proteomics has become one of the most relevant high-throughput technologies. Several approaches have been used for studying, for example, tumor development, biomarker discovery, or microbiology. In this "post-genomic" era, the relevance of these studies has been highlighted, as it is the proteins, rather than the genes encoding them, that ultimately determine the phenotype. One of the most interesting outcomes of these technologies is the design of new drugs, owing to the discovery of new disease factors that may be candidates for new therapeutic targets. To our knowledge, no commercial fungicides have yet been developed from targeted molecular research; this review will shed some light on future prospects. We will summarize previous research efforts and discuss future innovations, focused on the fight against one of the main agents of devastating crop diseases, fungal phytopathogens. PMID:21340014
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madar, Inamul Hasan; Ko, Seung-Ik; Kim, Hokeun
Mass spectrometry (MS)-based proteomics, which uses high-resolution hybrid mass spectrometers such as the quadrupole-orbitrap mass spectrometer, can yield tens of thousands of tandem mass (MS/MS) spectra of high resolution during a routine bottom-up experiment. Despite being a fundamental and key step in MS-based proteomics, the accurate determination and assignment of precursor monoisotopic masses to the MS/MS spectra remains difficult. The difficulties stem from imperfect isotopic envelopes of precursor ions, inaccurate charge states for precursor ions, and cofragmentation. We describe a composite method of utilizing MS data to assign accurate monoisotopic masses to MS/MS spectra, including those subject to cofragmentation. The method, “multiplexed post-experiment monoisotopic mass refinement” (mPE-MMR), consists of the following: multiplexing of precursor masses to assign multiple monoisotopic masses of cofragmented peptides to the corresponding multiplexed MS/MS spectra, multiplexing of charge states to assign correct charges to the precursor ions of MS/MS spectra with no charge information, and mass correction for inaccurate monoisotopic peak picking. When combined with MS-GF+, a database search algorithm based on fragment mass difference, mPE-MMR effectively increases both sensitivity and accuracy in peptide identification from complex high-throughput proteomics data compared to conventional methods.
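The precursor-mass arithmetic underlying such refinement can be sketched in a few lines of Python: convert the observed m/z and charge to a neutral monoisotopic mass and enumerate candidates that correct for picking the wrong isotopic peak. This is a generic illustration, not the mPE-MMR implementation.

    PROTON = 1.00727646688      # mass of a proton, Da
    C13_C12 = 1.00335483        # spacing between isotopic peaks, Da

    def neutral_mass(mz, charge):
        # for [M + zH]z+, m/z = (M + z*PROTON)/z, hence:
        return (mz - PROTON) * charge

    def monoisotopic_candidates(mz, charge, max_offset=2):
        """Candidate monoisotopic masses, allowing the reported peak to be up to
        max_offset isotopes away from the true monoisotopic peak."""
        base = neutral_mass(mz, charge)
        return [base + k * C13_C12 for k in range(-max_offset, max_offset + 1)]

    # A cofragmented MS/MS spectrum can then be searched once per candidate mass
    # (and, if the charge is unknown, once per plausible charge state).
    print([round(m, 4) for m in monoisotopic_candidates(750.4025, 2)])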
SeqAPASS to evaluate conservation of high-throughput screening targets across non-mammalian species
Cell-based high-throughput screening (HTS) and computational technologies are being applied as tools for toxicity testing in the 21st century. The U.S. Environmental Protection Agency (EPA) embraced these technologies and created the ToxCast Program in 2007, which has served as a...
We demonstrate a computational network model that integrates 18 in vitro, high-throughput screening assays measuring estrogen receptor (ER) binding, dimerization, chromatin binding, transcriptional activation and ER-dependent cell proliferation. The network model uses activity pa...
Vellaichamy, Adaikkalam; Tran, John C.; Catherman, Adam D.; Lee, Ji Eun; Kellie, John F.; Sweet, Steve M.M.; Zamdborg, Leonid; Thomas, Paul M.; Ahlf, Dorothy R.; Durbin, Kenneth R.; Valaskovic, Gary A.; Kelleher, Neil L.
2010-01-01
Despite the availability of ultra-high resolution mass spectrometers, methods for separation and detection of intact proteins for proteome-scale analyses are still in a developmental phase. Here we report robust protocols for on-line LC-MS to drive high-throughput top-down proteomics in a fashion similar to bottom-up. Comparative work on protein standards showed that a polymeric stationary phase led to superior sensitivity over a silica-based medium in reversed-phase nanocapillary-LC, with detection of proteins >50 kDa routinely accomplished in the linear ion trap of a hybrid Fourier-Transform mass spectrometer. Protein identification was enabled by nozzle-skimmer dissociation (NSD) and detection of fragment ions with <5 ppm mass accuracy for highly-specific database searching using custom software. This overall approach led to identification of proteins up to 80 kDa, with 10-60 proteins identified in single LC-MS runs of samples from yeast and human cell lines pre-fractionated by their molecular weight using a gel-based sieving system. PMID:20073486
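As a small illustration of the high-mass-accuracy matching step mentioned above, the Python sketch below accepts a fragment assignment only when the relative error is within a few ppm; the 5 ppm tolerance follows the abstract, while the functions themselves are generic rather than the custom search software used in the study.

    def within_ppm(observed, theoretical, tol_ppm=5.0):
        """True if the observed mass is within tol_ppm of the theoretical mass."""
        return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm

    def match_fragments(observed_masses, theoretical_masses, tol_ppm=5.0):
        """Count observed fragment masses matching any theoretical fragment."""
        return sum(
            any(within_ppm(o, t, tol_ppm) for t in theoretical_masses)
            for o in observed_masses
        )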
Nuriel, Tal; Deeb, Ruba S.; Hajjar, David P.; Gross, Steven S.
2008-01-01
Nitration of tyrosine residues by nitric oxide (NO)-derived species results in the accumulation of 3-nitrotyrosine in proteins, a hallmark of nitrosative stress in cells and tissues. Tyrosine nitration is recognized as one of the multiple signaling modalities used by NO-derived species for the regulation of protein structure and function in health and disease. Various methods have been described for the quantification of protein 3-nitrotyrosine residues, and several strategies have been presented toward the goal of proteome-wide identification of protein tyrosine modification sites. This chapter details a useful protocol for the quantification of 3-nitrotyrosine in cells and tissues using high-pressure liquid chromatography with electrochemical detection. Additionally, this chapter describes a novel biotin-tagging strategy for specific enrichment of 3-nitrotyrosine-containing peptides. Application of this strategy, in conjunction with high-throughput MS/MS-based peptide sequencing, is anticipated to fuel efforts in developing comprehensive inventories of nitrosative stress-induced protein-tyrosine modification sites in cells and tissues. PMID:18554526
Year 2 Report: Protein Function Prediction Platform
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, C E
2012-04-27
Upon completion of our second year of development in a 3-year development cycle, we have completed a prototype protein structure-function annotation and function prediction system: Protein Function Prediction (PFP) platform (v.0.5). We have met our milestones for Years 1 and 2 and are positioned to continue development in completion of our original statement of work, or a reasonable modification thereof, in service to DTRA Programs involved in diagnostics and medical countermeasures research and development. The PFP platform is a multi-scale computational modeling system for protein structure-function annotation and function prediction. As of this writing, PFP is the only existing fully automated, high-throughput, multi-scale modeling, whole-proteome annotation platform, and represents a significant advance in the field of genome annotation (Fig. 1). PFP modules perform protein functional annotations at the sequence, systems biology, protein structure, and atomistic levels of biological complexity (Fig. 2). Because these approaches provide orthogonal means of characterizing proteins and suggesting protein function, PFP processing maximizes the protein functional information that can currently be gained by computational means. Comprehensive annotation of pathogen genomes is essential for bio-defense applications in pathogen characterization, threat assessment, and medical countermeasure design and development in that it can short-cut the time and effort required to select and characterize protein biomarkers.
The impact of network medicine in gastroenterology and hepatology.
Baffy, György
2013-10-01
In the footsteps of groundbreaking achievements made by biomedical research, another scientific revolution is unfolding. Systems biology draws from the chaos and complexity theory and applies computational models to predict emerging behavior of the interactions between genes, gene products, and environmental factors. Adaptation of systems biology to translational and clinical sciences has been termed network medicine, and is likely to change the way we think about preventing, predicting, diagnosing, and treating complex human diseases. Network medicine finds gene-disease associations by analyzing the unparalleled digital information discovered and created by high-throughput technologies (dubbed as "omics" science) and links genetic variance to clinical disease phenotypes through intermediate organizational levels of life such as the epigenome, transcriptome, proteome, and metabolome. Supported by large reference databases, unprecedented data storage capacity, and innovative computational analysis, network medicine is poised to find links between conditions that were thought to be distinct, uncover shared disease mechanisms and key drivers of the pathogenesis, predict individual disease outcomes and trajectories, identify novel therapeutic applications, and help avoid off-target and undesirable drug effects. Recent advances indicate that these perspectives are increasingly within our reach for understanding and managing complex diseases of the digestive system. Copyright © 2013 AGA Institute. Published by Elsevier Inc. All rights reserved.
Computational clustering for viral reference proteomes
Chen, Chuming; Huang, Hongzhan; Mazumder, Raja; Natale, Darren A.; McGarvey, Peter B.; Zhang, Jian; Polson, Shawn W.; Wang, Yuqi; Wu, Cathy H.
2016-01-01
Motivation: The enormous number of redundant sequenced genomes has hindered efforts to analyze and functionally annotate proteins. As the taxonomy of viruses is not uniformly defined, viral proteomes pose special challenges in this regard. Grouping viruses based on the similarity of their proteins at proteome scale can normalize against potential taxonomic nomenclature anomalies. Results: We present Viral Reference Proteomes (Viral RPs), which are computed from complete virus proteomes within UniProtKB. Viral RPs based on 95, 75, 55, 35 and 15% co-membership in proteome similarity based clusters are provided. Comparison of our computational Viral RPs with UniProt’s curator-selected Reference Proteomes indicates that the two sets are consistent and complementary. Furthermore, each Viral RP represents a cluster of virus proteomes that was consistent with virus or host taxonomy. We provide BLASTP search and FTP download of Viral RP protein sequences, and a browser to facilitate the visualization of Viral RPs. Availability and implementation: http://proteininformationresource.org/rps/viruses/ Contact: chenc@udel.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153712
Thiele, Thomas; Steil, Leif; Völker, Uwe; Greinacher, Andreas
2007-01-01
Blood-based therapeutics are cellular or plasma components derived from human blood. Their production requires appropriate selection and treatment of the donor and processing of cells or plasma proteins. In contrast to clearly defined, chemically synthesized drugs, blood-derived therapeutics are highly complex mixtures of plasma proteins or even more complex cells. Pathogen transmission by the product as well as changes in the integrity of blood constituents resulting in loss of function or immune modulation are currently important issues in transfusion medicine. Protein modifications can occur during various steps of the production process, such as acquisition, enrichment of separate components (e.g. coagulation factors, cell populations), virus inactivation, conservation, and storage. Contemporary proteomic strategies allow a comprehensive assessment of protein modifications with high coverage, offer capabilities for qualitative and even quantitative analysis, and for high-throughput protein identification. Traditionally, proteomics approaches predominantly relied on two-dimensional gel electrophoresis (2-DE). Even if 2-DE is still state of the art, it has inherent limitations that are mainly based on the physicochemical properties of the proteins analyzed; for example, proteins with extremes in molecular mass and hydrophobicity (most membrane proteins) are difficult to assess by 2-DE. These limitations have fostered the development of mass spectrometry centered on non-gel-based separation approaches, which have proven to be highly successful and are thus complementing and even partially replacing 2-DE-based approaches. Although blood constituents have been extensively analyzed by proteomics, this technology has not been widely applied to assess or even improve blood-derived therapeutics, or to monitor the production processes. As proteomic technologies have the capacity to provide comprehensive information about changes occurring during processing and storage of blood products, proteomics can potentially guide improvement of pathogen inactivation procedures and engineering of stem cells, and may also allow a better understanding of factors influencing the immunogenicity of blood-derived therapeutics. An important development in proteomics is the reduction of inter-assay variability. This now allows the screening of samples taken from the same product over time or before and after processing. Optimized preparation procedures and storage conditions will reduce the risk of protein alterations, which in turn may contribute to better recovery, reduced exposure to allogeneic proteins, and increased transfusion safety.
DAnTE: a statistical tool for quantitative analysis of –omics data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Polpitiya, Ashoka D.; Qian, Weijun; Jaitly, Navdeep
2008-05-03
DAnTE (Data Analysis Tool Extension) is a statistical tool designed to address challenges unique to quantitative bottom-up, shotgun proteomics data. This tool has also been demonstrated for microarray data and can easily be extended to other high-throughput data types. DAnTE features selected normalization methods, missing value imputation algorithms, peptide to protein rollup methods, an extensive array of plotting functions, and a comprehensive ANOVA scheme that can handle unbalanced data and random effects. The Graphical User Interface (GUI) is designed to be very intuitive and user friendly.
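For readers unfamiliar with the "peptide to protein rollup" step listed above, the following Python sketch shows one common variant (median scaling of each peptide followed by a median protein profile); it is a generic illustration and not DAnTE's own implementation.

    import numpy as np

    def rollup_median_scaled(peptide_matrix, protein_of_peptide):
        """peptide_matrix: {peptide: np.array of log-abundances over samples}.
        protein_of_peptide: {peptide: protein}. Returns {protein: np.array}."""
        grouped = {}
        for pep, values in peptide_matrix.items():
            grouped.setdefault(protein_of_peptide[pep], []).append(values)
        rolled = {}
        for prot, rows in grouped.items():
            rows = np.vstack(rows)
            # scale each peptide to a common level before taking the median profile
            rows = rows - np.nanmedian(rows, axis=1, keepdims=True)
            rolled[prot] = np.nanmedian(rows, axis=0)
        return rolled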
Palazzotto, Emilia; Weber, Tilmann
2018-04-12
Natural products produced by microorganisms represent the main source of bioactive molecules. The development of high-throughput (omics) techniques has contributed importantly to the renaissance of antibiotic discovery, increasing our understanding of the complex mechanisms controlling the expression of biosynthetic gene clusters (BGCs) encoding secondary metabolites. In this context, this review highlights recent progress in the use and integration of 'omics' approaches, with a focus on genomics, transcriptomics, proteomics, metabolomics, meta-omics, and combined omics, as powerful strategies to discover new antibiotics. Copyright © 2018 Elsevier Ltd. All rights reserved.
GeneLab: NASA's Open Access, Collaborative Platform for Systems Biology and Space Medicine
NASA Technical Reports Server (NTRS)
Berrios, Daniel C.; Thompson, Terri G.; Fogle, Homer W.; Rask, Jon C.; Coughlan, Joseph C.
2015-01-01
NASA is investing in GeneLab (http://genelab.nasa.gov), a multi-year effort to maximize utilization of the limited resources to conduct biological and medical research in space, principally aboard the International Space Station (ISS). High-throughput genomic, transcriptomic, proteomic or other omics analyses from experiments conducted on the ISS will be stored in the GeneLab Data Systems (GLDS), an open-science information system that will also include a biocomputation platform with collaborative science capabilities, to enable the discovery and validation of molecular networks.
An, Ji‐Yong; Meng, Fan‐Rong; Chen, Xing; Yan, Gui‐Ying; Hu, Ji‐Pu
2016-01-01
Abstract Predicting protein-protein interactions (PPIs) is a challenging task that is essential for constructing protein interaction networks, which in turn are important for understanding the mechanisms of biological systems. Although a number of high-throughput technologies have been proposed to identify PPIs, they have unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM-BiGP that combines the relevance vector machine (RVM) model and Bi-gram Probabilities (BiGP) for PPI detection from protein sequences. The major improvements include: (1) protein sequences are represented using the Bi-gram Probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), which contains the protein evolutionary information; (2) to reduce the influence of noise, Principal Component Analysis (PCA) is used to reduce the dimension of the BiGP vector; and (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five-fold cross-validation experiments on the yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57 and 90.57%, respectively, significantly better than previous methods. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-BiGP method is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on the imbalanced yeast dataset, which is higher than that on the balanced yeast dataset. The promising experimental results show the efficiency and robustness of the proposed method, which can serve as an automatic decision support tool for future proteomics research. To facilitate extensive studies in future proteomics research, we developed a freely available web server called RVM-BiGP-PPIs in Hypertext Preprocessor (PHP) for predicting PPIs. The web server, including source code and the datasets, is available at http://219.219.62.123:8888/BiGP/. PMID:27452983
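The bi-gram PSSM descriptor at the heart of the method can be sketched as follows, using the commonly published formulation B[i, j] = sum over positions t of P[t, i]·P[t+1, j]; whether this matches the authors' exact normalization is an assumption, and PCA plus the RVM classifier would be applied downstream.

    import numpy as np

    def bigram_features(pssm):
        """pssm: (L, 20) array of position-specific scores/probabilities.
        Returns the flattened 20 x 20 bi-gram co-occurrence matrix (400 values)."""
        P = np.asarray(pssm, dtype=float)
        B = P[:-1].T @ P[1:]          # B[i, j] = sum_t P[t, i] * P[t+1, j]
        return B.ravel()

    # Downstream (illustrative): stack descriptors for all proteins, reduce with
    # sklearn.decomposition.PCA, then train a classifier on the paired features.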
A Primer on High-Throughput Computing for Genomic Selection
Wu, Xiao-Lin; Beissinger, Timothy M.; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J. M.; Weigel, Kent A.; Gatti, Natalia de Leon; Gianola, Daniel
2011-01-01
High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high-throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massive parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans. PMID:22303303
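A toy Python example of the batch/pipelining idea discussed above: independent per-trait evaluations are farmed out to a pool of workers rather than run sequentially. On a real cluster this role is played by a scheduler such as HTCondor; the worker function and trait names are placeholders.

    from concurrent.futures import ProcessPoolExecutor

    def evaluate_trait(trait):
        # stand-in for fitting a genomic prediction model for one trait
        return trait, f"breeding values for {trait}"

    traits = ["milk_yield", "fat_pct", "protein_pct", "fertility"]

    if __name__ == "__main__":
        # each trait becomes an independent job; total wall time shrinks toward
        # the time of the slowest single job rather than the sum of all jobs
        with ProcessPoolExecutor(max_workers=4) as pool:
            for trait, result in pool.map(evaluate_trait, traits):
                print(trait, "->", result)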
Vorontsov, Egor A.; Rensen, Elena; Prangishvili, David; Krupovic, Mart; Chamot-Rooke, Julia
2016-01-01
Protein post-translational methylation has been reported to occur in archaea, including members of the genus Sulfolobus, but has never been characterized on a proteome-wide scale. Among important Sulfolobus proteins carrying such modification are the chromatin proteins that have been described to be methylated on lysine side chains, resembling eukaryotic histones in that aspect. To get more insight into the extent of this modification and its dynamics during the different growth steps of the thermoacidophilic archaeon S. islandicus LAL14/1, we performed a global and deep proteomic analysis using a combination of high-throughput bottom-up and top-down approaches on a single high-resolution mass spectrometer. 1,931 methylation sites on 751 proteins were found by the bottom-up analysis, with methylation sites on 526 proteins monitored throughout three cell culture growth stages: early-exponential, mid-exponential, and stationary. The top-down analysis revealed 3,978 proteoforms arising from 681 proteins, including 292 methylated proteoforms, 85 of which were comprehensively characterized. Methylated proteoforms of the five chromatin proteins (Alba1, Alba2, Cren7, Sul7d1, Sul7d2) were fully characterized by a combination of bottom-up and top-down data. The top-down analysis also revealed an increase of methylation during cell growth for two chromatin proteins, which had not been evidenced by bottom-up. These results shed new light on the ubiquitous lysine methylation throughout the S. islandicus proteome. Furthermore, we found that S. islandicus proteins are frequently acetylated at the N terminus, following the removal of the N-terminal methionine. This study highlights the great value of combining bottom-up and top-down proteomics for obtaining an unprecedented level of accuracy in detecting differentially modified intact proteoforms. The data have been deposited to the ProteomeXchange with identifiers PXD003074 and PXD004179. PMID:27555370
Ferro, Myriam; Brugière, Sabine; Salvi, Daniel; Seigneurin-Berny, Daphné; Court, Magali; Moyet, Lucas; Ramus, Claire; Miras, Stéphane; Mellal, Mourad; Le Gall, Sophie; Kieffer-Jaquinod, Sylvie; Bruley, Christophe; Garin, Jérôme; Joyard, Jacques; Masselon, Christophe; Rolland, Norbert
2010-06-01
Recent advances in the proteomics field have allowed a series of high throughput experiments to be conducted on chloroplast samples, and the data are available in several public databases. However, the accurate localization of many chloroplast proteins often remains hypothetical. This is especially true for envelope proteins. We went a step further into the knowledge of the chloroplast proteome by focusing, in the same set of experiments, on the localization of proteins in the stroma, the thylakoids, and envelope membranes. LC-MS/MS-based analyses first allowed building the AT_CHLORO database (http://www.grenoble.prabi.fr/protehome/grenoble-plant-proteomics/), a comprehensive repertoire of the 1323 proteins, identified by 10,654 unique peptide sequences, present in highly purified chloroplasts and their subfractions prepared from Arabidopsis thaliana leaves. This database also provides extensive proteomics information (peptide sequences and molecular weight, chromatographic retention times, MS/MS spectra, and spectral count) for a unique chloroplast protein accurate mass and time tag database gathering identified peptides with their respective and precise analytical coordinates, molecular weight, and retention time. We assessed the partitioning of each protein in the three chloroplast compartments by using a semiquantitative proteomics approach (spectral count). These data together with an in-depth investigation of the literature were compiled to provide accurate subplastidial localization of previously known and newly identified proteins. A unique knowledge base containing extensive information on the proteins identified in envelope fractions was thus obtained, allowing new insights into this membrane system to be revealed. Altogether, the data we obtained provide unexpected information about plastidial or subplastidial localization of some proteins that were not suspected to be associated to this membrane system. The spectral counting-based strategy was further validated as the compartmentation of well known pathways (for instance, photosynthesis and amino acid, fatty acid, or glycerolipid biosynthesis) within chloroplasts could be dissected. It also allowed revisiting the compartmentation of the chloroplast metabolism and functions.
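The spectral-counting partitioning described above can be illustrated with a short Python sketch that normalizes counts per fraction and expresses each protein's distribution across the three compartments; the protein names and counts shown are invented for illustration.

    def partition(spectral_counts):
        """spectral_counts: {protein: {'envelope': n, 'stroma': n, 'thylakoid': n}}.
        Returns each protein's fractional distribution across the compartments."""
        # normalise by the total counts of each fraction to correct for run depth
        totals = {}
        for counts in spectral_counts.values():
            for comp, n in counts.items():
                totals[comp] = totals.get(comp, 0) + n
        out = {}
        for prot, counts in spectral_counts.items():
            norm = {c: n / totals[c] for c, n in counts.items()}
            s = sum(norm.values())
            out[prot] = {c: v / s for c, v in norm.items()} if s else norm
        return out

    print(partition({"ENV_CANDIDATE": {"envelope": 42, "stroma": 3, "thylakoid": 1},
                     "STROMA_MARKER": {"envelope": 5, "stroma": 120, "thylakoid": 8}}))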
Tissue matrix arrays for high throughput screening and systems analysis of cell function
Beachley, Vince Z.; Wolf, Matthew T.; Sadtler, Kaitlyn; Manda, Srikanth S.; Jacobs, Heather; Blatchley, Michael; Bader, Joel S.; Pandey, Akhilesh; Pardoll, Drew; Elisseeff, Jennifer H.
2015-01-01
Cell and protein arrays have demonstrated remarkable utility in the high-throughput evaluation of biological responses; however, they lack the complexity of native tissue and organs. Here, we describe tissue extracellular matrix (ECM) arrays for screening biological outputs and systems analysis. We spotted processed tissue ECM particles as two-dimensional arrays or incorporated them with cells to generate three-dimensional cell-matrix microtissue arrays. We then investigated the response of human stem, cancer, and immune cells to tissue ECM arrays originating from 11 different tissues, and validated the 2D and 3D arrays as representative of the in vivo microenvironment through quantitative analysis of tissue-specific cellular responses, including matrix production, adhesion and proliferation, and morphological changes following culture. The biological outputs correlated with tissue proteomics, and network analysis identified several proteins linked to cell function. Our methodology enables broad screening of ECMs to connect tissue-specific composition with biological activity, providing a new resource for biomaterials research and translation. PMID:26480475
High-throughput microscopy must re-invent the microscope rather than speed up its functions
Oheim, M
2007-01-01
Knowledge gained from the revolutions in genomics and proteomics has helped to identify many of the key molecules involved in cellular signalling. Researchers, both in academia and in the pharmaceutical industry, now screen, at a sub-cellular level, where and when these proteins interact. Fluorescence imaging and molecular labelling combine to provide a powerful tool for real-time functional biochemistry with molecular resolution. However, they traditionally have been work-intensive, required trained personnel, and suffered from low through-put due to sample preparation, loading and handling. The need for speeding up microscopy is apparent from the tremendous complexity of cellular signalling pathways, the inherent biological variability, as well as the possibility that the same molecule plays different roles in different sub-cellular compartments. Research institutes and companies have teamed up to develop imaging cytometers of ever-increasing complexity. However, to truly go high-speed, sub-cellular imaging must free itself from the rigid framework of current microscopes. PMID:17603553
Fractal-like Distributions over the Rational Numbers in High-throughput Biological and Clinical Data
NASA Astrophysics Data System (ADS)
Trifonov, Vladimir; Pasqualucci, Laura; Dalla-Favera, Riccardo; Rabadan, Raul
2011-12-01
Recent developments in extracting and processing biological and clinical data are allowing quantitative approaches to studying living systems. High-throughput sequencing (HTS), expression profiles, proteomics, and electronic health records (EHR) are some examples of such technologies. Extracting meaningful information from those technologies requires careful analysis of the large volumes of data they produce. In this note, we present a set of fractal-like distributions that commonly appear in the analysis of such data. The first set of examples is drawn from an HTS experiment. Here, the distributions appear as part of the evaluation of the error rate of the sequencing and the identification of tumorigenic genomic alterations. The other examples are obtained from risk factor evaluation and analysis of relative disease prevalence and co-morbidity as these appear in EHR. The distributions are also relevant to identification of subclonal populations in tumors and the study of quasi-species and intrahost diversity of viral populations.
Automated Solid-Phase Subcloning Based on Beads Brought into Proximity by Magnetic Force
Hudson, Elton P.; Nikoshkov, Andrej; Uhlen, Mathias; Rockberg, Johan
2012-01-01
In the fields of proteomics, metabolic engineering and synthetic biology there is a need for high-throughput and reliable cloning methods to facilitate construction of expression vectors and genetic pathways. Here, we describe a new approach for solid-phase cloning in which both the vector and the gene are immobilized to separate paramagnetic beads and brought into proximity by magnetic force. Ligation events were directly evaluated using fluorescent-based microscopy and flow cytometry. The highest ligation efficiencies were obtained when gene- and vector-coated beads were brought into close contact by application of a magnet during the ligation step. An automated procedure was developed using a laboratory workstation to transfer genes into various expression vectors and more than 95% correct clones were obtained in a number of various applications. The method presented here is suitable for efficient subcloning in an automated manner to rapidly generate a large number of gene constructs in various vectors intended for high throughput applications. PMID:22624028
Unparalleled sample treatment throughput for proteomics workflows relying on ultrasonic energy.
Jorge, Susana; Araújo, J E; Pimentel-Santos, F M; Branco, Jaime C; Santos, Hugo M; Lodeiro, Carlos; Capelo, J L
2018-02-01
We report on the new microplate horn ultrasonic device as a powerful tool to speed proteomics workflows with unparalleled throughput. 96 complex proteomes were digested at the same time in 4min. Variables such as ultrasonication time, ultrasonication amplitude, and protein to enzyme ratio were optimized. The "classic" method relying on overnight protein digestion (12h) and the sonoreactor-based method were also employed for comparative purposes. We found the protein digestion efficiency homogeneously distributed across the entire microplate horn surface using the following conditions: 4min sonication time and 25% amplitude. Using this approach, patients with lymphoma and myeloma were classified using principal component analysis and a 2D gel-mass spectrometry based approach. Furthermore, we demonstrate excellent performance using MALDI-mass spectrometry based profiling as a fast way to classify patients with rheumatoid arthritis, systemic lupus erythematosus, and ankylosing spondylitis. Finally, the speed and simplicity of this method were demonstrated by clustering 90 individuals: patients with knee osteoarthritis (30), patients with a prosthesis (30, control group), and healthy individuals (30) with no history of joint disease. Overall, the new approach allows a disease to be profiled in just one week while matching the minimalism rules outlined by Halls. Copyright © 2017 Elsevier B.V. All rights reserved.
Computational Tools for Stem Cell Biology
Bian, Qin; Cahan, Patrick
2016-01-01
For over half a century, the field of developmental biology has leveraged computation to explore mechanisms of developmental processes. More recently, computational approaches have been critical in the translation of high throughput data into knowledge of both developmental and stem cell biology. In the last several years, a new sub-discipline of computational stem cell biology has emerged that synthesizes the modeling of systems-level aspects of stem cells with high-throughput molecular data. In this review, we provide an overview of this new field and pay particular attention to the impact that single-cell transcriptomics is expected to have on our understanding of development and our ability to engineer cell fate. PMID:27318512
Quantifying protein-protein interactions in high throughput using protein domain microarrays.
Kaushansky, Alexis; Allen, John E; Gordus, Andrew; Stiffler, Michael A; Karp, Ethan S; Chang, Bryan H; MacBeath, Gavin
2010-04-01
Protein microarrays provide an efficient way to identify and quantify protein-protein interactions in high throughput. One drawback of this technique is that proteins show a broad range of physicochemical properties and are often difficult to produce recombinantly. To circumvent these problems, we have focused on families of protein interaction domains. Here we provide protocols for constructing microarrays of protein interaction domains in individual wells of 96-well microtiter plates, and for quantifying domain-peptide interactions in high throughput using fluorescently labeled synthetic peptides. As specific examples, we will describe the construction of microarrays of virtually every human Src homology 2 (SH2) and phosphotyrosine binding (PTB) domain, as well as microarrays of mouse PDZ domains, all produced recombinantly in Escherichia coli. For domains that mediate high-affinity interactions, such as SH2 and PTB domains, equilibrium dissociation constants (K(D)s) for their peptide ligands can be measured directly on arrays by obtaining saturation binding curves. For weaker binding domains, such as PDZ domains, arrays are best used to identify candidate interactions, which are then retested and quantified by fluorescence polarization. Overall, protein domain microarrays provide the ability to rapidly identify and quantify protein-ligand interactions with minimal sample consumption. Because entire domain families can be interrogated simultaneously, they provide a powerful way to assess binding selectivity on a proteome-wide scale and provide an unbiased perspective on the connectivity of protein-protein interaction networks.
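As an illustration of how equilibrium dissociation constants (K(D)s) are obtained from saturation binding curves on such arrays, the Python sketch below fits a one-site binding model F = Fmax·[L]/(KD + [L]) to synthetic data; the concentrations and fluorescence values are invented.

    import numpy as np
    from scipy.optimize import curve_fit

    def one_site(conc, fmax, kd):
        """Simple one-site saturation binding model."""
        return fmax * conc / (kd + conc)

    conc = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0])            # uM peptide
    signal = np.array([9.0, 26.0, 65.0, 130.0, 210.0, 260.0, 285.0])   # a.u.

    (fmax, kd), _ = curve_fit(one_site, conc, signal, p0=(300.0, 0.5))
    print(f"Fmax ~ {fmax:.0f} a.u., KD ~ {kd:.2f} uM")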
Single-molecule protein sequencing through fingerprinting: computational assessment
NASA Astrophysics Data System (ADS)
Yao, Yao; Docter, Margreet; van Ginkel, Jetty; de Ridder, Dick; Joo, Chirlmin
2015-10-01
Proteins are vital in all biological systems as they constitute the main structural and functional components of cells. Recent advances in mass spectrometry have brought the promise of complete proteomics by helping draft the human proteome. Yet, this commonly used protein sequencing technique has fundamental limitations in sensitivity. Here we propose a method for single-molecule (SM) protein sequencing. A major challenge lies in the fact that proteins are composed of 20 different amino acids, which demands 20 molecular reporters. We computationally demonstrate that it suffices to measure only two types of amino acids to identify proteins and suggest an experimental scheme using SM fluorescence. When achieved, this highly sensitive approach will result in a paradigm shift in proteomics, with major impact in the biological and medical sciences.
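The core of the fingerprinting argument can be reproduced in a few lines of Python: reduce each protein to the ordered pattern of just two labelled residue types and ask how often that pattern is already unique within a proteome. The choice of cysteine and lysine here is only an example, not necessarily the pair proposed by the authors.

    from collections import Counter

    def fingerprint(sequence, labelled=("C", "K")):
        """Ordered pattern of the two labelled residue types in a protein."""
        return "".join(aa for aa in sequence if aa in labelled)

    def unique_fraction(proteome):
        """proteome: {protein_id: sequence}. Fraction of proteins whose
        two-letter fingerprint occurs exactly once in the proteome."""
        counts = Counter(fingerprint(seq) for seq in proteome.values())
        return sum(counts[fingerprint(seq)] == 1 for seq in proteome.values()) / len(proteome)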
Jacob, Laurent; Combes, Florence; Burger, Thomas
2018-06-18
We propose a new hypothesis test for the differential abundance of proteins in mass spectrometry-based relative quantification. An important feature of this type of high-throughput analysis is that it involves an enzymatic digestion of the sample proteins into peptides prior to identification and quantification. Due to numerous sequence homologies, different proteins can lead to peptides with identical amino acid chains, so that their parent protein is ambiguous. These so-called shared peptides make the protein-level statistical analysis a challenge and are often not accounted for. In this article, we use a linear model describing peptide-protein relationships to build a likelihood ratio test of differential abundance for proteins. We show that the likelihood ratio statistic can be computed in linear time with the number of peptides. We also provide the asymptotic null distribution of a regularized version of our statistic. Experiments on both real and simulated datasets show that our procedures outperform state-of-the-art methods. The procedures are available via the pepa.test function of the DAPAR Bioconductor R package.
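A least-squares caricature of the peptide-protein model above, in Python: peptide signals are written as sums of the abundances of the proteins that can generate them (so shared peptides contribute to several proteins), and differential abundance is assessed by comparing a condition-specific model with a shared one. This is only a sketch of the idea, not the authors' pepa.test implementation.

    import numpy as np

    def rss(X, y):
        """Residual sum of squares of the least-squares fit of y on X."""
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return np.sum((y - X @ theta) ** 2)

    def lr_like_statistic(X, y_cond_a, y_cond_b):
        """X: (n_peptides, n_proteins) 0/1 peptide-to-protein membership matrix.
        y_cond_a / y_cond_b: peptide abundances in the two conditions."""
        # null model: one protein abundance vector explains both conditions
        rss_null = rss(np.vstack([X, X]), np.concatenate([y_cond_a, y_cond_b]))
        # full model: each condition gets its own protein abundances
        rss_full = rss(X, y_cond_a) + rss(X, y_cond_b)
        n = 2 * len(y_cond_a)
        return n * np.log(rss_null / rss_full)   # large values suggest a change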
Han, Bomie; Higgs, Richard E
2008-09-01
High-throughput HPLC-mass spectrometry (HPLC-MS) is routinely used to profile biological samples for potential protein markers of disease, drug efficacy and toxicity. The discovery technology has advanced to the point where translating hypotheses from proteomic profiling studies into clinical use is the bottleneck to realizing the full potential of these approaches. The first step in this translation is the development and analytical validation of a higher throughput assay with improved sensitivity and selectivity relative to typical profiling assays. Multiple reaction monitoring (MRM) assays are an attractive approach for this stage of biomarker development given their improved sensitivity and specificity, the speed at which the assays can be developed and the quantitative nature of the assay. While the profiling assays are performed with ion trap mass spectrometers, MRM assays are traditionally developed in quadrupole-based mass spectrometers. Development of MRM assays from the same instrument used in the profiling analysis enables a seamless and rapid transition from hypothesis generation to validation. This report provides guidelines for rapidly developing an MRM assay using the same mass spectrometry platform used for profiling experiments (typically ion traps) and reviews methodological and analytical validation considerations. The analytical validation guidelines presented are drawn from existing practices on immunological assays and are applicable to any mass spectrometry platform technology.
Materials Databases Infrastructure Constructed by First Principles Calculations: A Review
Lin, Lianshan
2015-10-13
First Principles calculations, especially those based on high-throughput density functional theory (DFT), have been widely accepted as the major tools in atomic-scale materials design. Emerging supercomputers, along with powerful First Principles calculations, have accumulated hundreds of thousands of crystal and compound records. The exponential growth of computational materials information urges the development of materials databases, which not only provide ample storage for the daily increasing data, but also maintain efficiency in data storage, management, query, presentation and manipulation. This review covers the most cutting-edge materials databases in materials design and their hot applications, such as in fuel cells. By comparing the advantages and drawbacks of these high-throughput First Principles materials databases, the optimized computational framework can be identified to fit the needs of fuel cell applications. The further development of high-throughput DFT materials databases, which in essence accelerates materials innovation, is discussed in the summary as well.
Pérez, Vilma; Hengst, Martha; Kurte, Lenka; Dorador, Cristina; Jeffrey, Wade H.; Wattiez, Ruddy; Molina, Veronica; Matallana-Surget, Sabine
2017-01-01
Salar de Huasco, defined as a polyextreme environment, is a high altitude saline wetland in the Chilean Altiplano (3800 m.a.s.l.), permanently exposed to the highest solar radiation doses registered in the world. We present here the first comparative proteomics study of a photoheterotrophic bacterium, Rhodobacter sp., isolated from this remote and hostile habitat. We developed an innovative experimental approach using different sources of radiation (in situ sunlight and UVB lamps), cut-off filters (Mylar, Lee filters) and a high-throughput, label-free quantitative proteomics method to comprehensively analyze the effect of seven spectral bands on protein regulation. A hierarchical cluster analysis of 40 common proteins revealed that all conditions containing the most damaging UVB radiation induced similar patterns of protein regulation compared with the UVA and visible light spectral bands. Moreover, it appeared that the cellular adaptation of Rhodobacter sp. to the osmotic stress encountered in the hypersaline environment from which it was originally isolated might confer a higher resistance to damaging UV radiation. Indeed, proteins involved in the synthesis and transport of key osmoprotectants, such as glycine betaine and inositol, were found in very high abundance under UV radiation compared to the dark control, suggesting that these osmolytes also function as efficient reactive oxygen scavengers. Our study also revealed a RecA-independent response and a tightly regulated network of protein quality control involving proteases and chaperones to selectively degrade misfolded and/or damaged proteins. PMID:28694800
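A hierarchical cluster analysis of the kind applied to the 40 shared proteins can be outlined as follows; the random data and the proteins-by-spectral-bands layout of log2 fold changes are placeholders, not the study's dataset.

```python
# Minimal sketch of a hierarchical cluster analysis of protein regulation
# patterns across spectral conditions; the data here are random placeholders.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
log2fc = rng.normal(size=(40, 7))            # 40 proteins x 7 spectral bands
dist = pdist(log2fc, metric="correlation")   # 1 - Pearson correlation
tree = linkage(dist, method="average")
clusters = fcluster(tree, t=4, criterion="maxclust")
for c in np.unique(clusters):
    print(f"cluster {c}: {np.sum(clusters == c)} proteins")
```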
Naegle, Kristen M; Welsch, Roy E; Yaffe, Michael B; White, Forest M; Lauffenburger, Douglas A
2011-07-01
Advances in proteomic technologies continue to substantially accelerate capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications, or the enzymes responsible for modifying them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data in order to overcome the typical dependence on specific a priori assumptions and expert knowledge concerning the technical aspects of clustering. Multiple clustering analysis methodology ('MCAM') employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms, in a combinatorial fashion, to create a suite of clustering sets. These sets are then evaluated based on their ability to produce biological insights through statistical enrichment of metadata relating to knowledge concerning protein functions, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERBB network to explore the relationships between algorithmic parameters and the biological meaning that could be inferred and report on interesting biological predictions. Further, we applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent and incomplete overlapping measurements of phosphorylation sites in the network. We report specific and global differences of the ERBB network stimulated with different ligands and with changes in HER2 expression. Overall, we offer MCAM as a broadly-applicable approach for analysis of proteomic data which may help increase the current understanding of molecular networks in a variety of biological problems. © 2011 Naegle et al.
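The combinatorial spirit of MCAM, sweeping data transformations, distance metrics, set sizes and clustering runs and then scoring each result, can be sketched as below; the transformations, metrics and the placeholder enrichment_score are illustrative stand-ins, not the published MCAM code.

```python
# Illustrative sweep over clustering choices in the spirit of MCAM; the
# transformations, metrics and enrichment_score are placeholder stand-ins.
import itertools
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def enrichment_score(labels, metadata):
    # placeholder: in practice a statistical enrichment of annotations
    return float(len(set(labels)))

def zscore_rows(x):
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def mcam_like_sweep(data, metadata, set_sizes=(4, 8, 12)):
    transforms = {"raw": lambda x: x, "zscore": zscore_rows}
    metrics = ("euclidean", "correlation")
    results = []
    for (tname, tf), metric, k in itertools.product(transforms.items(), metrics, set_sizes):
        labels = fcluster(linkage(pdist(tf(data), metric=metric), method="average"),
                          t=k, criterion="maxclust")
        results.append(((tname, metric, k), enrichment_score(labels, metadata)))
    return sorted(results, key=lambda r: r[1], reverse=True)

data = np.random.default_rng(1).normal(size=(200, 10))   # phosphosites x time points
print(mcam_like_sweep(data, metadata=None)[:3])
```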
Byeon, Ji-Yeon; Bailey, Ryan C
2011-09-07
High affinity capture agents recognizing biomolecular targets are essential in the performance of many proteomic detection methods. Herein, we report the application of a label-free silicon photonic biomolecular analysis platform for simultaneously determining kinetic association and dissociation constants for two representative protein capture agents: a thrombin-binding DNA aptamer and an anti-thrombin monoclonal antibody. The scalability and inherent multiplexing capability of the technology make it an attractive platform for simultaneously evaluating the binding characteristics of multiple capture agents recognizing the same target antigen, and thus a tool complementary to emerging high-throughput capture agent generation strategies.
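Kinetic association and dissociation constants of the kind measured on this platform are commonly obtained by fitting a 1:1 Langmuir binding model to the association-phase sensorgram; the sketch below fits synthetic data with assumed parameters and is not the authors' analysis pipeline.

```python
# Illustrative 1:1 binding-kinetics fit of a synthetic association-phase sensorgram.
import numpy as np
from scipy.optimize import curve_fit

def association(t, Rmax, kon, koff, conc):
    kobs = kon * conc + koff
    return Rmax * kon * conc / kobs * (1 - np.exp(-kobs * t))

t = np.linspace(0, 600, 200)                          # seconds
conc = 50e-9                                          # 50 nM analyte (assumed)
signal = association(t, 120.0, 1e5, 1e-3, conc)       # synthetic "true" response
noisy = signal + np.random.default_rng(1).normal(0, 1.0, t.size)
popt, _ = curve_fit(lambda tt, Rmax, kon, koff: association(tt, Rmax, kon, koff, conc),
                    t, noisy, p0=(100.0, 5e4, 5e-3))
Rmax, kon, koff = popt
print(f"kon={kon:.2e} 1/(M*s), koff={koff:.2e} 1/s, KD={koff/kon:.2e} M")
```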
Plant Abiotic Stress Proteomics: The Major Factors Determining Alterations in Cellular Proteome
Kosová, Klára; Vítámvás, Pavel; Urban, Milan O.; Prášil, Ilja T.; Renaut, Jenny
2018-01-01
HIGHLIGHTS: Major environmental and genetic factors determining stress-related protein abundance are discussed. Major aspects of protein biological function, including protein isoforms and PTMs, cellular localization and protein interactions, are discussed. Functional diversity of protein isoforms and PTMs is discussed. Abiotic stresses have profound impacts on plant proteomes, including alterations in protein relative abundance, cellular localization, post-transcriptional and post-translational modifications (PTMs), protein interactions with other protein partners, and, finally, protein biological functions. The main aim of the present review is to discuss the major factors determining stress-related protein accumulation and their final biological functions. The dynamics of the stress response, including stress acclimation to altered ambient conditions and recovery after the stress treatment, are discussed. The results of proteomic studies aimed at a comparison of stress response in plant genotypes differing in stress adaptability reveal constitutively enhanced levels of several stress-related proteins (protective proteins, chaperones, ROS scavenging- and detoxification-related enzymes) in the tolerant genotypes with respect to the susceptible ones. Tolerant genotypes can efficiently adjust energy metabolism to enhanced needs during stress acclimation. Stress tolerance vs. stress susceptibility are relative terms which can reflect different stress-coping strategies depending on the given stress treatment. The role of differential protein isoforms and PTMs with respect to their biological functions in different physiological constraints (cellular compartments and interacting partners) is discussed. The importance of protein functional studies following high-throughput proteome analyses is presented in a broader context of plant biology. In summary, the manuscript tries to provide an overview of the major factors which have to be considered when interpreting data from proteomic studies on stress-treated plants. PMID:29472941
High Throughput Sequence Analysis for Disease Resistance in Maize
USDA-ARS's Scientific Manuscript database
Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...
The focus of this meeting is the SAP's review and comment on the Agency's proposed high-throughput computational model of androgen receptor pathway activity as an alternative to the current Tier 1 androgen receptor assay (OCSPP 890.1150: Androgen Receptor Binding Rat Prostate Cyt...
The US EPA's ToxCast™ program seeks to combine advances in high-throughput screening technology with methodologies from statistics and computer science to develop high-throughput decision support tools for assessing chemical hazard and risk. To develop new methods of analysis of...
High Performance Computing Modernization Program Kerberos Throughput Test Report
2017-10-26
functionality as Kerberos plugins. The pre-release production kit was used in these tests to compare against the current release kit. YubiKey support... Throughput testing was done to determine the benefits of the pre-release... both the current release kit and the pre-release production kit for a total of 378 individual tests in order to note any improvements. Based on work
PROTEOMICS IN ECOTOXICOLOGY: PROTEIN EXPRESSION PROFILING TO SCREEN CHEMICALS FOR ENDOCRINE ACTIVITY
Abstract for poster.
Current endocrine testing methods are animal intensive and lack the throughput necessary to screen large numbers of environmental chemicals for adverse effects. In this study, Matrix Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry...
Stepping into the omics era: Opportunities and challenges for biomaterials science and engineering
Rabitz, Herschel; Welsh, William J.; Kohn, Joachim; de Boer, Jan
2016-01-01
The research paradigm in biomaterials science and engineering is evolving from using low-throughput and iterative experimental designs towards high-throughput experimental designs for materials optimization and the evaluation of materials properties. Computational science plays an important role in this transition. With the emergence of the omics approach in the biomaterials field, referred to as materiomics, high-throughput approaches hold the promise of tackling the complexity of materials and understanding correlations between material properties and their effects on complex biological systems. The intrinsic complexity of biological systems is an important factor that is often oversimplified when characterizing biological responses to materials and establishing property-activity relationships. Indeed, in vitro tests designed to predict in vivo performance of a given biomaterial are largely lacking as we are not able to capture the biological complexity of whole tissues in an in vitro model. In this opinion paper, we explain how we reached our opinion that converging genomics and materiomics into a new field would enable a significant acceleration of the development of new and improved medical devices. The use of computational modeling to correlate high-throughput gene expression profiling with high throughput combinatorial material design strategies would add power to the analysis of biological effects induced by material properties. We believe that this extra layer of complexity on top of high-throughput material experimentation is necessary to tackle the biological complexity and further advance the biomaterials field. PMID:26876875
A community proposal to integrate proteomics activities in ELIXIR.
Vizcaíno, Juan Antonio; Walzer, Mathias; Jiménez, Rafael C; Bittremieux, Wout; Bouyssié, David; Carapito, Christine; Corrales, Fernando; Ferro, Myriam; Heck, Albert J R; Horvatovich, Peter; Hubalek, Martin; Lane, Lydie; Laukens, Kris; Levander, Fredrik; Lisacek, Frederique; Novak, Petr; Palmblad, Magnus; Piovesan, Damiano; Pühler, Alfred; Schwämmle, Veit; Valkenborg, Dirk; van Rijswijk, Merlijn; Vondrasek, Jiri; Eisenacher, Martin; Martens, Lennart; Kohlbacher, Oliver
2017-01-01
Computational approaches have been major drivers behind the progress of proteomics in recent years. The aim of this white paper is to provide a framework for integrating computational proteomics into ELIXIR in the near future, and thus to broaden the portfolio of omics technologies supported by this European distributed infrastructure. This white paper is the direct result of a strategy meeting on 'The Future of Proteomics in ELIXIR' that took place in March 2017 in Tübingen (Germany), and involved representatives of eleven ELIXIR nodes. These discussions led to a list of priority areas in computational proteomics that would complement existing activities and close gaps in the portfolio of tools and services offered by ELIXIR so far. We provide some suggestions on how these activities could be integrated into ELIXIR's existing platforms, and how it could lead to a new ELIXIR use case in proteomics. We also highlight connections to the related field of metabolomics, where similar activities are ongoing. This white paper could thus serve as a starting point for the integration of computational proteomics into ELIXIR. Over the next few months we will be working closely with all stakeholders involved, and in particular with other representatives of the proteomics community, to further refine this paper.
Cloud computing for protein-ligand binding site comparison.
Hung, Che-Lun; Hua, Guan-Jie
2013-01-01
The proteome-wide analysis of protein-ligand binding sites and their interactions with ligands is important in structure-based drug design and in understanding ligand cross reactivity and toxicity. The well-known and commonly used software, SMAP, has been designed for 3D ligand binding site comparison and similarity searching of a structural proteome. SMAP can also predict drug side effects and reassign existing drugs to new indications. However, the computing scale of SMAP is limited. We have developed a high availability, high performance system that expands the comparison scale of SMAP. This cloud computing service, called Cloud-PLBS, combines the SMAP and Hadoop frameworks and is deployed on a virtual cloud computing platform. To handle the vast amount of experimental data on protein-ligand binding site pairs, Cloud-PLBS exploits the MapReduce paradigm as a management and parallelizing tool. Cloud-PLBS provides a web portal and scalability through which biologists can address a wide range of computer-intensive questions in biology and drug discovery.
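The MapReduce decomposition used by Cloud-PLBS for all-against-all binding-site comparison can be sketched schematically: the map step emits one comparison per site pair and the reduce step aggregates the best score per query site; compare_sites is a placeholder and does not reimplement SMAP.

```python
# Schematic MapReduce-style pairing of binding sites; compare_sites is a
# placeholder for the actual SMAP 3D site comparison.
from collections import defaultdict
from itertools import combinations

def compare_sites(site_a, site_b):
    return 0.0   # placeholder similarity score

def map_phase(sites):
    for a, b in combinations(sites, 2):          # emit one task per site pair
        yield a["id"], (b["id"], compare_sites(a, b))

def reduce_phase(mapped):
    best = defaultdict(lambda: ("", float("-inf")))
    for query_id, (hit_id, score) in mapped:     # keep best-scoring hit per query
        if score > best[query_id][1]:
            best[query_id] = (hit_id, score)
    return dict(best)

sites = [{"id": f"site{i}"} for i in range(4)]
print(reduce_phase(map_phase(sites)))
```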
neXtProt: organizing protein knowledge in the context of human proteome projects.
Gaudet, Pascale; Argoud-Puy, Ghislaine; Cusin, Isabelle; Duek, Paula; Evalet, Olivier; Gateau, Alain; Gleizes, Anne; Pereira, Mario; Zahn-Zabal, Monique; Zwahlen, Catherine; Bairoch, Amos; Lane, Lydie
2013-01-04
About 5000 (25%) of the ~20400 human protein-coding genes currently lack any experimental evidence at the protein level. For many others, there is only limited information on their abundance, distribution, subcellular localization, interactions, or cellular functions. The aim of the HUPO Human Proteome Project (HPP, www.thehpp.org ) is to collect this information for every human protein. HPP is based on three major pillars: mass spectrometry (MS), antibody/affinity capture reagents (Ab), and a bioinformatics-driven knowledge base (KB). To meet this objective, the Chromosome-Centric Human Proteome Project (C-HPP) proposes to build this catalog chromosome-by-chromosome ( www.c-hpp.org ) by focusing primarily on proteins that currently lack MS evidence or Ab detection. These are termed "missing proteins" by the HPP consortium. The lack of observation of a protein can be due to various factors including incorrect and incomplete gene annotation, low or restricted expression, or instability. neXtProt ( www.nextprot.org ) is a new web-based knowledge platform specific for human proteins that aims to complement UniProtKB/Swiss-Prot ( www.uniprot.org ) with detailed information obtained from carefully selected high-throughput experiments on genomic variation, post-translational modifications, as well as protein expression in tissues and cells. This article describes how neXtProt helps prioritize C-HPP efforts and integrates C-HPP results with other research efforts to create a complete human proteome catalog.
ODA, TEIJI; YAMAGUCHI, AKANE; YOKOYAMA, MASAO; SHIMIZU, KOJI; TOYOTA, KOSAKU; NIKAI, TETSURO; MATSUMOTO, KEN-ICHI
2014-01-01
Deep hypothermic circulatory arrest (DHCA) is a protective method against brain ischemia in aortic surgery. However, the possible effects of DHCA on the plasma proteins remain to be determined. In the present study, we used novel high-throughput technology to compare the plasma proteomes during DHCA (22°C) with selective cerebral perfusion (SCP, n=7) to those during normothermic cardiopulmonary bypass (CPB, n=7). Three plasma samples per patient were obtained during CPB: T1, prior to cooling; T2, during hypothermia; T3, after rewarming for the DHCA group and three corresponding points for the normothermic group. A proteomic analysis was performed using isobaric tag for relative and absolute quantification (iTRAQ) labeling tandem mass spectrometry to assess quantitative protein changes. In total, the analysis identified 262 proteins. The bioinformatics analysis revealed a significant upregulation of complement activation at T2 in normothermic CPB, which was suppressed in DHCA. These findings were confirmed by the changes of the terminal complement complex (SC5b-9) levels. At T3, however, the level of SC5b-9 showed a greater increase in DHCA compared to normothermic CPB, while 48 proteins were significantly downregulated in DHCA. The results demonstrated that DHCA and rewarming potentially exert a significant effect on the plasma proteome in patients undergoing aortic surgery. PMID:25050567
SwissPalm: Protein Palmitoylation database.
Blanc, Mathieu; David, Fabrice; Abrami, Laurence; Migliozzi, Daniel; Armand, Florence; Bürgi, Jérôme; van der Goot, Françoise Gisou
2015-01-01
Protein S-palmitoylation is a reversible post-translational modification that regulates many key biological processes, although the full extent and functions of protein S-palmitoylation remain largely unexplored. Recent developments of new chemical methods have allowed the establishment of palmitoyl-proteomes of a variety of cell lines and tissues from different species. As the amount of information generated by these high-throughput studies is increasing, the field requires centralization and comparison of this information. Here we present SwissPalm ( http://swisspalm.epfl.ch), our open, comprehensive, manually curated resource to study protein S-palmitoylation. It currently encompasses more than 5000 S-palmitoylated protein hits from seven species, and contains more than 500 specific sites of S-palmitoylation. SwissPalm also provides curated information and filters that increase the confidence in true positive hits, and integrates predictions of S-palmitoylated cysteine scores, orthologs and isoform multiple alignments. Systems analysis of the palmitoyl-proteome screens indicates that 10% or more of the human proteome is susceptible to S-palmitoylation. Moreover, ontology and pathway analyses of the human palmitoyl-proteome reveal that key biological functions involve this reversible lipid modification. Comparative analysis finally shows a strong crosstalk between S-palmitoylation and other post-translational modifications. Through the compilation of data and continuous updates, SwissPalm will provide a powerful tool to unravel the global importance of protein S-palmitoylation.
Li, Min; Li, Lijuan; Wang, Ke; Su, Wenting; Jia, Jun; Wang, Xiaomin
2017-10-15
Electroacupuncture (EA) has been reported to alleviate motor deficits in Parkinson's disease (PD) patients and PD animal models. However, the mechanisms by which EA improves motor function have not been investigated. We have employed a 6-hydroxydopamine (6-OHDA) unilateral injection induced PD model to investigate whether EA alters protein expression in the motor cortex. We found that 4 weeks of EA treatment significantly improved spontaneous floor plane locomotion and rotarod performance. High-throughput proteomic analysis in the motor cortex was employed. The expression of 54 proteins was altered in the unlesioned motor cortex, and that of 102 proteins in the lesioned motor cortex, of 6-OHDA rats compared to sham rats. Compared to the non-treatment PD control, EA treatment reversed the changes of 6 proteins in the unlesioned and 19 proteins in the lesioned motor cortex. The present study demonstrated that PD induces proteomic changes in the motor cortex, some of which are rescued by EA treatment. These targeted proteins were mainly involved in increasing autophagy, mRNA processing and ATP binding and in maintaining the balance of neurotransmitters. Copyright © 2017 Elsevier B.V. All rights reserved.
Display technologies: application for the discovery of drug and gene delivery agents
Sergeeva, Anna; Kolonin, Mikhail G.; Molldrem, Jeffrey J.; Pasqualini, Renata; Arap, Wadih
2007-01-01
Recognition of molecular diversity of cell surface proteomes in disease is essential for the development of targeted therapies. Progress in targeted therapeutics requires establishing effective approaches for high-throughput identification of agents specific for clinically relevant cell surface markers. Over the past decade, a number of platform strategies have been developed to screen polypeptide libraries for ligands targeting receptors selectively expressed in the context of various cell surface proteomes. Streamlined procedures for identification of ligand-receptor pairs that could serve as targets in disease diagnosis, profiling, imaging and therapy have relied on the display technologies, in which polypeptides with desired binding profiles can be serially selected, in a process called biopanning, based on their physical linkage with the encoding nucleic acid. These technologies include virus/phage display, cell display, ribosomal display, mRNA display and covalent DNA display (CDT), with phage display being by far the most utilized. The scope of this review is the recent advancements in the display technologies with a particular emphasis on molecular mapping of cell surface proteomes with peptide phage display. Prospective applications of targeted compounds derived from display libraries in the discovery of targeted drugs and gene therapy vectors are discussed. PMID:17123658
Zhang, Jingshan; Maslov, Sergei; Shakhnovich, Eugene I
2008-01-01
Crowded intracellular environments present a challenge for proteins to form functional specific complexes while reducing promiscuous interactions with non-functional partners. Here we show how the need to minimize the waste of resources to non-functional interactions limits the proteome diversity and the average concentration of co-expressed and co-localized proteins. Using the results of high-throughput Yeast 2-Hybrid experiments, we estimate the characteristic strength of non-functional protein–protein interactions. By combining these data with the strengths of specific interactions, we assess the fraction of time proteins spend tied up in non-functional interactions as a function of their overall concentration. This allows us to sketch the phase diagram for baker's yeast cells using the experimentally measured concentrations and subcellular localization of their proteins. The positions of yeast compartments on the phase diagram are consistent with our hypothesis that the yeast proteome has evolved to operate close to the upper limit of its size, while keeping individual protein concentrations sufficiently low to reduce non-functional interactions. These findings have implications for the conceptual understanding of intracellular compartmentalization, multicellularity and differentiation. PMID:18682700
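The central quantity in this argument, the fraction of time a protein spends in non-functional complexes as a function of concentration, follows from simple mass-action equilibrium; the dissociation constant and concentrations in the sketch below are assumed values for illustration, not the paper's fitted parameters.

```python
# Illustrative mass-action estimate of the non-functional bound fraction:
# with many weak partners at total concentration C_ns and a characteristic
# non-specific dissociation constant K_ns, the bound fraction of a protein
# is roughly C_ns / (C_ns + K_ns).
def nonfunctional_fraction(c_nonspecific_uM: float, K_ns_uM: float) -> float:
    return c_nonspecific_uM / (c_nonspecific_uM + K_ns_uM)

for c in (1.0, 10.0, 100.0):   # total promiscuous-partner concentration, uM (assumed)
    print(c, round(nonfunctional_fraction(c, K_ns_uM=50.0), 2))  # K_ns ~ 50 uM assumed
```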
Breuer, Eun-Kyoung Yim; Murph, Mandi M.
2011-01-01
Technological and scientific innovations over the last decade have greatly contributed to improved diagnostics, predictive models, and prognosis among cancers affecting women. In fact, an explosion of information in these areas has almost assured future generations that outcomes in cancer will continue to improve. Herein we discuss the current status of breast, cervical, and ovarian cancers as it relates to screening, disease diagnosis, and treatment options. Among the differences in these cancers, it is striking that breast cancer has multiple predictive tests based upon tumor biomarkers and sophisticated, individualized options for prescription therapeutics while ovarian cancer lacks these tools. In addition, cervical cancer leads the way in innovative, cancer-preventative vaccines and multiple screening options to prevent disease progression. For each of these malignancies, emerging proteomic technologies based upon mass spectrometry, stable isotope labeling with amino acids, high-throughput ELISA, tissue or protein microarray techniques, and click chemistry in the pursuit of activity-based profiling can pioneer the next generation of discovery. We will discuss six of the latest techniques to understand proteomics in cancer and highlight research utilizing these techniques with the goal of improvement in the management of women's cancers. PMID:21886869
Whittington, Emma; Zhao, Qian; Borziak, Kirill; Walters, James R; Dorus, Steve
2015-07-01
The application of mass spectrometry-based proteomics to sperm biology has greatly accelerated progress in understanding the molecular composition and function of spermatozoa. To date, these approaches have been largely restricted to model organisms, all of which produce a single sperm morph capable of oocyte fertilisation. Here we apply high-throughput mass spectrometry proteomic analysis to characterise sperm composition in Manduca sexta, the tobacco hornworm moth, which produces heteromorphic sperm, including one fertilisation competent (eupyrene) and one incompetent (apyrene) sperm type. This resulted in the high-confidence identification of 896 proteins from a co-mixed sample of both sperm types, of which 167 are encoded by genes with strict one-to-one orthology in Drosophila melanogaster. Importantly, over half (55.1%) of these orthologous proteins have previously been identified in the D. melanogaster sperm proteome and exhibit significant conservation in quantitative protein abundance in sperm between the two species. Despite the complex nature of gene expression across spermatogenic stages, a significant correlation was also observed between sperm protein abundance and testis gene expression. Lepidopteran-specific sperm proteins (e.g., proteins with no homology to proteins in non-Lepidopteran taxa) were present in significantly greater abundance on average than those with homology outside the Lepidoptera. Given the disproportionate production of apyrene sperm (96% of all mature sperm in Manduca) relative to eupyrene sperm, these evolutionarily novel and highly abundant proteins are candidates for possessing apyrene-specific functions. Lastly, comparative genomic analyses of testis-expressed, ovary-expressed and sperm genes identified a concentration of novel sperm proteins shared amongst Lepidoptera of potential relevance to the evolutionary origin of heteromorphic spermatogenesis. As the first published Lepidopteran sperm proteome, this whole-cell proteomic characterisation will facilitate future evolutionary genetic and developmental studies of heteromorphic sperm production and parasperm function. Furthermore, the analyses presented here provide useful annotation information regarding sex-biased gene expression, novel Lepidopteran genes and gene function in the male gamete to complement the newly sequenced and annotated Manduca genome. Copyright © 2015 Elsevier Ltd. All rights reserved.
The mammary gland in domestic ruminants: a systems biology perspective.
Ferreira, Ana M; Bislev, Stine L; Bendixen, Emøke; Almeida, André M
2013-12-06
Milk and dairy products are central elements in the human diet. It is estimated that 108 kg of milk per year are consumed per person worldwide. Therefore, dairy production represents a relevant fraction of the economies of many countries, with cattle, sheep, goats, water buffalo and other ruminants being the main species used worldwide. An adequate management of dairy farming cannot be achieved without knowledge of the biological mechanisms behind lactation in ruminants. Thus, understanding the morphology, development and regulation of the mammary gland in health, disease and production is crucial. Presently, innovative and high-throughput technologies such as genomics, transcriptomics, proteomics and metabolomics allow a much broader and more detailed knowledge of such issues. Additionally, the application of a systems biology approach to animal science is vastly growing, as new advances in one field of specialization or animal species lead to new lines of research in other areas and/or are expanded to other species. This article addresses how modern research approaches may help us understand long-known issues in mammary development, lactation biology and dairy production. Dairy production depends upon the knowledge of the morphology and regulation of the mammary gland and lactation. High-throughput technologies allow a much broader and more detailed knowledge of the biology of the mammary gland. This paper reviews the major contributions that genomics, transcriptomics, metabolomics and proteomics approaches have provided to understand the regulation of the mammary gland in health, disease and production. In the context of mammary gland "omics"-based research, the integration of results using a Systems Biology Approach is of key importance. © 2013.
Ma, Hongyan; Delafield, Daniel G; Wang, Zhe; You, Jianlan; Wu, Si
2017-04-01
The microbial secretome, known as a pool of biomass-degrading (i.e., plant-based material-degrading) enzymes, can be utilized to discover industrial enzyme candidates for biofuel production. Proteomics approaches have been applied to discover novel enzyme candidates through comparing protein expression profiles with enzyme activity of the whole secretome under different growth conditions. However, measuring the activity of each enzyme candidate, which is needed for confident "active" enzyme assignments, remains challenging. To address this challenge, we have developed an Activity-Correlated Quantitative Proteomics Platform (ACPP) that systematically correlates protein-level enzymatic activity patterns and protein elution profiles using a label-free quantitative proteomics approach. The ACPP optimized a high performance anion exchange separation for efficiently fractionating complex protein samples while preserving enzymatic activities. The detected enzymatic activity patterns in sequential fractions using microplate-based assays were cross-correlated with protein elution profiles using a customized pattern-matching algorithm with a correlation R-score. The ACPP has been successfully applied to the identification of two types of "active" biomass-degrading enzymes (i.e., starch hydrolysis enzymes and cellulose hydrolysis enzymes) from the Aspergillus niger secretome in a multiplexed fashion. By determining protein elution profiles of 156 proteins in the A. niger secretome, we confidently identified the 1,4-α-glucosidase as the major "active" starch hydrolysis enzyme (R = 0.96) and the endoglucanase as the major "active" cellulose hydrolysis enzyme (R = 0.97). The results demonstrated that the ACPP facilitated the discovery of bioactive enzymes from complex protein samples in a high-throughput, multiplexing, and untargeted fashion.
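The pattern-matching step of the ACPP, correlating a fraction-wise enzymatic activity profile with each protein's elution profile and ranking proteins by the resulting R-score, can be sketched as follows; the array shapes and the 0.9 cut-off are illustrative, not the published algorithm's parameters.

```python
# Sketch of ACPP-style pattern matching: correlate a fraction-wise activity
# profile with each protein's elution profile; shapes and cut-off are illustrative.
import numpy as np

def rank_active_candidates(activity, elution, protein_ids, r_min=0.9):
    """activity: (n_fractions,) assay readout; elution: (n_proteins, n_fractions)
    label-free abundances; returns proteins whose elution tracks the activity."""
    a = (activity - activity.mean()) / activity.std()
    e = (elution - elution.mean(axis=1, keepdims=True)) / elution.std(axis=1, keepdims=True)
    r = e @ a / a.size                      # Pearson correlation per protein
    order = np.argsort(r)[::-1]
    return [(protein_ids[i], float(r[i])) for i in order if r[i] >= r_min]
```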
Kirkwood, Kathryn J.; Ahmad, Yasmeen; Larance, Mark; Lamond, Angus I.
2013-01-01
Proteins form a diverse array of complexes that mediate cellular function and regulation. A largely unexplored feature of such protein complexes is the selective participation of specific protein isoforms and/or post-translationally modified forms. In this study, we combined native size-exclusion chromatography (SEC) with high-throughput proteomic analysis to characterize soluble protein complexes isolated from human osteosarcoma (U2OS) cells. Using this approach, we have identified over 71,500 peptides and 1,600 phosphosites, corresponding to over 8,000 proteins, distributed across 40 SEC fractions. This represents >50% of the predicted U2OS cell proteome, identified with a mean peptide sequence coverage of 27% per protein. Three biological replicates were performed, allowing statistical evaluation of the data and demonstrating a high degree of reproducibility in the SEC fractionation procedure. Specific proteins were detected interacting with multiple independent complexes, as typified by the separation of distinct complexes for the MRFAP1-MORF4L1-MRGBP interaction network. The data also revealed protein isoforms and post-translational modifications that selectively associated with distinct subsets of protein complexes. Surprisingly, there was clear enrichment for specific Gene Ontology terms associated with differential size classes of protein complexes. This study demonstrates that combined SEC/MS analysis can be used for the system-wide annotation of protein complexes and to predict potential isoform-specific interactions. All of these SEC data on the native separation of protein complexes have been integrated within the Encyclopedia of Proteome Dynamics, an online, multidimensional data-sharing resource available to the community. PMID:24043423
Translational bioinformatics in the cloud: an affordable alternative
2010-01-01
With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably with a local computational cluster in both performance and cost, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine. PMID:20691073
Accurate proteome-wide protein quantification from high-resolution 15N mass spectra
2011-01-01
In quantitative mass spectrometry-based proteomics, the metabolic incorporation of a single source of 15N-labeled nitrogen has many advantages over using stable isotope-labeled amino acids. However, the lack of a robust computational framework for analyzing the resulting spectra has impeded wide use of this approach. We have addressed this challenge by introducing a new computational methodology for analyzing 15N spectra in which quantification is integrated with identification. Application of this method to an Escherichia coli growth transition reveals significant improvement in quantification accuracy over previous methods. PMID:22182234
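The core of 15N-based relative quantification can be shown with a deliberately simplified ratio of summed isotope-envelope intensities; the peak lists below are hypothetical, and the published method goes further by integrating quantification with identification and by modelling full isotope patterns.

```python
# Deliberately simplified 14N/15N ratio: sum the intensities of the light and
# heavy isotope envelopes of one peptide and take their ratio.
def light_heavy_ratio(light_envelope, heavy_envelope):
    light = sum(intensity for _mz, intensity in light_envelope)
    heavy = sum(intensity for _mz, intensity in heavy_envelope)
    return light / heavy if heavy > 0 else float("inf")

light = [(500.25, 1.2e6), (500.58, 8.0e5), (500.92, 3.1e5)]   # hypothetical peaks
heavy = [(504.23, 9.5e5), (504.56, 6.4e5), (504.90, 2.5e5)]   # hypothetical peaks
print(round(light_heavy_ratio(light, heavy), 2))
```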
Evolving Relevance of Neuroproteomics in Alzheimer's Disease.
Lista, Simone; Zetterberg, Henrik; O'Bryant, Sid E; Blennow, Kaj; Hampel, Harald
2017-01-01
Substantial progress in the understanding of the biology of Alzheimer's disease (AD) has been achieved over the past decades. The early detection and diagnosis of AD and other age-related neurodegenerative diseases, however, remain a challenging scientific frontier. Therefore, the comprehensive discovery (relating to all individual, converging or diverging biochemical disease mechanisms), development, validation, and qualification of standardized biological markers with diagnostic and prognostic functions and a precise performance profile regarding specificity, sensitivity, and positive and negative predictive value are warranted. Methodological innovations in the area of exploratory high-throughput technologies, such as sequencing, microarrays, and mass spectrometry-based analyses of proteins/peptides, have led to the generation of large global molecular datasets from a multiplicity of biological systems, such as biological fluids, cells, tissues, and organs. Such methodological progress has shifted the attention to the execution of hypothesis-independent comprehensive exploratory analyses (as opposed to the classical hypothesis-driven candidate approach), with the aim of fully understanding the biological systems in physiology and disease as a whole. The systems biology paradigm integrates experimental biology with accurate and rigorous computational modelling to describe and foresee the dynamic features of biological systems. The use of dynamically evolving technological platforms, including mass spectrometry, in the area of proteomics has made it possible to accelerate biomarker discovery and validation and thus significantly refine the diagnosis of AD. Currently, proteomics, which is part of the systems biology paradigm, is regarded as one of the mature core disciplines needed for the effective exploratory discovery of prospective biomarker candidates expected to play an effective role in aiding the early detection, diagnosis, prognosis, and therapy development in AD.
ERIC Educational Resources Information Center
da Silveira, Pedro Rodrigo Castro
2014-01-01
This thesis describes the development and deployment of a cyberinfrastructure for distributed high-throughput computations of materials properties at high pressures and/or temperatures: the Virtual Laboratory for Earth and Planetary Materials (VLab). VLab was developed to leverage the aggregated computational power of grid systems to solve…
Shapiro, John P; Komar, Hannah M; Hancioglu, Baris; Yu, Lianbo; Jin, Ming; Ogata, Yuko; Hart, Phil A; Cruz-Monserrate, Zobeida; Lesinski, Gregory B; Conwell, Darwin L
2017-01-01
Objectives: Chronic pancreatitis (CP) is characterized by inflammation and fibrosis of the pancreas, leading to pain, parenchymal damage, and loss of exocrine and endocrine function. There are currently no curative therapies; diagnosis remains difficult and aspects of pathogenesis remain unclear. Thus, there is a need to identify novel biomarkers to improve diagnosis and understand pathophysiology. We hypothesize that pancreatic acinar regions contain proteomic signatures relevant to disease processes, including secreted proteins that could be detected in biofluids. Methods: Acini from pancreata of mice injected with or without caerulein were collected using laser capture microdissection followed by mass spectrometry analysis. This protocol enabled high-throughput analysis that captured altered protein expression throughout the stages of CP. Results: Over 2,900 proteins were identified, of which 331 were significantly changed ≥2-fold by mass spectrometry spectral count analysis. Consistent with pathogenesis, we observed increases in proteins related to fibrosis (e.g., collagen, P<0.001), several proteases (e.g., trypsin 1, P<0.001), and altered expression of proteins associated with diminished pancreas function (e.g., lipase, amylase, P<0.05). In comparison with proteomic data from a public data set of CP patients, a significant correlation was observed between proteomic changes in tissue from both the caerulein model and CP patients (r=0.725, P<0.001). Conclusions: This study illustrates the ability to characterize proteome changes of acinar cells isolated from pancreata of caerulein-treated mice and demonstrates a relationship between signatures from murine and human CP. PMID:28406494
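The spectral-count comparison described above, flagging proteins changed at least two-fold between caerulein-treated and control acini, can be sketched with a simple pseudocount-based filter; the pseudocount and example counts are assumptions, not the authors' exact pipeline.

```python
# Illustrative spectral-count fold-change filter with a pseudocount; the
# two-fold cut-off mirrors the abstract, the pseudocount value is assumed.
import math

def fold_changes(counts_treated, counts_control, pseudocount=0.5, min_fold=2.0):
    hits = {}
    for protein in set(counts_treated) | set(counts_control):
        t = counts_treated.get(protein, 0) + pseudocount
        c = counts_control.get(protein, 0) + pseudocount
        log2fc = math.log2(t / c)
        if abs(log2fc) >= math.log2(min_fold):
            hits[protein] = round(log2fc, 2)
    return hits

print(fold_changes({"Col1a1": 40, "Prss1": 25, "Amy2": 5},
                   {"Col1a1": 8, "Prss1": 6, "Amy2": 30}))   # invented counts
```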
Schmidlin, Thierry; Garrigues, Luc; Lane, Catherine S; Mulder, T Celine; van Doorn, Sander; Post, Harm; de Graaf, Erik L; Lemeer, Simone; Heck, Albert J R; Altelaar, A F Maarten
2016-08-01
Hypothesis-driven MS-based targeted proteomics has gained great popularity in a relatively short timespan. Next to the widely established selected reaction monitoring (SRM) workflow, data-independent acquisition (DIA), also referred to as sequential window acquisition of all theoretical spectra (SWATH), was introduced as a high-throughput targeted proteomics method. DIA facilitates increased proteome coverage but does not yet reach the sensitivity obtained with SRM. Therefore, a well-informed method selection is crucial for designing a successful targeted proteomics experiment. This is especially the case when targeting less conventional peptides such as those that contain PTMs, as these peptides do not always adhere to the optimal fragmentation considerations for targeted assays. Here, we provide insight into the performance of DIA, SRM, and MRM cubed (MRM3) in the analysis of phosphorylation dynamics throughout the phosphoinositide 3-kinase/mechanistic target of rapamycin (PI3K-mTOR) and mitogen-activated protein kinase (MAPK) signaling network. We indeed observe that DIA is less sensitive than SRM, but it demonstrates increased flexibility through post-analysis selection of alternative phosphopeptide precursors. Additionally, we demonstrate the added benefit of MRM3, allowing the quantification of two poorly accessible phosphosites. In total, targeted proteomics enabled the quantification of 42 PI3K-mTOR and MAPK phosphosites, gaining an unprecedented in-depth view of mTOR signaling events linked to tyrosine kinase inhibitor resistance in non-small cell lung cancer. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Polci, Maria Letizia; Rossi, Stefania; Cordella, Martina; Carlucci, Giuseppe; Marchetti, Paolo; Antonini-Cappellini, Giancarlo; Facchiano, Antonio; D'Arcangelo, Daniela; Facchiano, Francesco
2013-01-01
Recently developed proteomic technologies allow thousands of proteins to be profiled in a high-throughput approach towards biomarker discovery, although results are not as satisfactory as expected. In the present study we demonstrate that serum proteome denaturation is a key underestimated feature; in fact, a new differential denaturation protocol better discriminates serum proteins according to their electrophoretic mobility as compared to single-denaturation protocols. Sixty-nine different denaturation treatments were tested and the 3 most discriminating ones were selected (TRIDENT analysis) and applied to human sera, showing a significant improvement of serum protein discrimination as confirmed by MALDI-TOF/MS and LC-MS/MS identification, depending on the type of denaturation applied. Thereafter, sera from mice and patients carrying cutaneous melanoma were analyzed through TRIDENT. Nine and eight protein bands were found differentially expressed in mouse and human melanoma sera, respectively, compared to healthy controls (p<0.05); three of them were found, for the first time, significantly modulated: α2-macroglobulin (down-regulated in melanoma, p<0.001), Apolipoprotein-E and Apolipoprotein-A1 (both up-regulated in melanoma, p<0.04), both in mice and humans. The modulation was confirmed by immunological methods. Other less abundant proteins (e.g. gelsolin) were found significantly modulated (p<0.05). Conclusions: i) the serum proteome contains a large amount of still-neglected information related to protein folding; ii) a careful serum denaturation may significantly improve analytical procedures involving complex protein mixtures; iii) the differential serum denaturation protocol highlights interesting proteomic differences between cancer and healthy sera. PMID:23533572
Analysis of essential gene dynamics under antibiotic stress in Streptococcus sanguinis
El-Rami, Fadi; Kong, Xiangzhen; Parikh, Hardik; Zhu, Bin; Stone, Victoria; Kitten, Todd; Xu, Ping
2018-01-01
The paradoxical response of Streptococcus sanguinis to drugs prescribed for dental and clinical practices has complicated treatment guidelines and raised the need for further investigation. We conducted a high throughput study on concomitant transcriptome and proteome dynamics in a time course to assess S. sanguinis behaviour under a sub-inhibitory concentration of ampicillin. Temporal changes at the transcriptome and proteome level were monitored to cover essential genes and proteins over a physiological map of intricate pathways. Our findings revealed that translation was the functional category in S. sanguinis that was most enriched in essential proteins. Moreover, essential proteins in this category demonstrated the greatest conservation across 2774 bacterial proteomes, in comparison to other essential functional categories like cell wall biosynthesis and energy production. In comparison to non-essential proteins, essential proteins were less likely to contain ‘degradation-prone’ amino acids at their N-terminal position, suggesting a longer half-life. Despite the ampicillin-induced stress, the transcriptional up-regulation of amino acid-tRNA synthetases and proteomic elevation of amino acid biosynthesis enzymes favoured the enriched components of essential proteins revealing ‘proteomic signatures’ that can be used to bridge the genotype–phenotype gap of S. sanguinis under ampicillin stress. Furthermore, we identified a significant correlation between the levels of mRNA and protein for essential genes and detected essential protein-enriched pathways differentially regulated through a persistent stress response pattern at late time points. We propose that the current findings will help characterize a bacterial model to study the dynamics of essential genes and proteins under clinically relevant stress conditions. PMID:29393020
Jiang, Xiao-Sheng; Dai, Jie; Sheng, Quan-Hu; Zhang, Lei; Xia, Qi-Chang; Wu, Jia-Rui; Zeng, Rong
2005-01-01
Subcellular proteomics, as an important step towards functional proteomics, has been a focus of proteomic research. However, the co-purification of "contaminating" proteins has been a major problem in subcellular proteomic research, including mitochondrial proteome research. It is often difficult to conclude whether these "contaminants" represent true endogenous partners or artificial associations induced by cell disruption or incomplete purification. To solve this problem, we applied a high-throughput comparative proteome strategy, an ICAT approach performed with two-dimensional LC-MS/MS analysis, coupled with the combined use of different bioinformatics tools, to study the proteome of rat liver mitochondria prepared with traditional centrifugation (CM) or further purified with a Nycodenz gradient (PM). A total of 169 proteins were identified and quantified convincingly in the ICAT analysis, of which 90 proteins have an ICAT ratio of PM:CM>1.0, while another 79 proteins have an ICAT ratio of PM:CM<1.0. Almost all the proteins annotated as mitochondrial according to Swiss-Prot annotation, bioinformatics prediction, and literature reports have a ratio of PM:CM>1.0, while proteins annotated as extracellular or secreted, cytoplasmic, endoplasmic reticulum, ribosomal, and so on have a ratio of PM:CM<1.0. Catalase and AP endonuclease 1, which have been known as peroxisomal and nuclear, respectively, have shown a ratio of PM:CM>1.0, confirming the reports about their mitochondrial location. Moreover, the 125 proteins with subcellular location annotation have been used as a testing dataset to evaluate the efficiency for ascertaining mitochondrial proteins by ICAT analysis and by bioinformatics tools such as PSORT, TargetP, SubLoc, MitoProt, and Predotar. The results indicated that ICAT analysis coupled with the combined use of different bioinformatics tools could effectively ascertain mitochondrial proteins and distinguish contaminant proteins and even multilocation proteins. Using such a strategy, many novel proteins, known proteins without subcellular location annotation, and even known proteins that have been annotated as having other locations have been strongly indicated for their mitochondrial location.
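The decision rule described above, treating an ICAT PM:CM ratio above 1.0 as evidence of genuine mitochondrial residence and a lower ratio as evidence of co-purifying contamination, can be expressed directly; the example ratios are invented for illustration.

```python
# Illustrative classification of proteins by ICAT PM:CM ratio, following the
# rule described in the abstract; the example ratios are invented.
def classify_by_icat_ratio(pm_cm_ratios, threshold=1.0):
    enriched = {p: r for p, r in pm_cm_ratios.items() if r > threshold}
    depleted = {p: r for p, r in pm_cm_ratios.items() if r <= threshold}
    return enriched, depleted

enriched, depleted = classify_by_icat_ratio(
    {"ATP synthase subunit beta": 1.8, "Catalase": 1.3, "Serum albumin": 0.4})
print("likely mitochondrial:", sorted(enriched))
print("likely contaminant:", sorted(depleted))
```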
Microarray profiling of chemical-induced effects is being increasingly used in medium and high-throughput formats. In this study, we describe computational methods to identify molecular targets from whole-genome microarray data using as an example the estrogen receptor α (ERα), ...
Efficient and accurate adverse outcome pathway (AOP) based high-throughput screening (HTS) methods use a systems biology based approach to computationally model in vitro cellular and molecular data for rapid chemical prioritization; however, not all HTS assays are grounded by rel...
Gupta, Surya; De Puysseleyr, Veronic; Van der Heyden, José; Maddelein, Davy; Lemmens, Irma; Lievens, Sam; Degroeve, Sven; Tavernier, Jan; Martens, Lennart
2017-05-01
Protein-protein interaction (PPI) studies have dramatically expanded our knowledge about cellular behaviour and development in different conditions. A multitude of high-throughput PPI techniques have been developed to achieve proteome-scale coverage for PPI studies, including the microarray based Mammalian Protein-Protein Interaction Trap (MAPPIT) system. Because such high-throughput techniques typically report thousands of interactions, managing and analysing the large amounts of acquired data is a challenge. We have therefore built the MAPPIT cell microArray Protein Protein Interaction-Data management & Analysis Tool (MAPPI-DAT) as an automated data management and analysis tool for MAPPIT cell microarray experiments. MAPPI-DAT stores the experimental data and metadata in a systematic and structured way, automates data analysis and interpretation, and enables the meta-analysis of MAPPIT cell microarray data across all stored experiments. MAPPI-DAT is developed in Python, using R for data analysis and MySQL as data management system. MAPPI-DAT is cross-platform and can be run on Microsoft Windows, Linux and OS X/macOS. The source code and a Microsoft Windows executable are freely available under the permissive Apache2 open source license at https://github.com/compomics/MAPPI-DAT. jan.tavernier@vib-ugent.be or lennart.martens@vib-ugent.be. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Reddy, Jithender G; Kumar, Dinesh; Hosur, Ramakrishna V
2015-02-01
Protein NMR spectroscopy has expanded dramatically over the last decade into a powerful tool for the study of protein structure, dynamics, and interactions. The primary requirement for all such investigations is sequence-specific resonance assignment. The demand now is to obtain this information as rapidly as possible and in all types of protein systems, stable/unstable, soluble/insoluble, small/big, structured/unstructured, and so on. In this context, we introduce here two reduced dimensionality experiments – (3,2)D-hNCOcanH and (3,2)D-hNcoCAnH – which enhance the previously described 2D NMR-based assignment methods quite significantly. Both experiments can be recorded in just about 2-3 h each and hence would be of immense value for high-throughput structural proteomics and drug discovery research. The applicability of the method has been demonstrated using the alpha-helical bovine apo calbindin-D9k P43M mutant (75 aa) protein. Automated assignment of these data using AUTOBA has been presented, which enhances the utility of these experiments. The backbone resonance assignments so derived are utilized to estimate secondary structures and the backbone fold using Web-based algorithms. Taken together, we believe that the method and the protocol proposed here can be used for routine high-throughput structural studies of proteins. Copyright © 2014 John Wiley & Sons, Ltd.
toxoMine: an integrated omics data warehouse for Toxoplasma gondii systems biology research
Rhee, David B.; Croken, Matthew McKnight; Shieh, Kevin R.; Sullivan, Julie; Micklem, Gos; Kim, Kami; Golden, Aaron
2015-01-01
Toxoplasma gondii (T. gondii) is an obligate intracellular parasite that must monitor for changes in the host environment and respond accordingly; however, it is still not fully known which genetic or epigenetic factors are involved in regulating virulence traits of T. gondii. There are on-going efforts to elucidate the mechanisms regulating the stage transition process via the application of high-throughput epigenomics, genomics and proteomics techniques. Given the range of experimental conditions and the typical yield from such high-throughput techniques, a new challenge arises: how to effectively collect, organize and disseminate the generated data for subsequent data analysis. Here, we describe toxoMine, which provides a powerful interface to support sophisticated integrative exploration of high-throughput experimental data and metadata, providing researchers with a more tractable means toward understanding how genetic and/or epigenetic factors play a coordinated role in determining pathogenicity of T. gondii. As a data warehouse, toxoMine allows integration of high-throughput data sets with public T. gondii data. toxoMine is also able to execute complex queries involving multiple data sets with straightforward user interaction. Furthermore, toxoMine allows users to define their own parameters during the search process that gives users near-limitless search and query capabilities. The interoperability feature also allows users to query and examine data available in other InterMine systems, which would effectively augment the search scope beyond what is available to toxoMine. toxoMine complements the major community database ToxoDB by providing a data warehouse that enables more extensive integrative studies for T. gondii. Given all these factors, we believe it will become an indispensable resource to the greater infectious disease research community. Database URL: http://toxomine.org PMID:26130662
Image Harvest: an open-source platform for high-throughput plant image processing and analysis
Knecht, Avi C.; Campbell, Malachy T.; Caprez, Adam; Swanson, David R.; Walia, Harkamal
2016-01-01
High-throughput plant phenotyping is an effective approach to bridge the genotype-to-phenotype gap in crops. Phenomics experiments typically result in large-scale image datasets, which are not amenable for processing on desktop computers, thus creating a bottleneck in the image-analysis pipeline. Here, we present an open-source, flexible image-analysis framework, called Image Harvest (IH), for processing images originating from high-throughput plant phenotyping platforms. Image Harvest is developed to perform parallel processing on computing grids and provides an integrated feature for metadata extraction from large-scale file organization. Moreover, the integration of IH with the Open Science Grid provides academic researchers with the computational resources required for processing large image datasets at no cost. Image Harvest also offers functionalities to extract digital traits from images to interpret plant architecture-related characteristics. To demonstrate the applications of these digital traits, a rice (Oryza sativa) diversity panel was phenotyped and genome-wide association mapping was performed using digital traits that are used to describe different plant ideotypes. Three major quantitative trait loci were identified on rice chromosomes 4 and 6, which co-localize with quantitative trait loci known to regulate agronomically important traits in rice. Image Harvest is open-source software for high-throughput image processing that requires a minimal learning curve for plant biologists to analyze phenomics datasets. PMID:27141917
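The kind of digital trait extraction described above can be illustrated with a small sketch; this is not the Image Harvest API, just a hypothetical numpy example deriving projected shoot area and plant height (in pixels) from a segmented binary mask.

```python
# Minimal sketch (not the Image Harvest API): derive two simple digital traits --
# projected shoot area and plant height in pixels -- from a segmented binary mask
# in which non-zero pixels are assumed to belong to the plant.
import numpy as np

def digital_traits(mask: np.ndarray) -> dict:
    rows = np.flatnonzero(mask.any(axis=1))          # image rows containing plant pixels
    return {
        "shoot_area_px": int(mask.sum()),            # number of plant pixels
        "height_px": int(rows.max() - rows.min() + 1) if rows.size else 0,
    }

# Toy 6x6 mask standing in for a segmented side-view image.
toy_mask = np.zeros((6, 6), dtype=np.uint8)
toy_mask[1:5, 2:4] = 1
print(digital_traits(toy_mask))   # {'shoot_area_px': 8, 'height_px': 4}
```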
Computational Approaches to Phenotyping
Lussier, Yves A.; Liu, Yang
2007-01-01
The recent completion of the Human Genome Project has made possible a high-throughput “systems approach” for accelerating the elucidation of molecular underpinnings of human diseases, and subsequent derivation of molecular-based strategies to more effectively prevent, diagnose, and treat these diseases. Although altered phenotypes are among the most reliable manifestations of altered gene functions, research using systematic analysis of phenotype relationships to study human biology is still in its infancy. This article focuses on the emerging field of high-throughput phenotyping (HTP) phenomics research, which aims to capitalize on novel high-throughput computation and informatics technology developments to derive genomewide molecular networks of genotype–phenotype associations, or “phenomic associations.” The HTP phenomics research field faces the challenge of technological research and development to generate novel tools in computation and informatics that will allow researchers to amass, access, integrate, organize, and manage phenotypic databases across species and enable genomewide analysis to associate phenotypic information with genomic data at different scales of biology. Key state-of-the-art technological advancements critical for HTP phenomics research are covered in this review. In particular, we highlight the power of computational approaches to conduct large-scale phenomics studies. PMID:17202287
Proteomics and Metabolomics: Two Emerging Areas for Legume Improvement
Ramalingam, Abirami; Kudapa, Himabindu; Pazhamala, Lekha T.; Weckwerth, Wolfram; Varshney, Rajeev K.
2015-01-01
The crop legumes such as chickpea, common bean, cowpea, peanut, pigeonpea, soybean, etc. are important sources of nutrition and contribute to a significant amount of biological nitrogen fixation (>20 million tons of fixed nitrogen) in agriculture. However, the production of legumes is constrained due to abiotic and biotic stresses. It is therefore imperative to understand the molecular mechanisms of plant response to different stresses and identify key candidate genes regulating tolerance which can be deployed in breeding programs. The information obtained from transcriptomics has facilitated the identification of candidate genes for the given trait of interest and utilizing them in crop breeding programs to improve stress tolerance. However, the mechanisms of stress tolerance are complex due to the influence of multi-genes and post-transcriptional regulations. Furthermore, stress conditions greatly affect gene expression which in turn causes modifications in the composition of plant proteomes and metabolomes. Therefore, functional genomics involving various proteomics and metabolomics approaches have been obligatory for understanding plant stress tolerance. These approaches have also been found useful to unravel different pathways related to plant and seed development as well as symbiosis. Proteome and metabolome profiling using high-throughput based systems have been extensively applied in the model legume species, Medicago truncatula and Lotus japonicus, as well as in the model crop legume, soybean, to examine stress signaling pathways, cellular and developmental processes and nodule symbiosis. Moreover, the availability of protein reference maps as well as proteomics and metabolomics databases greatly support research and understanding of various biological processes in legumes. Protein-protein interaction techniques, particularly the yeast two-hybrid system have been advantageous for studying symbiosis and stress signaling in legumes. In this review, several studies on proteomics and metabolomics in model and crop legumes have been discussed. Additionally, applications of advanced proteomics and metabolomics approaches have also been included in this review for future applications in legume research. The integration of these “omics” approaches will greatly support the identification of accurate biomarkers in legume smart breeding programs. PMID:26734026
Mahadevan, Chidambareswaren; Jaleel, Abdul; Deb, Lokesh; Thomas, George; Sakuntala, Manjula
2015-01-01
Zingiber zerumbet (Zingiberaceae) is a wild, tropical medicinal herb that shows a high degree of resistance to diseases affecting cultivated ginger. Barley stripe mosaic virus (BSMV) silencing vectors containing an endogenous phytoene desaturase (PDS) gene fragment were agroinfiltrated into young leaves of Z. zerumbet under controlled growth conditions to effect virus-induced gene silencing (VIGS). Infiltrated leaves as well as newly emerged leaves and tillers showed visual signs of PDS silencing after 30 days. Replication and systemic movement of the viral vectors in silenced plants were confirmed by RT-PCR. Real-time quantitative PCR analysis verified significant down-regulation of PDS transcripts in the silenced tissues. Label-free proteomic analysis was conducted in leaves with established PDS transcript down-regulation and in buffer-infiltrated (mock) leaves. A total of 474 proteins were identified, which were up-regulated, down-regulated or modulated de novo during VIGS. Most of these proteins were localized to the chloroplast, as revealed by UniprotKB analysis, and the up-regulated proteins included abiotic stress-responsive, photosynthetic, metabolic and membrane proteins. Moreover, the detection of viral proteins together with host proteins confirmed successful viral infection. We report for the first time the establishment of a high-throughput gene functional analysis platform using BSMV-mediated VIGS in Z. zerumbet, as well as the proteomic changes associated with VIGS. PMID:25918840
RaftProt: mammalian lipid raft proteome database.
Shah, Anup; Chen, David; Boda, Akash R; Foster, Leonard J; Davis, Melissa J; Hill, Michelle M
2015-01-01
RaftProt (http://lipid-raft-database.di.uq.edu.au/) is a database of mammalian lipid raft-associated proteins as reported in high-throughput mass spectrometry studies. Lipid rafts are specialized membrane microdomains enriched in cholesterol and sphingolipids thought to act as dynamic signalling and sorting platforms. Given their fundamental roles in cellular regulation, there is a plethora of information on the size, composition and regulation of these membrane microdomains, including a large number of proteomics studies. To facilitate the mining and analysis of published lipid raft proteomics studies, we have developed a searchable database, RaftProt. In addition to browsing the studies, performing basic queries by protein and gene names, and searching experiments by cell, tissue and organism, we have implemented several advanced features to facilitate data mining. To address the issue of potential bias due to the biochemical preparation procedures used, we have captured the lipid raft preparation methods and implemented an advanced search option for methodology and sample treatment conditions, such as cholesterol depletion. Furthermore, we have identified a list of high-confidence proteins, and enabled searching only from this list of likely bona fide lipid raft proteins. Given the apparent biological importance of lipid rafts and their associated proteins, this database constitutes a key resource for the scientific community. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Spitzer, James D; Hupert, Nathaniel; Duckart, Jonathan; Xiong, Wei
2007-01-01
Community-based mass prophylaxis is a core public health operational competency, but staffing needs may overwhelm the local trained health workforce. Just-in-time (JIT) training of emergency staff and computer modeling of workforce requirements represent two complementary approaches to address this logistical problem. Multnomah County, Oregon, conducted a high-throughput point of dispensing (POD) exercise to test JIT training and computer modeling to validate POD staffing estimates. The POD had 84% non-health-care worker staff and processed 500 patients per hour. Post-exercise modeling replicated observed staff utilization levels and queue formation, including development and amelioration of a large medical evaluation queue caused by lengthy processing times and understaffing in the first half-hour of the exercise. The exercise confirmed the feasibility of using JIT training for high-throughput antibiotic dispensing clinics staffed largely by nonmedical professionals. Patient processing times varied over the course of the exercise, with important implications for both staff reallocation and future POD modeling efforts. Overall underutilization of staff revealed the opportunity for greater efficiencies and even higher future throughputs.
High-throughput sample adaptive offset hardware architecture for high-efficiency video coding
NASA Astrophysics Data System (ADS)
Zhou, Wei; Yan, Chang; Zhang, Jingzhi; Zhou, Xin
2018-03-01
A high-throughput hardware architecture for a sample adaptive offset (SAO) filter in the high-efficiency video coding (HEVC) standard is presented. First, an implementation-friendly and simplified bitrate estimation method for rate-distortion cost calculation is proposed to reduce the computational complexity of the SAO mode decision. Then, a high-throughput VLSI architecture for SAO is presented based on the proposed bitrate estimation method. Furthermore, a multiparallel VLSI architecture for in-loop filters, which integrates both the deblocking filter and the SAO filter, is proposed. Six parallel strategies are applied in the proposed in-loop filter architecture to improve the system throughput and filtering speed. Experimental results show that the proposed in-loop filter architecture can achieve up to 48% higher throughput in comparison with prior work. The proposed architecture reaches a high operating clock frequency of 297 MHz with a TSMC 65-nm library and meets the real-time requirement of the in-loop filters for the 8K × 4K video format at 132 fps.
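The mode decision underlying such architectures weighs distortion against coding rate, J = D + λ·R. The sketch below illustrates that selection for one SAO offset category; the closed-form distortion change is the standard HEVC SAO form, while the rate estimate is a generic placeholder rather than the paper's simplified bitrate estimation method.

```python
# Illustrative sketch of SAO mode decision by rate-distortion cost J = D + lambda * R.
# The distortion model and the simplified rate estimate below are generic textbook
# forms, not the bitrate-estimation method proposed in the paper.

def sao_distortion_change(n, offset, sum_diff):
    """Standard closed form for the change in SSD when adding `offset` to `n` pixels
    whose accumulated (original - reconstructed) difference is `sum_diff`."""
    return n * offset * offset - 2 * offset * sum_diff

def rd_cost(n, offset, sum_diff, lam, rate_bits):
    return sao_distortion_change(n, offset, sum_diff) + lam * rate_bits

# Toy statistics for one edge-offset category of a coding tree block.
n, sum_diff, lam = 120, 260, 30.0
candidates = []
for offset in range(0, 8):                      # candidate offsets 0..7
    rate = 1 + 2 * offset.bit_length()          # crude stand-in for the coded offset bits
    candidates.append((rd_cost(n, offset, sum_diff, lam, rate), offset))

best_cost, best_offset = min(candidates)
print(f"best offset = {best_offset}, RD cost change = {best_cost:.1f}")
```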
Kavlock, Robert; Dix, David
2010-02-01
Computational toxicology is the application of mathematical and computer models to help assess chemical hazards and risks to human health and the environment. Supported by advances in informatics, high-throughput screening (HTS) technologies, and systems biology, the U.S. Environmental Protection Agency (EPA) is developing robust and flexible computational tools that can be applied to the thousands of chemicals in commerce, and contaminant mixtures found in air, water, and hazardous-waste sites. The Office of Research and Development (ORD) Computational Toxicology Research Program (CTRP) is composed of three main elements. The largest component is the National Center for Computational Toxicology (NCCT), which was established in 2005 to coordinate research on chemical screening and prioritization, informatics, and systems modeling. The second element consists of related activities in the National Health and Environmental Effects Research Laboratory (NHEERL) and the National Exposure Research Laboratory (NERL). The third and final component consists of academic centers working on various aspects of computational toxicology and funded by the U.S. EPA Science to Achieve Results (STAR) program. Together these elements form the key components in the implementation of both the initial strategy, A Framework for a Computational Toxicology Research Program (U.S. EPA, 2003), and the newly released The U.S. Environmental Protection Agency's Strategic Plan for Evaluating the Toxicity of Chemicals (U.S. EPA, 2009a). Key intramural projects of the CTRP include digitizing legacy toxicity testing information into the toxicity reference database (ToxRefDB), predicting toxicity (ToxCast) and exposure (ExpoCast), and creating virtual liver (v-Liver) and virtual embryo (v-Embryo) systems models. U.S. EPA-funded STAR centers are also providing bioinformatics, computational toxicology data and models, and developmental toxicity data and models. The models and underlying data are being made publicly available through the Aggregated Computational Toxicology Resource (ACToR), the Distributed Structure-Searchable Toxicity (DSSTox) Database Network, and other U.S. EPA websites. While initially focused on improving the hazard identification process, the CTRP is placing increasing emphasis on using high-throughput bioactivity profiling data in systems modeling to support quantitative risk assessments, and in developing complementary higher throughput exposure models. This integrated approach will enable analysis of life-stage susceptibility, and understanding of the exposures, pathways, and key events by which chemicals exert their toxicity in developing systems (e.g., endocrine-related pathways). The CTRP will be a critical component in next-generation risk assessments utilizing quantitative high-throughput data and providing a much higher capacity for assessing chemical toxicity than is currently available.
Ching, Travers; Zhu, Xun; Garmire, Lana X
2018-04-01
Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and minimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer node provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet.
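For intuition, the quantity such a survival network minimizes over its predicted risk scores is the negative Cox partial log-likelihood. The numpy sketch below is not the Cox-nnet implementation; it ignores tied event times and uses random toy data.

```python
# Sketch (not the Cox-nnet implementation): the negative Cox partial log-likelihood
# that a survival network minimizes over its predicted risk scores theta. Ties are
# ignored for simplicity (Breslow/Efron corrections omitted).
import numpy as np

def neg_cox_partial_loglik(theta, time, event):
    """theta: predicted log-risk per patient; time: follow-up times;
    event: 1 if the event was observed, 0 if censored."""
    order = np.argsort(-time)                 # sort patients by descending time
    theta, event = theta[order], event[order]
    log_cum = np.logaddexp.accumulate(theta)  # log sum of exp(theta) over each risk set
    return -np.sum((theta - log_cum)[event.astype(bool)])

rng = np.random.default_rng(0)
theta = rng.normal(size=8)                    # toy predicted risk scores
time = rng.exponential(scale=12.0, size=8)    # toy follow-up times
event = rng.integers(0, 2, size=8)            # toy event indicators
print(neg_cox_partial_loglik(theta, time, event))
```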
Zhong, Qing; Rüschoff, Jan H.; Guo, Tiannan; Gabrani, Maria; Schüffler, Peter J.; Rechsteiner, Markus; Liu, Yansheng; Fuchs, Thomas J.; Rupp, Niels J.; Fankhauser, Christian; Buhmann, Joachim M.; Perner, Sven; Poyet, Cédric; Blattner, Miriam; Soldini, Davide; Moch, Holger; Rubin, Mark A.; Noske, Aurelia; Rüschoff, Josef; Haffner, Michael C.; Jochum, Wolfram; Wild, Peter J.
2016-01-01
Recent large-scale genome analyses of human tissue samples have uncovered a high degree of genetic alterations and tumour heterogeneity in most tumour entities, independent of morphological phenotypes and histopathological characteristics. Assessment of genetic copy-number variation (CNV) and tumour heterogeneity by fluorescence in situ hybridization (ISH) provides additional tissue morphology at single-cell resolution, but it is labour intensive with limited throughput and high inter-observer variability. We present an integrative method combining bright-field dual-colour chromogenic and silver ISH assays with an image-based computational workflow (ISHProfiler), for accurate detection of molecular signals, high-throughput evaluation of CNV, expressive visualization of multi-level heterogeneity (cellular, inter- and intra-tumour heterogeneity), and objective quantification of heterogeneous genetic deletions (PTEN) and amplifications (19q12, HER2) in diverse human tumours (prostate, endometrial, ovarian and gastric), using various tissue sizes and different scanners, with unprecedented throughput and reproducibility. PMID:27052161
High-Throughput Bit-Serial LDPC Decoder LSI Based on Multiple-Valued Asynchronous Interleaving
NASA Astrophysics Data System (ADS)
Onizawa, Naoya; Hanyu, Takahiro; Gaudet, Vincent C.
This paper presents a high-throughput bit-serial low-density parity-check (LDPC) decoder that uses an asynchronous interleaver. Since consecutive log-likelihood message values on the interleaver are similar, node computations are continuously performed by using the most recently arrived messages without significantly affecting bit-error rate (BER) performance. In the asynchronous interleaver, each message's arrival rate is based on the delay due to the wire length, so that the decoding throughput is not restricted by the worst-case latency, which results in a higher average rate of computation. Moreover, the use of a multiple-valued data representation makes it possible to multiplex control signals and data from mutual nodes, thus minimizing the number of handshaking steps in the asynchronous interleaver and eliminating the clock signal entirely. As a result, the decoding throughput becomes 1.3 times faster than that of a bit-serial synchronous decoder under a 90nm CMOS technology, at a comparable BER.
FPGA cluster for high-performance AO real-time control system
NASA Astrophysics Data System (ADS)
Geng, Deli; Goodsell, Stephen J.; Basden, Alastair G.; Dipper, Nigel A.; Myers, Richard M.; Saunter, Chris D.
2006-06-01
Whilst the high throughput and low latency requirements for the next generation AO real-time control systems have posed a significant challenge to von Neumann architecture processor systems, the Field Programmable Gate Array (FPGA) has emerged as a long-term solution offering high throughput performance and excellent latency predictability. Moreover, FPGA devices have highly capable programmable interfacing, which leads to more highly integrated systems. Nevertheless, a single FPGA is still not enough: multiple FPGA devices need to be clustered to perform the required subaperture processing and the reconstruction computation. In an AO real-time control system, the memory bandwidth is often the bottleneck of the system, simply because a vast amount of supporting data, e.g. pixel calibration maps and the reconstruction matrix, needs to be accessed within a short period. The cluster, as a general computing architecture, has excellent scalability in processing throughput, memory bandwidth, memory capacity, and communication bandwidth. Problems such as task distribution, node communication, and system verification are discussed.
Monitoring Peptidase Activities in Complex Proteomes by MALDI-TOF Mass Spectrometry
Villanueva, Josep; Nazarian, Arpi; Lawlor, Kevin; Tempst, Paul
2009-01-01
Measuring enzymatic activities in biological fluids is a form of activity-based proteomics and may be utilized as a means of developing disease biomarkers. Activity-based assays allow amplification of output signals, thus potentially visualizing low-abundance enzymes on a virtually transparent whole-proteome background. The protocol presented here describes a semi-quantitative in vitro assay of proteolytic activities in complex proteomes by monitoring breakdown of designer peptide substrates using robotic extraction and a MALDI-TOF mass spectrometric read-out. Relative quantitation of the peptide metabolites is done by comparison with spiked internal standards, followed by statistical analysis of the resulting mini-peptidome. Partial automation provides the reproducibility and throughput essential for comparing large sample sets. The approach may be employed for diagnostic or predictive purposes and enables profiling of 96 samples in 30 hours. It could be tailored to many diagnostic and pharmacodynamic purposes, as a read-out of catalytic and metabolic activities in body fluids or tissues. PMID:19617888
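The relative-quantitation step can be sketched in a few lines: each metabolite peak is normalized to the spiked internal standard in the same spectrum and tracked across incubation times. All peak intensities below are invented.

```python
# Hedged sketch of the relative-quantitation step: each peptide's MALDI-TOF peak
# intensity is normalized to a spiked internal standard so that breakdown of a
# designer substrate can be followed over incubation time. All values are invented.

internal_standard = {"0 min": 9800.0, "30 min": 10100.0, "60 min": 9950.0}
substrate_peak    = {"0 min": 15200.0, "30 min": 8100.0, "60 min": 2300.0}

def relative_level(sample_peak, standard_peak):
    return {t: sample_peak[t] / standard_peak[t] for t in sample_peak}

levels = relative_level(substrate_peak, internal_standard)
t0 = levels["0 min"]
for t, value in levels.items():
    print(f"{t:>6}: relative level {value:.2f} ({100 * value / t0:.0f}% of start)")
```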
The future of targeted peptidomics.
Findeisen, Peter
2013-12-01
Targeted MS is becoming increasingly important for sensitive and specific quantitative detection of proteins and respective PTMs. In this article, Ceglarek et al. [Proteomics Clin. Appl. 2013, 7, 794-801] present an LC-MS-based method for simultaneous quantitation of seven apolipoproteins in serum specimens. The assay fulfills many necessities of routine diagnostic applications, namely, low cost, high throughput, and good reproducibility. We anticipate that validation of new biomarkers will speed up with this technology and the palette of laboratory-based diagnostic tools will hopefully be augmented significantly in the near future. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Alternative polyadenylation: New insights from global analyses
Shi, Yongsheng
2012-01-01
Recent studies have revealed widespread mRNA alternative polyadenylation (APA) in eukaryotes and its dynamic spatial and temporal regulation. APA not only generates proteomic and functional diversity, but also plays important roles in regulating gene expression. Global deregulation of APA has been demonstrated in a variety of human diseases. Recent exciting advances in the field have been made possible in a large part by high throughput analyses using newly developed experimental tools. Here I review the recent progress in global studies of APA and the insights that have emerged from these and other studies that use more conventional methods. PMID:23097429
[Development and Application of Metabonomics in Forensic Toxicology].
Yan, Hui; Shen, Min
2015-06-01
Metabonomics is an important branch of system biology following the development of genomics, transcriptomics and proteomics. It can perform high-throughput detection and data processing with multiple parameters, potentially enabling the identification and quantification of all small metabolites in a biological system. It can be used to provide comprehensive information on the toxicity effects, toxicological mechanisms and biomarkers, sensitively finding the unusual metabolic changes caused by poison. This article mainly reviews application of metabonomics in toxicological studies of abused drugs, pesticides, poisonous plants and poisonous animals, and also illustrates the new direction of forensic toxicology research.
OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid
Poehlman, William L.; Rynge, Mats; Branton, Chris; Balamurugan, D.; Feltus, Frank A.
2016-01-01
High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments. PMID:27499617
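The end product of such a workflow can be illustrated with a small pandas sketch that merges per-sample gene counts into one gene-by-sample matrix. This is not OSG-GEM code; the in-memory dictionaries stand in for hypothetical per-sample count files.

```python
# Sketch of the final merge step a GEM workflow performs: per-sample gene count
# tables are joined into one gene-by-sample expression matrix. This is not OSG-GEM
# code; the dictionaries below stand in for hypothetical per-sample count files.
import pandas as pd

per_sample_counts = {
    "sample_A": {"gene1": 120, "gene2": 30, "gene3": 0},
    "sample_B": {"gene1": 95,  "gene2": 42, "gene3": 7},
    "sample_C": {"gene1": 110, "gene2": 18, "gene3": 3},
}

gem = pd.DataFrame(per_sample_counts)        # rows = genes, columns = samples
gem.index.name = "gene_id"
print(gem)
# gem.to_csv("expression_matrix.tsv", sep="\t")   # a typical on-disk GEM format
```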
Computational Toxicology at the US EPA
Computational toxicology is the application of mathematical and computer models to help assess chemical hazards and risks to human health and the environment. Supported by advances in informatics, high-throughput screening (HTS) technologies, and systems biology, EPA is developin...
This week, we are excited to announce the launch of the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) Proteogenomics Computational DREAM Challenge. The aim of this Challenge is to encourage the generation of computational methods for extracting information from the cancer proteome and for linking those data to genomic and transcriptomic information. The specific goals are to predict proteomic and phosphoproteomic data from other multiple data types including transcriptomics and genetics.
The use of high-throughput in vitro assays has been proposed to play a significant role in the future of toxicity testing. In this study, rat hepatic metabolic clearance and plasma protein binding were measured for 59 ToxCast phase I chemicals. Computational in vitro-to-in vivo e...
High-throughput protein analysis integrating bioinformatics and experimental assays
del Val, Coral; Mehrle, Alexander; Falkenhahn, Mechthild; Seiler, Markus; Glatting, Karl-Heinz; Poustka, Annemarie; Suhai, Sandor; Wiemann, Stefan
2004-01-01
The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE, SWISS-PROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administrating relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins. PMID:14762202
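One of the cheaper annotation steps in such a pipeline, predicting physicochemical characteristics, can be sketched directly; the example below estimates a molecular weight from standard average residue masses and uses an invented sequence, whereas a real pipeline would add pI, domain and secondary-structure predictions from the databases listed above.

```python
# Minimal sketch of one annotation step in such a pipeline: estimating the molecular
# weight of a translated ORF from standard average residue masses (Da). The example
# sequence is invented.

AVG_RESIDUE_MASS = {  # average amino-acid residue masses (water already subtracted)
    "A": 71.08, "R": 156.19, "N": 114.10, "D": 115.09, "C": 103.14, "E": 129.12,
    "Q": 128.13, "G": 57.05, "H": 137.14, "I": 113.16, "L": 113.16, "K": 128.17,
    "M": 131.19, "F": 147.18, "P": 97.12, "S": 87.08, "T": 101.10, "W": 186.21,
    "Y": 163.18, "V": 99.13,
}
WATER = 18.02  # add back one water for the intact chain

def molecular_weight(sequence: str) -> float:
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER

print(f"{molecular_weight('MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ'):.1f} Da")
```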
High throughput techniques to reveal the molecular physiology and evolution of digestion in spiders.
Fuzita, Felipe J; Pinkse, Martijn W H; Patane, José S L; Verhaert, Peter D E M; Lopes, Adriana R
2016-09-07
Spiders are known for their predatory efficiency and for their high capacity of digesting relatively large prey. They do this by combining both extracorporeal and intracellular digestion. Whereas many high throughput ("-omics") techniques focus on biomolecules in spider venom, so far this approach has not yet been applied to investigate the protein composition of spider midgut diverticula (MD) and digestive fluid (DF). We here report on our investigations of both MD and DF of the spider Nephilingis (Nephilengys) cruentata through the use of next generation sequencing and shotgun proteomics. This shows that the DF is composed of a variety of hydrolases including peptidases, carbohydrases, lipases and nuclease, as well as of toxins and regulatory proteins. We detect 25 astacins in the DF. Phylogenetic analysis of the corresponding transcript(s) in Arachnida suggests that astacins have acquired an unprecedented role for extracorporeal digestion in Araneae, with different orthologs used by each family. The results of a comparative study of spiders in distinct physiological conditions allow us to propose some digestion mechanisms in this interesting animal taxon. All the high throughput data allowed the demonstration that DF is a secretion originating from the MD. We identified enzymes involved in the extracellular and intracellular phases of digestion. Besides that, data analyses show a large gene duplication event in Araneae digestive process evolution, mainly of astacin genes. We were also able to identify proteins expressed and translated in the digestive system, which until now had been exclusively associated to venom glands.
High-throughput bioinformatics with the Cyrille2 pipeline system
Fiers, Mark WEJ; van der Burgt, Ate; Datema, Erwin; de Groot, Joost CW; van Ham, Roeland CHJ
2008-01-01
Background: Modern omics research involves the application of high-throughput technologies that generate vast volumes of data. These data need to be pre-processed, analyzed and integrated with existing knowledge through the use of diverse sets of software tools, models and databases. The analyses are often interdependent and chained together to form complex workflows or pipelines. Given the volume of the data used and the multitude of computational resources available, specialized pipeline software is required to make high-throughput analysis of large-scale omics datasets feasible. Results: We have developed a generic pipeline system called Cyrille2. The system is modular in design and consists of three functionally distinct parts: 1) a web-based, graphical user interface (GUI) that enables a pipeline operator to manage the system; 2) the Scheduler, which forms the functional core of the system and which tracks what data enters the system and determines what jobs must be scheduled for execution, and; 3) the Executor, which searches for scheduled jobs and executes these on a compute cluster. Conclusion: The Cyrille2 system is an extensible, modular system, implementing the stated requirements. Cyrille2 enables easy creation and execution of high throughput, flexible bioinformatics pipelines. PMID:18269742
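A toy sketch of the Scheduler/Executor split (not Cyrille2 code): the scheduler derives runnable jobs from which datasets are currently available, and the executor runs them and registers their outputs so that downstream jobs become runnable. Job and dataset names are hypothetical.

```python
# Toy sketch of the Scheduler/Executor split: the scheduler marks jobs whose input
# dependencies are satisfied as runnable, and the executor picks up runnable jobs and
# records their outputs, enabling downstream jobs. Names are hypothetical.

pipeline = {                      # job -> datasets it needs and datasets it produces
    "trim_reads": {"needs": {"raw_reads"}, "makes": {"clean_reads"}},
    "map_reads":  {"needs": {"clean_reads", "genome"}, "makes": {"alignments"}},
    "call_peaks": {"needs": {"alignments"}, "makes": {"peaks"}},
}
available = {"raw_reads", "genome"}   # data already present in the system
done = set()

def schedule():
    """Scheduler: jobs whose inputs exist and which have not run yet are runnable."""
    return [j for j, spec in pipeline.items()
            if j not in done and spec["needs"] <= available]

def execute(job):
    """Executor: run the job (a stub here) and register its outputs."""
    print(f"running {job}")
    done.add(job)
    available.update(pipeline[job]["makes"])

while (runnable := schedule()):
    for job in runnable:
        execute(job)
```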
Anderson, Lissa C; DeHart, Caroline J; Kaiser, Nathan K; Fellers, Ryan T; Smith, Donald F; Greer, Joseph B; LeDuc, Richard D; Blakney, Greg T; Thomas, Paul M; Kelleher, Neil L; Hendrickson, Christopher L
2017-02-03
Successful high-throughput characterization of intact proteins from complex biological samples by mass spectrometry requires instrumentation capable of high mass resolving power, mass accuracy, sensitivity, and spectral acquisition rate. These limitations often necessitate the performance of hundreds of LC-MS/MS experiments to obtain reasonable coverage of the targeted proteome, which is still typically limited to molecular weights below 30 kDa. The National High Magnetic Field Laboratory (NHMFL) recently installed a 21 T FT-ICR mass spectrometer, which is part of the NHMFL FT-ICR User Facility and available to all qualified users. Here we demonstrate top-down LC-21 T FT-ICR MS/MS of intact proteins derived from human colorectal cancer cell lysate. We identified a combined total of 684 unique protein entries observed as 3238 unique proteoforms at a 1% false discovery rate, based on rapid, data-dependent acquisition of collision-induced and electron-transfer dissociation tandem mass spectra from just 40 LC-MS/MS experiments. Our identifications included 372 proteoforms with molecular weights over 30 kDa detected at isotopic resolution, which substantially extends the accessible mass range for high-throughput top-down LC-MS/MS.
An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Yan, Gui-Ying; Hu, Ji-Pu
2016-10-01
Predicting protein-protein interactions (PPIs) is a challenging task and essential to construct the protein interaction networks, which is important for facilitating our understanding of the mechanisms of biological systems. Although a number of high-throughput technologies have been proposed to predict PPIs, there are unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM-BiGP that combines the relevance vector machine (RVM) model and Bi-gram Probabilities (BiGP) for PPIs detection from protein sequences. The major improvements include: (1) protein sequences are represented using the Bi-gram Probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), in which the protein evolutionary information is contained; (2) to reduce the influence of noise, the Principal Component Analysis (PCA) method is used to reduce the dimension of the BiGP vector; (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five-fold cross-validation experiments executed on yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57 and 90.57%, respectively. Experimental results are significantly better than previous methods. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-BiGP method is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on the imbalanced yeast dataset, which is higher than that on the balanced yeast dataset. The promising experimental results show the efficiency and robustness of the proposed method, which can serve as an automatic decision support tool for future proteomics research. To facilitate extensive studies for future proteomics research, we developed a freely available web server called RVM-BiGP-PPIs in Hypertext Preprocessor (PHP) for predicting PPIs. The web server including the source code and the datasets is available at http://219.219.62.123:8888/BiGP/. © 2016 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
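The BiGP feature construction can be sketched compactly: PSSM scores are squashed to probabilities and, for every ordered amino-acid pair, the products of probabilities at consecutive positions are summed, yielding a 400-dimensional vector per protein. This is an illustrative reading of the representation, not the authors' code; the random PSSM stands in for a real PSI-BLAST profile and the PCA/RVM steps are omitted.

```python
# Sketch of bi-gram probability (BiGP) features: PSSM scores are squashed to
# probabilities, then for every ordered amino-acid pair (a, b) the products of
# consecutive-position probabilities are summed, giving a 400-dim vector per protein.
# The random PSSM below stands in for a real PSI-BLAST profile.
import numpy as np

def bigram_features(pssm_scores: np.ndarray) -> np.ndarray:
    """pssm_scores: (L, 20) PSSM; returns a flattened (400,) bi-gram feature vector."""
    p = 1.0 / (1.0 + np.exp(-pssm_scores))            # logistic squash to (0, 1)
    bigram = p[:-1].T @ p[1:]                         # (20, 20): sum_i p[i, a] * p[i+1, b]
    return bigram.ravel()

rng = np.random.default_rng(42)
toy_pssm = rng.integers(-5, 6, size=(60, 20)).astype(float)   # toy 60-residue profile
features = bigram_features(toy_pssm)
print(features.shape)        # (400,)
```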
Xu, Huilei; Baroukh, Caroline; Dannenfelser, Ruth; Chen, Edward Y; Tan, Christopher M; Kou, Yan; Kim, Yujin E; Lemischka, Ihor R; Ma'ayan, Avi
2013-01-01
High content studies that profile mouse and human embryonic stem cells (m/hESCs) using various genome-wide technologies such as transcriptomics and proteomics are constantly being published. However, efforts to integrate such data to obtain a global view of the molecular circuitry in m/hESCs are lagging behind. Here, we present an m/hESC-centered database called Embryonic Stem Cell Atlas from Pluripotency Evidence integrating data from many recent diverse high-throughput studies including chromatin immunoprecipitation followed by deep sequencing, genome-wide inhibitory RNA screens, gene expression microarrays or RNA-seq after knockdown (KD) or overexpression of critical factors, immunoprecipitation followed by mass spectrometry proteomics and phosphoproteomics. The database provides web-based interactive search and visualization tools that can be used to build subnetworks and to identify known and novel regulatory interactions across various regulatory layers. The web-interface also includes tools to predict the effects of combinatorial KDs by additive effects controlled by sliders, or through simulation software implemented in MATLAB. Overall, the Embryonic Stem Cell Atlas from Pluripotency Evidence database is a comprehensive resource for the stem cell systems biology community. Database URL: http://www.maayanlab.net/ESCAPE
Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sczyrba, Alex; Pratap, Abhishek; Canon, Shane
2011-03-22
Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets of different sizes, from small microbial and fungal genomes to a very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
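The data structure at the heart of such assemblers can be shown in a few lines: every k-mer in each read contributes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. This tiny sketch is purely illustrative and unrelated to the Convey implementation.

```python
# Tiny illustration of the de Bruijn construction step that short-read assemblers
# such as Velvet perform: every k-mer in each read contributes an edge between its
# (k-1)-mer prefix and suffix. Not related to the Convey graph constructor.
from collections import defaultdict

def de_bruijn(reads, k=4):
    graph = defaultdict(list)                 # (k-1)-mer -> list of successor (k-1)-mers
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

reads = ["ACGTACGG", "GTACGGAT"]
for node, successors in de_bruijn(reads).items():
    print(node, "->", ", ".join(successors))
```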
Optimal selection of epitopes for TXP-immunoaffinity mass spectrometry.
Planatscher, Hannes; Supper, Jochen; Poetz, Oliver; Stoll, Dieter; Joos, Thomas; Templin, Markus F; Zell, Andreas
2010-06-25
Mass spectrometry (MS) based protein profiling has become one of the key technologies in biomedical research and biomarker discovery. One bottleneck in MS-based protein analysis is sample preparation and an efficient fractionation step to reduce the complexity of the biological samples, which are too complex to be analyzed directly with MS. Sample preparation strategies that reduce the complexity of tryptic digests by using immunoaffinity based methods have been shown to lead to a substantial increase in throughput and sensitivity in the proteomic mass spectrometry approach. The limitation of using such immunoaffinity-based approaches is the availability of the appropriate peptide-specific capture antibodies. Recent developments in these approaches, where subsets of peptides with short identical terminal sequences can be enriched using antibodies directed against short terminal epitopes, promise a significant gain in efficiency. We show that the minimal set of terminal epitopes for the coverage of a target protein list can be found by formulation as a set cover problem, preceded by a filtering pipeline for the exclusion of peptides and target epitopes with undesirable properties. For small datasets (a few hundred proteins) it is possible to solve the problem to optimality with moderate computational effort using commercial or free solvers. Larger datasets, like full proteomes, require the use of heuristics.
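As an illustration of the underlying combinatorial problem (not the authors' integer-programming formulation), a greedy heuristic of the kind hinted at for proteome-scale lists can be sketched as follows; epitope and protein identifiers are invented.

```python
# Hedged sketch of the underlying set-cover step: greedily choose terminal epitopes
# until every target protein is covered by at least one of them. (The paper solves
# small lists exactly with integer programming; greedy is shown only as a compact
# approximation.) Epitope and protein names are invented.

epitope_to_proteins = {
    "ESK": {"P1", "P2", "P5"},
    "GLR": {"P2", "P3"},
    "TYV": {"P4", "P5"},
    "AQN": {"P3", "P4", "P6"},
}
targets = {"P1", "P2", "P3", "P4", "P5", "P6"}

def greedy_set_cover(universe, sets):
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(sets, key=lambda e: len(sets[e] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("some targets cannot be covered")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

print(greedy_set_cover(targets, epitope_to_proteins))   # ['ESK', 'AQN']
```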
High-throughput search for caloric materials: the CaloriCool approach
NASA Astrophysics Data System (ADS)
Zarkevich, N. A.; Johnson, D. D.; Pecharsky, V. K.
2018-01-01
The high-throughput search paradigm adopted by the newly established caloric materials consortium—CaloriCool®—with the goal of substantially accelerating the discovery and design of novel caloric materials, is briefly discussed. We begin by describing material selection criteria based on known properties, which are then followed by fast heuristic estimates and ab initio calculations, all of which have been implemented in a set of automated computational tools and measurements. We also demonstrate how theoretical and computational methods serve as a guide for experimental efforts by considering a representative example from the field of magnetocaloric materials.
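The tiered funnel described above can be caricatured in a few lines: cheap filters on known properties run first, fast heuristic estimates next, and only the survivors are passed on to expensive ab initio calculations or experiments. The candidate entries, properties and thresholds below are invented placeholders.

```python
# Schematic sketch of a tiered screening funnel: cheap filters on known properties run
# first, fast heuristic estimates next, and only survivors are passed to expensive
# ab initio calculations. Candidates and thresholds are invented placeholders.

candidates = [
    {"name": "alloy_A", "contains_toxic": False, "cost_per_kg": 40,  "est_dS": 12.0},
    {"name": "alloy_B", "contains_toxic": True,  "cost_per_kg": 25,  "est_dS": 18.0},
    {"name": "alloy_C", "contains_toxic": False, "cost_per_kg": 300, "est_dS": 15.0},
    {"name": "alloy_D", "contains_toxic": False, "cost_per_kg": 60,  "est_dS": 4.0},
]

def known_property_filter(m):      # tier 1: selection criteria from known properties
    return not m["contains_toxic"] and m["cost_per_kg"] < 100

def heuristic_estimate(m):         # tier 2: fast estimate of the caloric response
    return m["est_dS"] >= 10.0     # e.g. minimum entropy change, J/(kg K)

shortlist = [m for m in candidates if known_property_filter(m) and heuristic_estimate(m)]
print("send to ab initio / experiment:", [m["name"] for m in shortlist])   # ['alloy_A']
```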
Matrix metalloproteinase proteomics: substrates, targets, and therapy.
Morrison, Charlotte J; Butler, Georgina S; Rodríguez, David; Overall, Christopher M
2009-10-01
Proteomics encompasses powerful techniques termed 'degradomics' for unbiased high-throughput protease substrate discovery screens that have been applied to an important family of extracellular proteases, the matrix metalloproteinases (MMPs). Together with the data generated from genetic deletion and transgenic mouse models and genomic profiling, these screens can uncover the diverse range of MMP functions, reveal which MMPs and MMP-mediated pathways exacerbate pathology, and which are involved in protection and the resolution of disease. This information can be used to identify and validate candidate drug targets and antitargets, and is critical for the development of new inhibitors of MMP function. Such inhibitors may target either the MMP directly in a specific manner or pathways upstream and downstream of MMP activity that are mediating deleterious effects in disease. Since MMPs do not operate alone but are part of the 'protease web', it is necessary to use system-wide approaches to understand MMP proteolysis in vivo, to discover new biological roles and their potential for therapeutic modification.
Integration of Proteomic, Transcriptional, and Interactome Data Reveals Hidden Signaling Components
Huang, Shao-shan Carol; Fraenkel, Ernest
2009-01-01
Cellular signaling and regulatory networks underlie fundamental biological processes such as growth, differentiation, and response to the environment. Although there are now various high-throughput methods for studying these processes, knowledge of them remains fragmentary. Typically, the vast majority of hits identified by transcriptional, proteomic, and genetic assays lie outside of the expected pathways. These unexpected components of the cellular response are often the most interesting, because they can provide new insights into biological processes and potentially reveal new therapeutic approaches. However, they are also the most difficult to interpret. We present a technique, based on the Steiner tree problem, that uses previously reported protein-protein and protein-DNA interactions to determine how these hits are organized into functionally coherent pathways, revealing many components of the cellular response that are not readily apparent in the original data. Applied simultaneously to phosphoproteomic and transcriptional data for the yeast pheromone response, it identifies changes in diverse cellular processes that extend far beyond the expected pathways. PMID:19638617
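A rough flavour of the approach can be given with the generic Steiner-tree approximation shipped in networkx, which connects the experimental 'hits' through a weighted interaction network and thereby surfaces unmeasured intermediate nodes. Note that the published method solves a prize-collecting variant with its own algorithm; the toy interactome, edge weights and hit list below are illustrative only.

```python
# Sketch of the core idea using the standard Steiner-tree approximation in networkx
# (the paper's own formulation is a prize-collecting variant solved differently).
# The toy interactome, weights and "hit" nodes below are invented for illustration.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

interactome = nx.Graph()
interactome.add_weighted_edges_from([
    ("STE2", "GPA1", 1.0), ("GPA1", "STE4", 1.0), ("STE4", "STE5", 1.0),
    ("STE5", "FUS3", 1.0), ("FUS3", "STE12", 1.0), ("STE12", "FUS1", 1.0),
    ("GPA1", "SST2", 2.0), ("FUS3", "DIG1", 1.5),
])

hits = ["STE2", "FUS1", "DIG1"]          # e.g. phosphoproteomic + transcriptional hits
tree = steiner_tree(interactome, hits, weight="weight")

# Nodes in the tree that were not hits are the "hidden" pathway components recovered.
print("recovered intermediates:", sorted(set(tree.nodes) - set(hits)))
```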
Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome
Chaudhuri, Roy R.; Yu, Lu; Kanji, Alpa; Perkins, Timothy T.; Gardner, Paul P.; Choudhary, Jyoti; Maskell, Duncan J.
2011-01-01
Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community. PMID:21816880
A cell death assay for assessing the mitochondrial targeting of proteins.
Camara Teixeira, Daniel; Cordonier, Elizabeth L; Wijeratne, Subhashinee S K; Huebbe, Patricia; Jamin, Augusta; Jarecke, Sarah; Wiebe, Matthew; Zempleni, Janos
2018-06-01
The mitochondrial proteome comprises 1000 to 1500 proteins, in addition to proteins for which the mitochondrial localization is uncertain. About 800 diseases have been linked with mutations in mitochondrial proteins. We devised a cell survival assay for assessing mitochondrial localization in a high-throughput format. This protocol allows us to assess the mitochondrial localization of proteins and their mutants, and to identify drugs and nutrients that modulate the mitochondrial targeting of proteins. The assay works equally well for proteins directed to the outer mitochondrial membrane, the inner mitochondrial membrane and the mitochondrial matrix, as demonstrated by assessing the mitochondrial targeting of the following proteins: carnitine palmitoyl transferase 1 (consensus sequence and R123C mutant), acetyl-CoA carboxylase 2, uncoupling protein 1 and holocarboxylase synthetase. Our screen may be useful for linking the mitochondrial proteome with rare diseases and for devising drug- and nutrition-based strategies for altering the mitochondrial targeting of proteins. Copyright © 2018 Elsevier Inc. All rights reserved.
Saha, Supriya K.; Gordan, John D.; Kleinstiver, Benjamin P.; Vu, Phuong; Najem, Mortada S.; Yeo, Jia-Chi; Shi, Lei; Kato, Yasutaka; Levin, Rebecca S.; Webber, James T.; Damon, Leah J.; Egan, Regina K.; Greninger, Patricia; McDermott, Ultan; Garnett, Mathew J.; Jenkins, Roger L.; Rieger-Christ, Kimberly M.; Sullivan, Travis B.; Hezel, Aram F.; Liss, Andrew S.; Mizukami, Yusuke; Goyal, Lipika; Ferrone, Cristina R.; Zhu, Andrew X.; Joung, J. Keith; Shokat, Kevan M.; Benes, Cyril H.; Bardeesy, Nabeel
2017-01-01
Intrahepatic cholangiocarcinoma (ICC) is an aggressive liver bile duct malignancy exhibiting frequent isocitrate dehydrogenase (IDH1/IDH2) mutations. Through a high-throughput drug screen of a large panel of cancer cell lines including 17 biliary tract cancers, we found that IDH mutant (IDHm) ICC cells demonstrate a striking response to the multi-kinase inhibitor dasatinib, with the highest sensitivity among 682 solid tumor cell lines. Using unbiased proteomics to capture the activated kinome and CRISPR/Cas9-based genome editing to introduce dasatinib-resistant ‘gatekeeper’ mutant kinases, we identified SRC as a critical dasatinib target in IDHm ICC. Importantly, dasatinib-treated IDHm xenografts exhibited pronounced apoptosis and tumor regression. Our results show that IDHm ICC cells have a unique dependency on SRC and suggest that dasatinib may have therapeutic benefit against IDHm ICC. Moreover, these proteomic and genome-editing strategies provide a systematic and broadly applicable approach to define targets of kinase inhibitors underlying drug responsiveness. PMID:27231123
Development of proteome-wide binding reagents for research and diagnostics.
Taussig, Michael J; Schmidt, Ronny; Cook, Elizabeth A; Stoevesandt, Oda
2013-12-01
Alongside MS, antibodies and other specific protein-binding molecules have a special place in proteomics as affinity reagents in a toolbox of applications for determining protein location, quantitative distribution and function (affinity proteomics). The realisation that the range of research antibodies available, while apparently vast, is nevertheless still very incomplete and frequently of uncertain quality, has stimulated projects with an objective of raising comprehensive, proteome-wide sets of protein binders. With progress in automation and throughput, a remarkable number of recent publications refer to the practical possibility of selecting binders to every protein encoded in the genome. Here we review the requirements of a pipeline of production of protein binders for the human proteome, including target prioritisation, antigen design, 'next generation' methods, databases and the approaches taken by ongoing projects in Europe and the USA. While the task of generating affinity reagents for all human proteins is complex and demanding, the benefits of well-characterised and quality-controlled pan-proteome binder resources for biomedical research, industry and life sciences in general would be enormous and justify the effort. Given the technical, personnel and financial resources needed to fulfil this aim, expansion of current efforts may best be addressed through large-scale international collaboration. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Efficient Site-Specific Labeling of Proteins via Cysteines
Kim, Younggyu; Ho, Sam O.; Gassman, Natalie R.; Korlann, You; Landorf, Elizabeth V.; Collart, Frank R.; Weiss, Shimon
2011-01-01
Methods for chemical modifications of proteins have been crucial for the advancement of proteomics. In particular, site-specific covalent labeling of proteins with fluorophores and other moieties has permitted the development of a multitude of assays for proteome analysis. A common approach for such a modification is solvent-accessible cysteine labeling using thiol-reactive dyes. Cysteine is very attractive for site-specific conjugation due to its relative rarity throughout the proteome and the ease of its introduction into a specific site along the protein's amino acid chain. This is achieved by site-directed mutagenesis, most often without perturbing the protein's function. Bottlenecks in this reaction, however, include the maintenance of reactive thiol groups without oxidation before the reaction, and the effective removal of unreacted molecules prior to fluorescence studies. Here, we describe an efficient, specific, and rapid procedure for cysteine labeling starting from well-reduced proteins in the solid state. The efficacy and specificity of the improved procedure are estimated using a variety of single-cysteine proteins and thiol-reactive dyes. Based on UV/vis absorbance spectra, coupling efficiencies are typically in the range 70–90%, and specificities are better than ~95%. The labeled proteins are evaluated using fluorescence assays, proving that the covalent modification does not alter their function. In addition to maleimide-based conjugation, this improved procedure may be used for other thiol-reactive conjugations such as haloacetyl, alkyl halide, and disulfide interchange derivatives. This facile and rapid procedure is well suited for high throughput proteome analysis. PMID:18275130
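The 70–90% coupling efficiencies reported above are estimated from UV/vis absorbance; as a minimal illustration of that arithmetic (a standard degree-of-labeling calculation; the extinction coefficients and correction factor below are hypothetical placeholders, not values from the study), in Python:

    def degree_of_labeling(a280, a_dye, eps_dye, eps_protein, cf280):
        """Estimate the dye:protein molar ratio from UV/vis absorbance.
        a280        -- absorbance at 280 nm (protein plus dye contribution)
        a_dye       -- absorbance at the dye's visible absorption maximum
        eps_dye     -- molar extinction coefficient of the dye (M^-1 cm^-1)
        eps_protein -- molar extinction coefficient of the protein at 280 nm
        cf280       -- fraction of the dye's absorbance that also appears at 280 nm
        """
        dye_conc = a_dye / eps_dye
        protein_conc = (a280 - cf280 * a_dye) / eps_protein
        return dye_conc / protein_conc

    # Illustrative numbers only (hypothetical single-cysteine protein and maleimide dye).
    dol = degree_of_labeling(a280=0.52, a_dye=0.60, eps_dye=71000.0,
                             eps_protein=43824.0, cf280=0.11)
    print(f"coupling efficiency ~ {dol:.0%}")  # ~80%, i.e. within the reported 70-90% range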
Proteomics for the authentication of fish species.
Mazzeo, Maria Fiorella; Siciliano, Rosa Anna
2016-09-16
Assessment of seafood authenticity and origin, mainly in the case of processed products (fillets, sticks, baby food), is crucial to prevent fraudulent deception, thus guaranteeing market transparency and consumers' health. The most dangerous practice that jeopardizes fish safety is intentional or unintentional mislabeling, originating from the substitution of valuable fish species with inferior ones. Conventional analytical methods for fish authentication are becoming inadequate to comply with the strict regulations issued by the European Union and with the increase of mislabeling due to the introduction of new fish species on the market and market globalization. This evidence prompts the development of high-throughput approaches suitable to identify unambiguous biomarkers of authenticity and to screen a large number of samples with minimal time consumption. Proteomics provides suitable and powerful tools to investigate the main aspects of food quality and safety and has made an important contribution in the field of biomarker discovery applied to food authentication. This report describes the most relevant methods developed to assess fish identity and offers a perspective on their potential in the evaluation of fish quality and safety, thus depicting the key role of proteomics in the authentication of fish species and processed products. The assessment of fishery product authenticity is a main issue in the quality control process, as deceptive practices can imply severe health risks. Proteomics-based methods could significantly contribute to detecting falsification and fraud, thus becoming a reliable operative first-line testing resource in food authentication. Copyright © 2016 Elsevier B.V. All rights reserved.
Perazzolli, Michele
2012-01-01
Downy mildew is caused by the oomycete Plasmopara viticola and is one of the most serious diseases of grapevine. The beneficial microorganism Trichoderma harzianum T39 (T39) has previously been shown to induce plant-mediated resistance and to reduce the severity of downy mildew in susceptible grapevines. In order to better understand the cellular processes associated with T39-induced resistance, the proteomic and histochemical changes activated by T39 in grapevine were investigated before and 1 day after P. viticola inoculation. A comprehensive proteomic analysis of T39-induced resistance in grapevine was performed using an eight-plex iTRAQ protocol, resulting in the identification and quantification of a total of 800 proteins. Most of the proteins directly affected by T39 were found to be involved in signal transduction, indicating activation of a complete microbial recognition machinery. Moreover, T39-induced resistance was associated with rapid accumulation of reactive oxygen species and callose at infection sites, as well as changes in abundance of proteins involved in response to stress and redox balance, indicating an active defence response to downy mildew. On the other hand, proteins affected by P. viticola in control plants mainly decreased in abundance, possibly reflecting the establishment of a compatible interaction. Finally, the high-throughput iTRAQ protocol allowed de novo peptide sequencing, which will be used to improve annotation of the Vitis vinifera cv. Pinot Noir proteome. PMID:23105132
Epigenetics and Proteomics Join Transcriptomics in the Quest for Tuberculosis Biomarkers
Esterhuyse, Maria M.; Weiner, January; Caron, Etienne; Loxton, Andre G.; Iannaccone, Marco; Wagman, Chandre; Saikali, Philippe; Stanley, Kim; Wolski, Witold E.; Mollenkopf, Hans-Joachim; Schick, Matthias; Aebersold, Ruedi; Linhart, Heinz; Walzl, Gerhard
2015-01-01
An estimated one-third of the world’s population is currently latently infected with Mycobacterium tuberculosis. Latent M. tuberculosis infection (LTBI) progresses into active tuberculosis (TB) disease in ~5 to 10% of infected individuals. Diagnostic and prognostic biomarkers to monitor disease progression are urgently needed to ensure better care for TB patients and to decrease the spread of TB. Biomarker development is primarily based on transcriptomics. Our understanding of biology combined with evolving technical advances in high-throughput techniques led us to investigate the possibility of additional platforms (epigenetics and proteomics) in the quest to (i) understand the biology of the TB host response and (ii) search for multiplatform biosignatures in TB. We engaged in a pilot study to interrogate the DNA methylome, transcriptome, and proteome in selected monocytes and granulocytes from TB patients and healthy LTBI participants. Our study provides first insights into the levels and sources of diversity in the epigenome and proteome among TB patients and LTBI controls, despite limitations due to small sample size. Functionally, the differences between the infection phenotypes (LTBI versus active TB) observed in the different platforms were congruent, thereby suggesting regulation of function not only at the transcriptional level but also by DNA methylation and microRNA. Thus, our data argue for the development of a large-scale study of the DNA methylome, with particular attention to study design in accounting for variation based on gender, age, and cell type. PMID:26374119
Identification of lactoferricin B intracellular targets using an Escherichia coli proteome chip.
Tu, Yu-Hsuan; Ho, Yu-Hsuan; Chuang, Ying-Chih; Chen, Po-Chung; Chen, Chien-Sheng
2011-01-01
Lactoferricin B (LfcinB) is a well-known antimicrobial peptide. Several studies have indicated that it can inhibit bacteria by affecting intracellular activities, but the intracellular targets of this antimicrobial peptide have not been identified. Therefore, we used E. coli proteome chips to identify the intracellular target proteins of LfcinB in a high-throughput manner. We probed LfcinB with E. coli proteome chips and further conducted normalization and Gene Ontology (GO) analyses. The results of the GO analyses showed that the identified proteins were associated with metabolic processes. Moreover, we validated the interactions between LfcinB and chip assay-identified proteins with fluorescence polarization (FP) assays. Sixteen proteins were identified, and an E. coli interaction database (EcID) analysis revealed that the majority of the proteins that interact with these 16 proteins affected the tricarboxylic acid (TCA) cycle. Knockout assays were conducted to further validate the FP assay results. These results showed that phosphoenolpyruvate carboxylase was a target of LfcinB, indicating that one of its mechanisms of action may be associated with pyruvate metabolism. Thus, we used pyruvate assays to conduct an in vivo validation of the relationship between LfcinB and pyruvate level in E. coli. These results showed that E. coli exposed to LfcinB had abnormal pyruvate amounts, indicating that LfcinB caused an accumulation of pyruvate. In conclusion, this study successfully revealed the intracellular targets of LfcinB using an E. coli proteome chip approach.
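The Gene Ontology analysis step described above is commonly implemented as an over-representation test; a minimal sketch, assuming a simple hypergeometric null and hypothetical counts (not the study's actual numbers):

    from scipy.stats import hypergeom

    def go_enrichment_pvalue(hits_in_term, n_hits, term_size, background_size):
        """P(observing >= hits_in_term chip hits annotated to a GO term) under a
        hypergeometric null; the survival function at k-1 gives P(X >= k)."""
        return hypergeom.sf(hits_in_term - 1, background_size, term_size, n_hits)

    # Hypothetical numbers: 16 chip-identified LfcinB binders out of ~4000 E. coli
    # proteins, 5 of which fall in a metabolic-process term containing 600 proteins.
    p = go_enrichment_pvalue(hits_in_term=5, n_hits=16, term_size=600, background_size=4000)
    print(f"enrichment p-value = {p:.3g}")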
Chaudhury, Arun
2015-01-01
Using 2D differential gel electrophoresis (DIGE) and mass spectrometry (MS), a recent report by Rattan and Ali (2015) compared proteome expression between the tonically contracted sphincteric smooth muscle of the internal anal sphincter (IAS) and the adjacent rectum [rectal smooth muscle (RSM)], which contracts in a phasic fashion. The study showed differential expression of a single 23 kDa protein, SM22, which was 1.87-fold overexpressed in RSM in comparison to IAS. Earlier studies have shown differences in the expression of proteins such as Rho-associated protein kinase II, myosin light chain kinase, myosin phosphatase, and protein kinase C between IAS and RSM. The currently employed methods, despite their high-throughput potential, failed to identify these well-characterized differences between phasic and tonic muscles. This calls into question the fidelity and validatory potential of the otherwise powerful technology of 2D DIGE/MS. These discrepancies, when redressed in future studies, will establish this recent report as an important baseline study of the "sphincter proteome." Proteomics techniques are currently underutilized in examining the pathophysiology of hypertensive/hypotensive disorders involving gastrointestinal sphincters, including achalasia, gastroesophageal reflux disease (GERD), spastic pylorus seen during diabetes or chronic chemotherapy, intestinal pseudo-obstruction, and recto-anal incontinence. Global proteome mapping may provide an instant snapshot of the complete repertoire of differential proteins, thus expediting identification of the molecular pathology of gastrointestinal motility disorders currently labeled "idiopathic" and facilitating the practice of precision medicine.
Universal Solid-phase Reversible Sample-Prep for Concurrent Proteome and N-glycome Characterization
Zhou, Hui; Morley, Samantha; Kostel, Stephen; Freeman, Michael R.; Joshi, Vivek; Brewster, David; Lee, Richard S.
2017-01-01
We describe a novel Solid-phase Reversible Sample-Prep (SRS) platform, which enables rapid sample preparation for concurrent proteome and N-glycome characterization by mass spectrometry. SRS utilizes a uniquely functionalized, silica-based bead that has strong affinity toward proteins with minimal-to-no affinity for peptides and other small molecules. By leveraging this inherent size difference between proteins and smaller molecules, SRS permits high-capacity binding of proteins, rapid removal of small molecules (detergents, metabolites, salts, etc.), extensive manipulation including enzymatic and chemical treatments on bead-bound proteins, and easy recovery of N-glycans and peptides. The efficacy of SRS was evaluated in a wide range of biological samples including a single glycoprotein, whole cell lysate, murine tissues, and human urine. To further demonstrate the SRS platform, we coupled a quantitative strategy to SRS to investigate the differences between DU145 prostate cancer cells and their DIAPH3-silenced counterpart. Our previous studies suggested that DIAPH3 silencing in DU145 prostate cancer cells induced transition to an amoeboid phenotype that correlated with tumor progression and metastasis. In this analysis we identified distinct proteomic and N-glycomic alterations between the two cell lines. Intriguingly, a metastasis-associated tyrosine kinase receptor, ephrin type-A receptor (EPHA2), was highly upregulated in DIAPH3-silenced cells, indicating an underlying connection between EPHA2 and DIAPH3. Moreover, distinct alterations in the N-glycome were identified, suggesting cross-talk between DIAPH3 and glycosyltransferase networks. Overall, SRS is an enabling universal sample preparation strategy that is not size limited and has the capability to efficiently prepare and clean up peptides and N-glycans concurrently from nearly all sample types. Conceptually, SRS can be utilized for the analysis of other posttranslational modifications, and the unique surface chemistry can be further adapted for high-throughput automation. The technical simplicity, robustness, and modularity of SRS make it a highly promising technology with great potential in proteomics-based research. PMID:26791391
Kim, Kee-Hong; Brown, Kimberly M; Harris, Paul V; Langston, James A; Cherry, Joel R
2007-12-01
Economically competitive production of ethanol from lignocellulosic biomass by enzymatic hydrolysis and fermentation is currently limited, in part, by the relatively high cost and low efficiency of the enzymes required to hydrolyze cellulose to fermentable sugars. Discovery of novel cellulases with greater activity could be a critical step in overcoming this cost barrier. beta-Glucosidase catalyzes the final step in conversion of glucose polymers to glucose. Despite their importance, only a few beta-glucosidases are commercially available, and more efficient ones are clearly needed. We developed a proteomics strategy aiming to discover beta-glucosidases present in the secreted proteome of the cellulose-degrading fungus Aspergillus fumigatus. With the use of partial or complete protein denaturing conditions, the secretory proteome was fractionated in a 2DGE format and beta-glucosidase activity was detected in the gel after infusion with a substrate analogue that fluoresces upon hydrolysis. Fluorescing spots were subjected to tryptic digestion, and identification as beta-glucosidases was confirmed by tandem mass spectrometry. Two novel beta-glucosidases of A. fumigatus were identified by this in situ activity staining method, and the gene coding for a novel beta-glucosidase (EAL88289) was cloned and heterologously expressed. The expressed beta-glucosidase showed far superior heat stability to the previously characterized beta-glucosidases of Aspergillus niger and Aspergillus oryzae. Improved heat stability is important for development of the next generation of saccharifying enzymes capable of performing fast cellulose hydrolysis reactions at elevated temperatures, thereby lowering the cost of bioethanol production. The in situ activity staining approach described here would be a useful tool for cataloguing and assessing the efficiency of beta-glucosidases in a high throughput fashion.
Kim, Jong-Seo; Fillmore, Thomas L; Liu, Tao; Robinson, Errol; Hossain, Mahmud; Champion, Boyd L; Moore, Ronald J; Camp, David G; Smith, Richard D; Qian, Wei-Jun
2011-12-01
Selected reaction monitoring (SRM)-MS is an emerging technology for high throughput targeted protein quantification and verification in biomarker discovery studies; however, the cost associated with the application of stable isotope-labeled synthetic peptides as internal standards can be prohibitive for screening a large number of candidate proteins as often required in the preverification phase of discovery studies. Herein we present a proof of concept study using an (18)O-labeled proteome reference as global internal standards (GIS) for SRM-based relative quantification. The (18)O-labeled proteome reference (or GIS) can be readily prepared and contains a heavy isotope ((18)O)-labeled internal standard for every possible tryptic peptide. Our results showed that the percentage of heavy isotope ((18)O) incorporation applying an improved protocol was >99.5% for most peptides investigated. The accuracy, reproducibility, and linear dynamic range of quantification were further assessed based on known ratios of standard proteins spiked into the labeled mouse plasma reference. Reliable quantification was observed with high reproducibility (i.e. coefficient of variance <10%) for analyte concentrations that were set at 100-fold higher or lower than those of the GIS based on the light ((16)O)/heavy ((18)O) peak area ratios. The utility of (18)O-labeled GIS was further illustrated by accurate relative quantification of 45 major human plasma proteins. Moreover, quantification of the concentrations of C-reactive protein and prostate-specific antigen was illustrated by coupling the GIS with standard additions of purified protein standards. Collectively, our results demonstrated that the use of (18)O-labeled proteome reference as GIS provides a convenient, low cost, and effective strategy for relative quantification of a large number of candidate proteins in biological or clinical samples using SRM.
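Relative quantification against an (18)O-labeled global internal standard reduces to light/heavy peak-area ratios; the following minimal pandas sketch shows that bookkeeping with made-up SRM transition areas (peptide sequences and values are illustrative only):

    import pandas as pd

    # Hypothetical SRM peak areas: one row per transition, with the light (16O, sample)
    # and heavy (18O, labeled proteome reference) channel areas.
    data = pd.DataFrame({
        "peptide":    ["LVNEVTEFAK", "LVNEVTEFAK", "YLYEIAR", "YLYEIAR"],
        "transition": ["y8", "y7", "y6", "y5"],
        "light_area": [1.8e6, 9.5e5, 3.1e5, 2.4e5],
        "heavy_area": [2.0e6, 1.1e6, 2.9e5, 2.2e5],
    })

    data["ratio"] = data["light_area"] / data["heavy_area"]     # light/heavy per transition
    summary = data.groupby("peptide")["ratio"].agg(["mean", "std"])
    summary["cv_percent"] = 100 * summary["std"] / summary["mean"]
    print(summary)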
Andromeda: a peptide search engine integrated into the MaxQuant environment.
Cox, Jürgen; Neuhauser, Nadin; Michalski, Annette; Scheltema, Richard A; Olsen, Jesper V; Mann, Matthias
2011-04-01
A key step in mass spectrometry (MS)-based proteomics is the identification of peptides in sequence databases by their fragmentation spectra. Here we describe Andromeda, a novel peptide search engine using a probabilistic scoring model. On proteome data, Andromeda performs as well as Mascot, a widely used commercial search engine, as judged by sensitivity and specificity analysis based on target decoy searches. Furthermore, it can handle data with arbitrarily high fragment mass accuracy, is able to assign and score complex patterns of post-translational modifications, such as highly phosphorylated peptides, and accommodates extremely large databases. The algorithms of Andromeda are provided. Andromeda can function independently or as an integrated search engine of the widely used MaxQuant computational proteomics platform and both are freely available at www.maxquant.org. The combination enables analysis of large data sets in a simple analysis workflow on a desktop computer. For searching individual spectra Andromeda is also accessible via a web server. We demonstrate the flexibility of the system by implementing the capability to identify cofragmented peptides, significantly improving the total number of identified peptides.
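The sensitivity/specificity comparison mentioned above rests on target-decoy searching; as a minimal sketch of how a score threshold maps to an estimated false discovery rate (concatenated target+decoy convention assumed; the scores are illustrative, not Andromeda output):

    def fdr_at_threshold(psms, threshold):
        """Estimate FDR among PSMs scoring at or above a threshold.
        psms -- iterable of (score, is_decoy) pairs from a concatenated target+decoy search.
        """
        accepted = [is_decoy for score, is_decoy in psms if score >= threshold]
        targets = sum(1 for d in accepted if not d)
        decoys = sum(1 for d in accepted if d)
        return decoys / max(targets, 1)   # decoy hits estimate the false target hits

    psms = [(95, False), (88, False), (74, False), (71, True),
            (66, False), (60, True), (55, False), (40, True)]
    for t in (50, 65, 80):
        print(t, f"FDR ~ {fdr_at_threshold(psms, t):.2f}")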
Profiling the venom gland transcriptomes of Costa Rican snakes by 454 pyrosequencing
2011-01-01
Background A long-term research goal of venomics, of applied importance for improving current antivenom therapy but also for drug discovery, is to understand the pharmacological potential of venoms. Individually or combined, proteomic and transcriptomic studies have demonstrated their feasibility to explore in depth the molecular diversity of venoms. In the absence of genome sequences, transcriptomes also represent valuable searchable databases for proteomic projects. Results The venom gland transcriptomes of 8 Costa Rican taxa from 5 genera (Crotalus, Bothrops, Atropoides, Cerrophidion, and Bothriechis) of pitvipers were investigated using high-throughput 454 pyrosequencing. 100,394 out of 330,010 masked reads produced significant hits in the available databases. 5,165,220 nucleotides (8.27%) were masked by RepeatMasker, the vast majority of which corresponded to class I (retroelements) and class II (DNA transposons) mobile elements. BLAST hits included 79,991 matches to entries of the taxonomic suborder Serpentes, of which 62,433 displayed similarity to documented venom proteins. Strong discrepancies between the transcriptome-computed and the proteome-gathered toxin compositions were obvious at first sight. Although the reasons underlying this discrepancy are elusive, since no clear trend within or between species is apparent, the data indicate that individual mRNA species may be translationally controlled in a species-dependent manner. The minimum number of genes from each toxin family transcribed into the venom gland transcriptome of each species was calculated from multiple alignments of reads matched to a full-length reference sequence of each toxin family. Reads encoding ORF regions of Kazal-type inhibitor-like proteins were uniquely found in Bothriechis schlegelii and B. lateralis transcriptomes, suggesting a genus-specific recruitment event during the early-Middle Miocene. A transcriptome-based cladogram supports the large divergence between A. mexicanus and A. picadoi, and a closer kinship between A. mexicanus and C. godmani. Conclusions Our comparative next-generation sequencing (NGS) analysis reveals taxon-specific trends governing the formulation of the venom arsenal. Knowledge of the venom proteome provides hints on the translation efficiency of toxin-coding transcripts, contributing thereby to a more accurate interpretation of the transcriptome. The application of NGS to the analysis of snake venom transcriptomes may represent the tool for opening the door to systems venomics. PMID:21605378
The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences
USDA-ARS's Scientific Manuscript database
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning m...
On the Achievable Throughput Over TVWS Sensor Networks
Caleffi, Marcello; Cacciapuoti, Angela Sara
2016-01-01
In this letter, we study the throughput achievable by an unlicensed sensor network operating over TV white space spectrum in the presence of coexistence interference. We first analytically derive the achievable throughput as a function of the channel ordering. Then, we show that the problem of deriving the maximum expected throughput through exhaustive search is computationally unfeasible. Finally, we derive a computationally efficient algorithm with polynomial-time complexity to compute the channel set maximizing the expected throughput and, stemming from this, we derive a closed-form expression for the maximum expected throughput. Numerical simulations validate the theoretical analysis. PMID:27043565
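The letter's analytical model is not reproduced here; purely as a toy illustration of why expected throughput depends on channel ordering (and why exhaustive search over orderings scales factorially), consider a first-available-channel model with hypothetical availability probabilities and rates:

    from itertools import permutations

    def expected_throughput(order, avail, rate):
        """Toy model: channels are sensed in the given order and the first
        available one is used for the slot."""
        exp_tp, p_none_yet = 0.0, 1.0
        for ch in order:
            exp_tp += p_none_yet * avail[ch] * rate[ch]
            p_none_yet *= (1.0 - avail[ch])
        return exp_tp

    avail = {0: 0.9, 1: 0.6, 2: 0.4}   # hypothetical availability probabilities
    rate  = {0: 2.0, 1: 6.0, 2: 9.0}   # hypothetical per-channel rates (Mb/s)

    # Exhaustive search over orderings: feasible for 3 channels, factorial in general,
    # which is the scalability problem a polynomial-time algorithm must avoid.
    best = max(permutations(avail), key=lambda o: expected_throughput(o, avail, rate))
    print(best, expected_throughput(best, avail, rate))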
Although two-dimensional electrophoresis (2D-GE) remains the basis for many ecotoxicoproteomic analyses, new, non gel-based methods are beginning to be applied to overcome throughput and coverage limitations of 2D-GE. The overall objective of our research was to apply a comprehe...
A homology-based pipeline for global prediction of post-translational modification sites
NASA Astrophysics Data System (ADS)
Chen, Xiang; Shi, Shao-Ping; Xu, Hao-Dong; Suo, Sheng-Bao; Qiu, Jian-Ding
2016-05-01
The pathways of protein post-translational modifications (PTMs) have been shown to play particularly important roles in almost any biological process. Identification of PTM substrates, along with information on the exact sites, is fundamental for fully understanding or controlling biological processes. Alternative computational strategies would help to annotate PTMs in a high-throughput manner. Traditional algorithms are suited to identifying PTMs in the common organisms and tissues that have a complete PTM atlas or extensive experimental data; in contrast, annotation of rare PTMs in most organisms remains a clear challenge. In this work, we have developed a novel homology-based pipeline named PTMProber that allows identification of potential modification sites for most proteomes lacking PTM data. A cross-promotion E-value (CPE) is used as a stringent benchmark in our pipeline to evaluate homology to known modification sites. Independent validation tests show that PTMProber achieves over 58.8% recall with high precision by the CPE benchmark. Comparisons with other machine-learning tools show that the PTMProber pipeline performs better on general predictions. We have also developed a web-based tool integrating this pipeline at http://bioinfo.ncu.edu.cn/PTMProber/index.aspx. Beyond pre-constructed PTM prediction models, the website provides extended functionality to allow users to customize models.
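As a rough illustration of the homology-based idea only (not the PTMProber algorithm or its CPE statistic), the sketch below transfers known modification sites onto a query protein by local sequence-window similarity; the window, threshold, and sequences are hypothetical:

    def transfer_ptm_sites(query, known_sites, flank=7, min_identity=0.8):
        """Map known modified-site windows onto a query sequence.
        known_sites -- list of (window, modified_residue) pairs, each window being
                       2*flank+1 residues centred on a known modification site.
        Returns 1-based query positions whose surrounding window matches a known
        site window above the identity threshold."""
        hits, width = [], 2 * flank + 1
        for pos in range(flank, len(query) - flank):
            window = query[pos - flank: pos + flank + 1]
            for ref_window, residue in known_sites:
                if query[pos] != residue or len(ref_window) != width:
                    continue
                identity = sum(a == b for a, b in zip(window, ref_window)) / width
                if identity >= min_identity:
                    hits.append((pos + 1, residue, identity))
        return hits

    known = [("MGKRRASSLGSDEEK", "S")]          # hypothetical phosphosite window
    query = "AAAMGKRRASSLGSDEEKPVK"             # hypothetical query protein
    print(transfer_ptm_sites(query, known))     # -> [(11, 'S', 1.0)]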
Image Harvest: an open-source platform for high-throughput plant image processing and analysis.
Knecht, Avi C; Campbell, Malachy T; Caprez, Adam; Swanson, David R; Walia, Harkamal
2016-05-01
High-throughput plant phenotyping is an effective approach to bridge the genotype-to-phenotype gap in crops. Phenomics experiments typically result in large-scale image datasets, which are not amenable to processing on desktop computers, thus creating a bottleneck in the image-analysis pipeline. Here, we present an open-source, flexible image-analysis framework, called Image Harvest (IH), for processing images originating from high-throughput plant phenotyping platforms. Image Harvest is developed to perform parallel processing on computing grids and provides an integrated feature for metadata extraction from large-scale file organization. Moreover, the integration of IH with the Open Science Grid provides academic researchers with the computational resources required for processing large image datasets at no cost. Image Harvest also offers functionalities to extract digital traits from images to interpret plant architecture-related characteristics. To demonstrate the applications of these digital traits, a rice (Oryza sativa) diversity panel was phenotyped and genome-wide association mapping was performed using digital traits that are used to describe different plant ideotypes. Three major quantitative trait loci were identified on rice chromosomes 4 and 6, which co-localize with quantitative trait loci known to regulate agronomically important traits in rice. Image Harvest is an open-source software for high-throughput image processing that requires a minimal learning curve for plant biologists to analyze phenomics datasets. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology.
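Digital traits of the kind mentioned above are typically simple image-derived quantities; as a rough, self-contained sketch (the excess-green thresholding rule is a common generic choice, not necessarily what Image Harvest implements), projected shoot area in pixels:

    import numpy as np

    def projected_plant_area(rgb, exg_threshold=20):
        """Count 'plant' pixels using the excess-green index 2G - R - B,
        a simple (and simplistic) vegetation segmentation rule."""
        rgb = rgb.astype(np.int32)
        exg = 2 * rgb[..., 1] - rgb[..., 0] - rgb[..., 2]
        mask = exg > exg_threshold
        return int(mask.sum()), mask

    # Hypothetical 4x4 image: a 2x2 green patch on a grey background.
    img = np.full((4, 4, 3), 120, dtype=np.uint8)
    img[1:3, 1:3] = [40, 180, 40]
    area, _ = projected_plant_area(img)
    print("projected area (pixels):", area)   # -> 4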
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matzke, Melissa M.; Brown, Joseph N.; Gritsenko, Marina A.
2013-02-01
Liquid chromatography coupled with mass spectrometry (LC-MS) is widely used to identify and quantify peptides in complex biological samples. In particular, label-free shotgun proteomics is highly effective for the identification of peptides and subsequently obtaining a global protein profile of a sample. As a result, this approach is widely used for discovery studies. Typically, the objective of these discovery studies is to identify proteins that are affected by some condition of interest (e.g. disease, exposure). However, for complex biological samples, label-free LC-MS proteomics experiments measure peptides and do not directly yield protein quantities. Thus, protein quantification must be inferred from one or more measured peptides. In recent years, many computational approaches to relative protein quantification of label-free LC-MS data have been published. In this review, we examine the most commonly employed quantification approaches to relative protein abundance from peak intensity values, evaluate their individual merits, and discuss challenges in the use of the various computational approaches.
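One of the simplest quantification approaches of the kind reviewed there is a 'top-N' rollup of peptide peak intensities to a protein abundance; a minimal pandas sketch with made-up intensities (not a specific method from the review):

    import pandas as pd

    peptides = pd.DataFrame({
        "protein":   ["P1", "P1", "P1", "P1", "P2", "P2"],
        "peptide":   ["a", "b", "c", "d", "e", "f"],
        "intensity": [5.0e7, 3.2e7, 1.1e7, 0.4e7, 2.5e7, 1.9e7],
    })

    def top_n_rollup(df, n=3):
        """Protein abundance = mean of the protein's n most intense peptides."""
        return (df.sort_values("intensity", ascending=False)
                  .groupby("protein")["intensity"]
                  .apply(lambda s: s.head(n).mean()))

    print(top_n_rollup(peptides))   # relative abundances for P1 and P2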
DOE Office of Scientific and Technical Information (OSTI.GOV)
Orwoll, Eric S.; Wiedrick, Jack; Jacobs, Jon
The biological perturbations associated with incident mortality are not well elucidated, and there are limited biomarkers for the prediction of mortality. We used a novel high throughput proteomics approach to identify serum peptides and proteins associated with 5-year mortality in community-dwelling men age >65 years who participated in a longitudinal observational study of musculoskeletal aging (Osteoporotic Fractures in Men: MrOS). In a discovery phase, serum specimens collected at baseline in 2473 men were analyzed using liquid chromatography-ion mobility-mass spectrometry, and incident mortality in the subsequent 5 years was ascertained by tri-annual questionnaire. Rigorous statistical methods were utilized to identify 56 peptides (31 proteins) that were associated with 5-year mortality. In an independent replication phase, selected reaction monitoring was used to examine 21 of those peptides in baseline serum from 750 additional men; 81% of those peptides remained significantly associated with mortality. Mortality-associated proteins included a variety involved in inflammation or complement activation; several have been previously linked to mortality (e.g. C-reactive protein, alpha 1-antichymotrypsin) and others are not previously known to be associated with mortality. Other novel proteins of interest included pregnancy-associated plasma protein, VE-cadherin, leucine-rich α-2 glycoprotein 1, vinculin, vitronectin, mast/stem cell growth factor receptor and Saa4. A panel of peptides improved the predictive value of a commonly used clinical predictor of mortality. Overall, these results suggest that complex inflammatory pathways, and proteins in other pathways, are linked to 5-year mortality risk. This work may serve to identify novel biomarkers for near-term mortality.
Effect of posttranslational modifications on enzyme function and assembly.
Ryšlavá, Helena; Doubnerová, Veronika; Kavan, Daniel; Vaněk, Ondřej
2013-10-30
The detailed examination of enzyme molecules by mass spectrometry and other techniques continues to identify hundreds of distinct PTMs. Recently, global analyses of enzymes using methods of contemporary proteomics revealed widespread distribution of PTMs on many key enzymes distributed in all cellular compartments. Critically, patterns of multiple enzymatic and nonenzymatic PTMs within a single enzyme are now functionally evaluated providing a holistic picture of a macromolecule interacting with low molecular mass compounds, some of them being substrates, enzyme regulators, or activated precursors for enzymatic and nonenzymatic PTMs. Multiple PTMs within a single enzyme molecule and their mutual interplays are critical for the regulation of catalytic activity. Full understanding of this regulation will require detailed structural investigation of enzymes, their structural analogs, and their complexes. Further, proteomics is now integrated with molecular genetics, transcriptomics, and other areas leading to systems biology strategies. These allow the functional interrogation of complex enzymatic networks in their natural environment. In the future, one might envisage the use of robust high throughput analytical techniques that will be able to detect multiple PTMs on a global scale of individual proteomes from a number of carefully selected cells and cellular compartments. This article is part of a Special Issue entitled: Posttranslational Protein modifications in biology and Medicine. Copyright © 2013 Elsevier B.V. All rights reserved.
Wedge, David C; Krishna, Ritesh; Blackhurst, Paul; Siepen, Jennifer A; Jones, Andrew R; Hubbard, Simon J
2011-04-01
Confident identification of peptides via tandem mass spectrometry underpins modern high-throughput proteomics. This has motivated considerable recent interest in the postprocessing of search engine results to increase confidence and calculate robust statistical measures, for example through the use of decoy databases to calculate false discovery rates (FDR). FDR-based analyses allow for multiple testing and can assign a single confidence value for both sets and individual peptide spectrum matches (PSMs). We recently developed an algorithm for combining the results from multiple search engines, integrating FDRs for sets of PSMs made by different search engine combinations. Here we describe a web-server and a downloadable application that makes this routinely available to the proteomics community. The web server offers a range of outputs including informative graphics to assess the confidence of the PSMs and any potential biases. The underlying pipeline also provides a basic protein inference step, integrating PSMs into protein ambiguity groups where peptides can be matched to more than one protein. Importantly, we have also implemented full support for the mzIdentML data standard, recently released by the Proteomics Standards Initiative, providing users with the ability to convert native formats to mzIdentML files, which are available to download.
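The protein inference step mentioned above groups proteins that are indistinguishable from their peptide evidence; a minimal sketch of such ambiguity grouping from a peptide-to-protein map (toy identifiers, not the pipeline's actual implementation):

    from collections import defaultdict

    def ambiguity_groups(peptide_protein_map):
        """Group proteins that share an identical set of supporting peptides.
        peptide_protein_map -- dict: peptide sequence -> iterable of protein accessions.
        """
        peptides_per_protein = defaultdict(set)
        for peptide, proteins in peptide_protein_map.items():
            for prot in proteins:
                peptides_per_protein[prot].add(peptide)
        groups = defaultdict(list)
        for prot, peps in peptides_per_protein.items():
            groups[frozenset(peps)].append(prot)
        return list(groups.values())

    # Toy example: P1 and P2 are supported by exactly the same peptides, so they
    # form one ambiguity group; P3 stands alone.
    mapping = {"pepA": ["P1", "P2"], "pepB": ["P1", "P2"], "pepC": ["P3"]}
    print(ambiguity_groups(mapping))   # -> [['P1', 'P2'], ['P3']]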
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Hong; Yang, Yanling; Li, Yuxin
2015-02-06
Development of high resolution liquid chromatography (LC) is essential for improving the sensitivity and throughput of mass spectrometry (MS)-based proteomics. Here we present systematic optimization of a long gradient LC-MS/MS platform to enhance protein identification from a complex mixture. The platform employed an in-house fabricated, reverse phase column (100 μm x 150 cm) coupled with a Q Exactive MS. The column was capable of achieving a peak capacity of approximately 700 in a 720 min gradient of 10-45% acetonitrile. The optimal loading level was about 6 micrograms of peptides, although the column allowed loading as many as 20 micrograms. Gas phase fractionation of peptide ions further increased the number of peptide identifications by ~10%. Moreover, the combination of basic pH LC pre-fractionation with the long gradient LC-MS/MS platform enabled the identification of 96,127 peptides and 10,544 proteins at 1% protein false discovery rate in a postmortem brain sample of Alzheimer's disease. As deep RNA sequencing of the same specimen suggested that ~16,000 genes were expressed, the current analysis covered more than 60% of the expressed proteome. Further improvement strategies for the LC/LC-MS/MS platform are also discussed.
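The quoted peak capacity is consistent with the usual gradient-LC estimate of gradient time divided by average peak width; a small back-of-the-envelope check (the ~1 min average peak width is an inference from the reported numbers, not a stated value):

    def peak_capacity(gradient_minutes, avg_peak_width_minutes):
        """Standard estimate: n_c ~ 1 + gradient time / average peak width."""
        return 1 + gradient_minutes / avg_peak_width_minutes

    # A peak capacity of ~700 over a 720 min gradient implies peaks roughly 1 min wide.
    print(round(peak_capacity(720, 1.03)))   # -> 700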
The Urine Proteome as a Biomarker of Radiation Injury
Sharma, Mukut; Halligan, Brian D.; Wakim, Bassam T.; Savin, Virginia J.; Cohen, Eric P.; Moulder, John E.
2009-01-01
Terrorist attacks or nuclear accidents could expose large numbers of people to ionizing radiation, and early biomarkers of radiation injury would be critical for triage, treatment and follow-up of such individuals. However, no such biomarkers have yet been proven to exist. We tested the potential of high throughput proteomics to identify protein biomarkers of radiation injury after total body irradiation (TBI) with X rays in a rat model. Subtle functional changes in the kidney are suggested by an increased glomerular permeability for macromolecules measured within 24 hours after TBI. Ultrastructural changes in glomerular podocytes include partial loss of the interdigitating organization of foot processes. Analysis of urine by LC-MS/MS and 2D-GE showed significant changes in the urine proteome within 24 hours after TBI. Tissue kallikrein 1-related peptidase, the cysteine proteinase inhibitor cystatin C and oxidized histidine were found to be increased, while a number of proteinase inhibitors including kallikrein-binding protein and albumin were found to be decreased post-irradiation. Thus, TBI causes immediately detectable changes in renal structure and function and in the urinary protein profile. This suggests that both systemic and renal changes are induced by radiation and that it may be possible to identify a set of biomarkers unique to radiation injury. PMID:19746194
Swan, Anna Louise; Mobasheri, Ali; Allaway, David; Liddell, Susan
2013-01-01
Mass spectrometry is an analytical technique for the characterization of biological samples and is increasingly used in omics studies because of its targeted, nontargeted, and high throughput abilities. However, due to the large datasets generated, it requires informatics approaches such as machine learning techniques to analyze and interpret relevant data. Machine learning can be applied to MS-derived proteomics data in two ways: first, directly to mass spectral peaks, and second, to proteins identified by sequence database searching, although relative protein quantification is required for the latter. Machine learning has been applied to mass spectrometry data from different biological disciplines, particularly for various cancers. The aims of such investigations have been to identify biomarkers and to aid in diagnosis, prognosis, and treatment of specific diseases. This review describes how machine learning has been applied to proteomics tandem mass spectrometry data. This includes how it can be used to identify proteins suitable for use as biomarkers of disease and for classification of samples into disease or treatment groups, which may be applicable for diagnostics. It also includes the challenges faced by such investigations, such as prediction of proteins present, protein quantification, planning for the use of machine learning, and small sample sizes. PMID:24116388
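As a concrete, minimal illustration of the second use described above (classifying samples from relative protein quantities), a scikit-learn sketch on simulated data; the quant matrix, class labels and 'biomarker' proteins are synthetic, and the tiny sample size mirrors the small-n caveat raised in the review:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)

    # Simulated quant matrix: 40 samples x 200 proteins, 5 of which carry signal.
    X = rng.normal(size=(40, 200))
    y = np.array([0] * 20 + [1] * 20)          # disease vs control labels
    X[y == 1, :5] += 1.5                       # hypothetical biomarker proteins

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())

    clf.fit(X, y)                              # feature importances point back at
    print(np.argsort(clf.feature_importances_)[::-1][:5])   # candidate biomarkers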
NASA Astrophysics Data System (ADS)
Patton, Wayne F.; Berggren, Kiera N.; Lopez, Mary F.
2001-04-01
Facilities engaged in proteome analysis differ significantly in the degree that they implement automated systems for high-throughput protein characterization. Though automated workstation environments are becoming more routine in the biotechnology and pharmaceutical sectors of industry, university-based laboratories often perform these tasks manually, submitting protein spots excised from polyacrylamide gels to institutional core facilities for identification. For broad compatibility with imaging platforms, an optimized fluorescent dye developed for proteomics applications should be designed taking into account that laser scanners use visible light excitation and that charge-coupled device camera systems and gas discharge transilluminators rely upon UV excitation. The luminescent ruthenium metal complex, SYPRO Ruby protein gel stain, is compatible with a variety of excitation sources since it displays intense UV (280 nm) and visible (470 nm) absorption maxima. Localization is achieved by noncovalent, electrostatic and hydrophobic binding of dye to proteins, with signal being detected at 610 nm. Since proteins are not covalently modified by the dye, compatibility with downstream microchemical characterization techniques such as matrix-assisted laser desorption/ionization-mass spectrometry is assured. Protocols have been devised for optimizing fluorophore intensity. SYPRO Ruby dye outperforms alternatives such as silver staining in terms of quantitative capabilities, compatibility with mass spectrometry and ease of integration into automated work environments.
TCP Throughput Profiles Using Measurements over Dedicated Connections
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rao, Nageswara S.; Liu, Qiang; Sen, Satyabrata
Wide-area data transfers in high-performance computing infrastructures are increasingly being carried over dynamically provisioned dedicated network connections that provide high capacities with no competing traffic. We present extensive TCP throughput measurements and time traces over a suite of physical and emulated 10 Gbps connections with 0-366 ms round-trip times (RTTs). Contrary to the general expectation, they show significant statistical and temporal variations, in addition to the overall dependencies on the congestion control mechanism, buffer size, and the number of parallel streams. We analyze several throughput profiles that have highly desirable concave regions wherein the throughput decreases slowly with RTTs, in stark contrast to the convex profiles predicted by various TCP analytical models. We present a generic throughput model that abstracts the ramp-up and sustainment phases of TCP flows, which provides insights into qualitative trends observed in measurements across TCP variants: (i) slow-start followed by well-sustained throughput leads to concave regions; (ii) large buffers and multiple parallel streams expand the concave regions in addition to improving the throughput; and (iii) stable throughput dynamics, indicated by a smoother Poincare map and smaller Lyapunov exponents, lead to wider concave regions. These measurements and analytical results together enable us to select a TCP variant and its parameters for a given connection to achieve high throughput with statistical guarantees.
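A toy version of the ramp-up/sustainment abstraction described above already shows how a well-sustained flow yields a concave throughput-versus-RTT profile; the doubling-per-RTT ramp-up, the 30 s transfer duration and the initial window are illustrative assumptions, not the paper's model or measurements:

    def mean_throughput(rtt_s, capacity_gbps=10.0, duration_s=30.0, start_window_mb=0.1):
        """Average throughput of a flow that doubles its rate each RTT (ramp-up)
        and then sustains the link capacity (sustainment)."""
        rate_gbps = (start_window_mb * 8e-3) / rtt_s   # initial window sent per RTT, in Gb/s
        sent_gb, t = 0.0, 0.0
        while rate_gbps < capacity_gbps and t < duration_s:
            sent_gb += rate_gbps * rtt_s
            rate_gbps *= 2.0
            t += rtt_s
        sent_gb += capacity_gbps * max(duration_s - t, 0.0)   # sustained phase
        return sent_gb / duration_s

    for rtt_ms in (10, 50, 100, 200, 366):
        print(rtt_ms, "ms RTT:", round(mean_throughput(rtt_ms / 1000.0), 2), "Gb/s")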
NASA Astrophysics Data System (ADS)
Kudoh, Eisuke; Ito, Haruki; Wang, Zhisen; Adachi, Fumiyuki
In mobile communication systems, high speed packet data services are in demand. In high speed data transmission, throughput degrades severely due to inter-path interference (IPI). Recently, we proposed a random transmit power control (TPC) scheme to increase the uplink throughput of DS-CDMA packet mobile communications. In this paper, we apply IPI cancellation in addition to the random TPC. We derive the numerical expression of the received signal-to-interference plus noise power ratio (SINR) and introduce an IPI cancellation factor. We also derive the numerical expression of the system throughput when IPI is cancelled ideally, to compare with the system throughput evaluated numerically by the Monte Carlo method. Then we evaluate, by Monte Carlo numerical computation, the combined effect of random TPC and IPI cancellation on the uplink throughput of DS-CDMA packet mobile communications.
Choi, Hyungsuk; Choi, Woohyuk; Quan, Tran Minh; Hildebrand, David G C; Pfister, Hanspeter; Jeong, Won-Ki
2014-12-01
As the size of image data from microscopes and telescopes increases, the need for high-throughput processing and visualization of large volumetric data has become more pressing. At the same time, many-core processors and GPU accelerators are commonplace, making high-performance distributed heterogeneous computing systems affordable. However, effectively utilizing GPU clusters is difficult for novice programmers, and even experienced programmers often fail to fully leverage the computing power of new parallel architectures due to their steep learning curve and programming complexity. In this paper, we propose Vivaldi, a new domain-specific language for volume processing and visualization on distributed heterogeneous computing systems. Vivaldi's Python-like grammar and parallel processing abstractions provide flexible programming tools for non-experts to easily write high-performance parallel computing code. Vivaldi provides commonly used functions and numerical operators for customized visualization and high-throughput image processing applications. We demonstrate the performance and usability of Vivaldi on several examples ranging from volume rendering to image segmentation.