Sample records for proteomics identifications database

  1. Analysis of high accuracy, quantitative proteomics data in the MaxQB database.

    PubMed

    Schaab, Christoph; Geiger, Tamar; Stoehr, Gabriele; Cox, Juergen; Mann, Matthias

    2012-03-01

    MS-based proteomics generates rapidly increasing amounts of precise and quantitative information. Analysis of individual proteomic experiments has made great strides, but the crucial ability to compare and store information across different proteome measurements still presents many challenges. For example, it has been difficult to avoid contamination of databases with low quality peptide identifications, to control for the inflation in false positive identifications when combining data sets, and to integrate quantitative data. Although, for example, the contamination with low quality identifications has been addressed by joint analysis of deposited raw data in some public repositories, we reasoned that there should be a role for a database specifically designed for high resolution and quantitative data. Here we describe a novel database termed MaxQB that stores and displays collections of large proteomics projects and allows joint analysis and comparison. We demonstrate the analysis tools of MaxQB using proteome data of 11 different human cell lines and 28 mouse tissues. The database-wide false discovery rate is controlled by adjusting the project specific cutoff scores for the combined data sets. The 11 cell line proteomes together identify proteins expressed from more than half of all human genes. For each protein of interest, expression levels estimated by label-free quantification can be visualized across the cell lines. Similarly, the expression rank order and estimated amount of each protein within each proteome are plotted. We used MaxQB to calculate the signal reproducibility of the detected peptides for the same proteins across different proteomes. Spearman rank correlation between peptide intensity and detection probability of identified proteins was greater than 0.8 for 64% of the proteome, whereas a minority of proteins have negative correlation. 
    This information can be used to pinpoint false protein identifications, independently of peptide database scores. The information contained in MaxQB, including high resolution fragment spectra, is accessible to the community via a user-friendly web interface at http://www.biochem.mpg.de/maxqb.
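
    The Spearman rank correlation reported in this abstract can be illustrated with a small, dependency-free sketch. It is not MaxQB code: the function names and the toy intensity and detection-probability values are assumptions made up for illustration. The approach is the standard one, ranking both variables (ties get the average rank) and taking the Pearson correlation of the ranks.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical peptides of one protein: intensity vs. detection probability.
intensity = [1e6, 5e5, 2e5, 8e4]
detection = [0.90, 0.70, 0.75, 0.20]
rho = spearman(intensity, detection)
print(round(rho, 3))
```

    A protein whose peptides give a low or negative value here would be a candidate for closer inspection, in the spirit of the abstract's suggestion.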

  2. Proteomics: Protein Identification Using Online Databases

    ERIC Educational Resources Information Center

    Eurich, Chris; Fields, Peter A.; Rice, Elizabeth

    2012-01-01

    Proteomics is an emerging area of systems biology that allows simultaneous study of thousands of proteins expressed in cells, tissues, or whole organisms. We have developed this activity to enable high school or college students to explore proteomic databases using mass spectrometry data files generated from yeast proteins in a college laboratory…

  3. Choosing an Optimal Database for Protein Identification from Tandem Mass Spectrometry Data.

    PubMed

    Kumar, Dhirendra; Yadav, Amit Kumar; Dash, Debasis

    2017-01-01

    Database searching is the preferred method for protein identification from digital spectra of mass to charge ratios (m/z) detected for protein samples through mass spectrometers. The search database is one of the major influencing factors in discovering proteins present in the sample and thus in deriving biological conclusions. In most cases the choice of search database is arbitrary. Here we describe common search databases used in proteomic studies and their impact on final list of identified proteins. We also elaborate upon factors like composition and size of the search database that can influence the protein identification process. In conclusion, we suggest that choice of the database depends on the type of inferences to be derived from proteomics data. However, making additional efforts to build a compact and concise database for a targeted question should generally be rewarding in achieving confident protein identifications.

  4. STEPS: a grid search methodology for optimized peptide identification filtering of MS/MS database search results.

    PubMed

    Piehowski, Paul D; Petyuk, Vladislav A; Sandoval, John D; Burnum, Kristin E; Kiebel, Gary R; Monroe, Matthew E; Anderson, Gordon A; Camp, David G; Smith, Richard D

    2013-03-01

    For bottom-up proteomics, there is a wide variety of database-searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, there are numerous strategies being employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid-search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search. Systematic Trial and Error Parameter Selection--referred to as STEPS--utilizes user-defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal "parameter set" for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true-positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
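
    The grid-search idea described in this abstract can be sketched as follows. This is not the STEPS implementation: the filter parameters (a minimum search score and a maximum precursor mass error), the target-decoy FDR estimate used as the acceptance criterion, and all names and toy values are assumptions chosen for illustration. The sketch tries every combination of user-defined cutoffs and keeps the one that accepts the most target identifications while staying under an FDR limit.

```python
from itertools import product


def estimate_fdr(psms):
    """Target-decoy estimate: decoy hits divided by target hits."""
    targets = sum(1 for p in psms if not p["decoy"])
    decoys = len(psms) - targets
    return decoys / targets if targets else 0.0


def grid_search_filters(psms, score_cutoffs, ppm_cutoffs, max_fdr=0.01):
    """Return (n_targets, min_score, max_ppm) for the best passing combination."""
    best = None
    for min_score, max_ppm in product(score_cutoffs, ppm_cutoffs):
        accepted = [
            p for p in psms
            if p["score"] >= min_score and abs(p["ppm"]) <= max_ppm
        ]
        if estimate_fdr(accepted) > max_fdr:
            continue  # this combination lets in too many decoys
        n_targets = sum(1 for p in accepted if not p["decoy"])
        if best is None or n_targets > best[0]:
            best = (n_targets, min_score, max_ppm)
    return best


# Toy PSMs (made-up values): search score, mass error in ppm, decoy flag.
toy_psms = [
    {"score": 3.5, "ppm": 1.0, "decoy": False},
    {"score": 3.1, "ppm": -2.0, "decoy": False},
    {"score": 2.8, "ppm": 4.0, "decoy": False},
    {"score": 2.6, "ppm": 8.0, "decoy": True},
    {"score": 2.2, "ppm": 3.0, "decoy": False},
    {"score": 2.0, "ppm": 9.0, "decoy": True},
    {"score": 1.5, "ppm": 1.5, "decoy": False},
]

best = grid_search_filters(toy_psms, [1.0, 2.0, 3.0], [5.0, 10.0])
print(best)
```

    In this toy data a tight mass-error window removes both decoys, so a lenient score cutoff can be kept, which is the kind of trade-off a parameter grid makes visible.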

  5. Using the Proteomics Identifications Database (PRIDE).

    PubMed

    Martens, Lennart; Jones, Phil; Côté, Richard

    2008-03-01

    The Proteomics Identifications Database (PRIDE) is a public data repository designed to store, disseminate, and analyze mass spectrometry based proteomics datasets. The PRIDE database can accommodate any level of detailed metadata about the submitted results, which can be queried, explored, viewed, or downloaded via the PRIDE Web interface. The PRIDE database also provides a simple, yet powerful, access control mechanism that fully supports confidential peer-reviewing of data related to a manuscript, ensuring that these results remain invisible to the general public while allowing referees and journal editors anonymized access to the data. This unit describes in detail the functionality that PRIDE provides with regards to searching, viewing, and comparing the available data, as well as different options for submitting data to PRIDE.

  6. Elucidation of cross-species proteomic effects in human and hominin bone proteome identification through a bioinformatics experiment.

    PubMed

    Welker, F

    2018-02-20

    The study of ancient protein sequences is increasingly focused on the analysis of older samples, including those of ancient hominins. The analysis of such ancient proteomes thereby potentially suffers from "cross-species proteomic effects": the loss of peptide and protein identifications at increased evolutionary distances due to a larger number of protein sequence differences between the database sequence and the analyzed organism. Error-tolerant proteomic search algorithms should theoretically overcome this problem at both the peptide and protein level; however, this has not been demonstrated. If error-tolerant searches do not overcome the cross-species proteomic issue then there might be inherent biases in the identified proteomes. Here, a bioinformatics experiment is performed to test this using a set of modern human bone proteomes and three independent searches against sequence databases at increasing evolutionary distances: the human (0 Ma), chimpanzee (6-8 Ma) and orangutan (16-17 Ma) reference proteomes, respectively. Incorrectly suggested amino acid substitutions are absent when employing adequate filtering criteria for mutable Peptide Spectrum Matches (PSMs), but roughly half of the mutable PSMs were not recovered. As a result, peptide and protein identification rates are higher in error-tolerant mode compared to non-error-tolerant searches but did not recover protein identifications completely. Data indicates that peptide length and the number of mutations between the target and database sequences are the main factors influencing mutable PSM identification. The error-tolerant results suggest that the cross-species proteomics problem is not overcome at increasing evolutionary distances, even at the protein level. Peptide and protein loss has the potential to significantly impact divergence dating and proteome comparisons when using ancient samples as there is a bias towards the identification of conserved sequences and proteins. 
    Effects are minimized between moderately divergent proteomes, as indicated by almost complete recovery of informative positions in the search against the chimpanzee proteome (≈90%, 6-8 Ma). This provides a bioinformatic background to future phylogenetic and proteomic analysis of ancient hominin proteomes, including the future description of novel hominin amino acid sequences, but also has negative implications for the study of fast-evolving proteins in hominins, non-hominin animals, and ancient bacterial proteins in evolutionary contexts.

  7. Systematic Errors in Peptide and Protein Identification and Quantification by Modified Peptides*

    PubMed Central

    Bogdanow, Boris; Zauber, Henrik; Selbach, Matthias

    2016-01-01

    The principle of shotgun proteomics is to use peptide mass spectra in order to identify corresponding sequences in a protein database. The quality of peptide and protein identification and quantification critically depends on the sensitivity and specificity of this assignment process. Many peptides in proteomic samples carry biochemical modifications, and a large fraction of unassigned spectra arise from modified peptides. Spectra derived from modified peptides can erroneously be assigned to wrong amino acid sequences. However, the impact of this problem on proteomic data has not yet been investigated systematically. Here we use combinations of different database searches to show that modified peptides can be responsible for 20–50% of false positive identifications in deep proteomic data sets. These false positive hits are particularly problematic as they have significantly higher scores and higher intensities than other false positive matches. Furthermore, these wrong peptide assignments lead to hundreds of false protein identifications and systematic biases in protein quantification. We devise a “cleaned search” strategy to address this problem and show that this considerably improves the sensitivity and specificity of proteomic data. In summary, we show that modified peptides cause systematic errors in peptide and protein identification and quantification and should therefore be considered to further improve the quality of proteomic data annotation. PMID:27215553

  8. YPED: An Integrated Bioinformatics Suite and Database for Mass Spectrometry-based Proteomics Research

    PubMed Central

    Colangelo, Christopher M.; Shifman, Mark; Cheung, Kei-Hoi; Stone, Kathryn L.; Carriero, Nicholas J.; Gulcicek, Erol E.; Lam, TuKiet T.; Wu, Terence; Bjornson, Robert D.; Bruce, Can; Nairn, Angus C.; Rinehart, Jesse; Miller, Perry L.; Williams, Kenneth R.

    2015-01-01

    We report a significantly-enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of a high-throughput mass spectrometry-based proteomics research ranging from a single laboratory, group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography–tandem mass spectrometry (LC–MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED’s database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results. PMID:25712262

  9. YPED: an integrated bioinformatics suite and database for mass spectrometry-based proteomics research.

    PubMed

    Colangelo, Christopher M; Shifman, Mark; Cheung, Kei-Hoi; Stone, Kathryn L; Carriero, Nicholas J; Gulcicek, Erol E; Lam, TuKiet T; Wu, Terence; Bjornson, Robert D; Bruce, Can; Nairn, Angus C; Rinehart, Jesse; Miller, Perry L; Williams, Kenneth R

    2015-02-01

    We report a significantly-enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of a high-throughput mass spectrometry-based proteomics research ranging from a single laboratory, group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography-tandem mass spectrometry (LC-MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED's database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.

  10. Computer applications making rapid advances in high throughput microbial proteomics (HTMP).

    PubMed

    Anandkumar, Balakrishna; Haga, Steve W; Wu, Hui-Fen

    2014-02-01

    The last few decades have seen the rise of widely-available proteomics tools. From new data acquisition devices, such as MALDI-MS and 2DE, to new database searching software, these new products have paved the way for high throughput microbial proteomics (HTMP). These tools are enabling researchers to gain new insights into microbial metabolism, and are opening up new areas of study, such as protein-protein interactions (interactomics) discovery. Computer software is a key part of these emerging fields. This current review considers: 1) software tools for identifying the proteome, such as MASCOT or PDQuest, 2) online databases of proteomes, such as SWISS-PROT, Proteome Web, or the Proteomics Facility of the Pathogen Functional Genomics Resource Center, and 3) software tools for applying proteomic data, such as PSI-BLAST or VESPA. These tools allow for research in network biology, protein identification, functional annotation, target identification/validation, protein expression, protein structural analysis, metabolic pathway engineering and drug discovery.

  11. A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics*

    PubMed Central

    Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing

    2011-01-01

    Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow on data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108
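
    Why a reference-only database cannot match variant peptides can be shown with a small in silico digest. The sketch below is not the authors' workflow: the protein sequence, the substitution, and the function names are hypothetical, and the cleavage rule is the commonly used simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages). It lists the peptides that exist only once a coding variant is added to the database.

```python
import re


def tryptic_peptides(sequence):
    """Split after K or R, except when the next residue is P."""
    # Zero-width split points require Python 3.7 or later.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def apply_substitution(sequence, position, ref, alt):
    """Apply a single amino acid substitution at a 1-based position."""
    if sequence[position - 1] != ref:
        raise ValueError("reference residue does not match the sequence")
    return sequence[:position - 1] + alt + sequence[position:]


def variant_only_peptides(reference, variant):
    """Peptides present in the variant digest but not in the reference digest."""
    return set(tryptic_peptides(variant)) - set(tryptic_peptides(reference))


# Hypothetical protein and a hypothetical E8K substitution.
reference = "MAGTKLDEVRSPQWK"
variant = apply_substitution(reference, 8, "E", "K")

print(tryptic_peptides(reference))
print(sorted(variant_only_peptides(reference, variant)))
```

    Here the substitution creates a new cleavage site, so two peptides appear that a search against the reference sequence alone could never report.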

  12. Sys-BodyFluid: a systematical database for human body fluid proteome research

    PubMed Central

    Li, Su-Jun; Peng, Mao; Li, Hong; Liu, Bo-Shu; Wang, Chuan; Wu, Jia-Rui; Li, Yi-Xue; Zeng, Rong

    2009-01-01

    Recently, body fluids have widely become an important target for proteomic research and proteomic study has produced more and more body fluid related protein data. A database is needed to collect and analyze these proteome data. Thus, we developed this web-based body fluid proteome database Sys-BodyFluid. It contains eleven kinds of body fluid proteomes, including plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, seminal fluid, human milk and amniotic fluid. Over 10 000 proteins are presented in the Sys-BodyFluid. Sys-BodyFluid provides the detailed protein annotations, including protein description, Gene Ontology, domain information, protein sequence and involved pathways. These proteome data can be retrieved by using protein name, protein accession number and sequence similarity. In addition, users can query between these different body fluids to get the different proteins identification information. Sys-BodyFluid database can facilitate the body fluid proteomics and disease proteomics research as a reference database. It is available at http://www.biosino.org/bodyfluid/. PMID:18978022

  13. Sys-BodyFluid: a systematical database for human body fluid proteome research.

    PubMed

    Li, Su-Jun; Peng, Mao; Li, Hong; Liu, Bo-Shu; Wang, Chuan; Wu, Jia-Rui; Li, Yi-Xue; Zeng, Rong

    2009-01-01

    Recently, body fluids have widely become an important target for proteomic research and proteomic study has produced more and more body fluid related protein data. A database is needed to collect and analyze these proteome data. Thus, we developed this web-based body fluid proteome database Sys-BodyFluid. It contains eleven kinds of body fluid proteomes, including plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, seminal fluid, human milk and amniotic fluid. Over 10,000 proteins are presented in the Sys-BodyFluid. Sys-BodyFluid provides the detailed protein annotations, including protein description, Gene Ontology, domain information, protein sequence and involved pathways. These proteome data can be retrieved by using protein name, protein accession number and sequence similarity. In addition, users can query between these different body fluids to get the different proteins identification information. Sys-BodyFluid database can facilitate the body fluid proteomics and disease proteomics research as a reference database. It is available at http://www.biosino.org/bodyfluid/.

  14. Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate.

    PubMed

    Park, Gun Wook; Hwang, Heeyoun; Kim, Kwang Hoe; Lee, Ju Yeon; Lee, Hyun Kyoung; Park, Ji Yeong; Ji, Eun Sun; Park, Sung-Kyu Robin; Yates, John R; Kwon, Kyung-Hoon; Park, Young Mok; Lee, Hyoung-Joo; Paik, Young-Ki; Kim, Jin Young; Yoo, Jong Shin

    2016-11-04

    In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification by peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies using liquid-chromatography and mass-spectrometry-based large proteomic profiling. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four engrailed steps as follows. First, using three different search engines, SEQUEST, MASCOT, and MS-GF+, individual proteomic searches were performed against the neXtProt database. Second, the search results from the PSMs were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into E-scores normalized using an in-house program. Last, ProteinInferencer was used to filter the proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. Finally, we compared the performance of the IPP to a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP) including 477 alternative splicing variants (vs 182 using the CPP) were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral pattern from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).

  15. Introducing the CPL/MUW proteome database: interpretation of human liver and liver cancer proteome profiles by referring to isolated primary cells.

    PubMed

    Wimmer, Helge; Gundacker, Nina C; Griss, Johannes; Haudek, Verena J; Stättner, Stefan; Mohr, Thomas; Zwickl, Hannes; Paulitschke, Verena; Baron, David M; Trittner, Wolfgang; Kubicek, Markus; Bayer, Editha; Slany, Astrid; Gerner, Christopher

    2009-06-01

    Interpretation of proteome data with a focus on biomarker discovery largely relies on comparative proteome analyses. Here, we introduce a database-assisted interpretation strategy based on proteome profiles of primary cells. Both 2-D-PAGE and shotgun proteomics are applied. We obtain high data concordance with these two different techniques. When applying mass analysis of tryptic spot digests from 2-D gels of cytoplasmic fractions, we typically identify several hundred proteins. Using the same protein fractions, we usually identify more than a thousand proteins by shotgun proteomics. The data consistency obtained when comparing these independent data sets exceeds 99% of the proteins identified in the 2-D gels. Many characteristic differences in protein expression of different cells can thus be independently confirmed. Our self-designed SQL database (CPL/MUW - database of the Clinical Proteomics Laboratories at the Medical University of Vienna accessible via www.meduniwien.ac.at/proteomics/database) facilitates (i) quality management of protein identification data, which are based on MS, (ii) the detection of cell type-specific proteins and (iii) of molecular signatures of specific functional cell states. Here, we demonstrate how the interpretation of proteome profiles obtained from human liver tissue and hepatocellular carcinoma tissue is assisted by the Clinical Proteomics Laboratories at the Medical University of Vienna-database. Therefore, we suggest that the use of reference experiments supported by a tailored database may substantially facilitate data interpretation of proteome profiling experiments.

  16. Proteomic platform for the identification of proteins in olive (Olea europaea) pulp.

    PubMed

    Capriotti, Anna Laura; Cavaliere, Chiara; Foglia, Patrizia; Piovesana, Susy; Samperi, Roberto; Stampachiacchiere, Serena; Laganà, Aldo

    2013-10-24

    The nutritional and cancer-protective properties of the oil extracted mechanically from the ripe fruits of Olea europaea trees are attracting constantly more attention worldwide. The preparation of high-quality protein samples from plant tissues for proteomic analysis poses many challenging problems. In this study we employed a proteomic platform based on two different extraction methods, SDS and CHAPS based protocols, followed by two precipitation protocols, TCA/acetone and MeOH precipitation, in order to increase the final number of identified proteins. The use of advanced MS techniques in combination with the Swissprot and NCBI Viridiplantae databases and TAIR10 Arabidopsis database allowed us to identify 1265 proteins, of which 22 belong to O. europaea. The application of this proteomic platform for protein extraction and identification will be useful also for other proteomic studies on recalcitrant plant/fruit tissues. Copyright © 2013. Published by Elsevier B.V.

  17. Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.

    PubMed

    Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Denef, Vincent; Shah, Manesh B; Verberkmoes, Nathan C

    The recent surge in microbial genomic sequencing, combined with the development of high-throughput liquid chromatography-mass-spectrometry-based (LC/LC-MS/MS) proteomics, has raised the question of the extent to which genomic information of one strain or environmental sample can be used to profile proteomes of related strains or samples. Even with decreasing sequencing costs, it remains impractical to obtain genomic sequence for every strain or sample analyzed. Here, we evaluate how shotgun proteomics is affected by amino acid divergence between the sample and the genomic database using a probability-based model and a random mutation simulation model constrained by experimental data. To assess the effects of nonrandom distribution of mutations, we also evaluated identification levels using in silico peptide data from sequenced isolates with average amino acid identities (AAI) varying between 76 and 98%. We compared the predictions to experimental protein identification levels for a sample that was evaluated using a database that included genomic information for the dominant organism and for a closely related variant (95% AAI). The range of models set the boundaries at which half of the proteins in a proteomic experiment can be identified to be 77-92% AAI between orthologs in the sample and database. Consistent with this prediction, experimental data indicated loss of half the identifiable proteins at 90% AAI. Additional analysis indicated a 6.4% reduction of the initial protein coverage per 1% amino acid divergence and total identification loss at 86% AAI. Consequently, shotgun proteomics is capable of cross-strain identifications but avoids most cross-species false positives.

  19. 2016 update of the PRIDE database and its related tools

    PubMed Central

    Vizcaíno, Juan Antonio; Csordas, Attila; del-Toro, Noemi; Dianes, José A.; Griss, Johannes; Lavidas, Ilias; Mayer, Gerhard; Perez-Riverol, Yasset; Reisinger, Florian; Ternent, Tobias; Xu, Qing-Wei; Wang, Rui; Hermjakob, Henning

    2016-01-01

    The PRoteomics IDEntifications (PRIDE) database is one of the world-leading data repositories of mass spectrometry (MS)-based proteomics data. Since the beginning of 2014, PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) is the new PRIDE archival system, replacing the original PRIDE database. Here we summarize the developments in PRIDE resources and related tools since the previous update manuscript in the Database Issue in 2013. PRIDE Archive constitutes a complete redevelopment of the original PRIDE, comprising a new storage backend, data submission system and web interface, among other components. PRIDE Archive supports the most-widely used PSI (Proteomics Standards Initiative) data standard formats (mzML and mzIdentML) and implements the data requirements and guidelines of the ProteomeXchange Consortium. The wide adoption of ProteomeXchange within the community has triggered an unprecedented increase in the number of submitted data sets (around 150 data sets per month). We outline some statistics on the current PRIDE Archive data contents. We also report on the status of the PRIDE related stand-alone tools: PRIDE Inspector, PRIDE Converter 2 and the ProteomeXchange submission tool. Finally, we will give a brief update on the resources under development ‘PRIDE Cluster’ and ‘PRIDE Proteomes’, which provide a complementary view and quality-scored information of the peptide and protein identification data available in PRIDE Archive. PMID:26527722

  20. MAPU: Max-Planck Unified database of organellar, cellular, tissue and body fluid proteomes

    PubMed Central

    Zhang, Yanling; Zhang, Yong; Adachi, Jun; Olsen, Jesper V.; Shi, Rong; de Souza, Gustavo; Pasini, Erica; Foster, Leonard J.; Macek, Boris; Zougman, Alexandre; Kumar, Chanchal; Wiśniewski, Jacek R.; Jun, Wang; Mann, Matthias

    2007-01-01

    Mass spectrometry (MS)-based proteomics has become a powerful technology to map the protein composition of organelles, cell types and tissues. In our department, a large-scale effort to map these proteomes is complemented by the Max-Planck Unified (MAPU) proteome database. MAPU contains several body fluid proteomes, including plasma, urine, and cerebrospinal fluid. Cell lines have been mapped to a depth of several thousand proteins and the red blood cell proteome has also been analyzed in depth. The liver proteome is represented with 3200 proteins. By employing high resolution MS and stringent validation criteria, false positive identification rates in MAPU are lower than 1:1000. Thus MAPU datasets can serve as reference proteomes in biomarker discovery. MAPU contains the peptides identifying each protein, measured masses, scores and intensities and is freely available at using a clickable interface of cell or body parts. Proteome data can be queried across proteomes by protein name, accession number, sequence similarity, peptide sequence and annotation information. More than 4500 mouse and 2500 human proteins have already been identified in at least one proteome. Basic annotation information and links to other public databases are provided in MAPU and we plan to add further analysis tools. PMID:17090601

  1. iProphet: Multi-level Integrative Analysis of Shotgun Proteomic Data Improves Peptide and Protein Identification Rates and Error Estimates*

    PubMed Central

    Shteynberg, David; Deutsch, Eric W.; Lam, Henry; Eng, Jimmy K.; Sun, Zhi; Tasman, Natalie; Mendoza, Luis; Moritz, Robert L.; Aebersold, Ruedi; Nesvizhskii, Alexey I.

    2011-01-01

    The combination of tandem mass spectrometry and sequence database searching is the method of choice for the identification of peptides and the mapping of proteomes. Over the last several years, the volume of data generated in proteomic studies has increased dramatically, which challenges the computational approaches previously developed for these data. Furthermore, a multitude of search engines have been developed that identify different, overlapping subsets of the sample peptides from a particular set of tandem mass spectrometry spectra. We present iProphet, the new addition to the widely used open-source suite of proteomic data analysis tools Trans-Proteomics Pipeline. Applied in tandem with PeptideProphet, it provides more accurate representation of the multilevel nature of shotgun proteomic data. iProphet combines the evidence from multiple identifications of the same peptide sequences across different spectra, experiments, precursor ion charge states, and modified states. It also allows accurate and effective integration of the results from multiple database search engines applied to the same data. The use of iProphet in the Trans-Proteomics Pipeline increases the number of correctly identified peptides at a constant false discovery rate as compared with both PeptideProphet and another state-of-the-art tool Percolator. As the main outcome, iProphet permits the calculation of accurate posterior probabilities and false discovery rate estimates at the level of sequence identical peptide identifications, which in turn leads to more accurate probability estimates at the protein level. Fully integrated with the Trans-Proteomics Pipeline, it supports all commonly used MS instruments, search engines, and computer platforms. 
The performance of iProphet is demonstrated on two publicly available data sets: data from a human whole cell lysate proteome profiling experiment representative of typical proteomic data sets, and from a set of Streptococcus pyogenes experiments more representative of organism-specific composite data sets. PMID:21876204

  2. P19-S Managing Proteomics Data from Data Generation and Data Warehousing to Central Data Repository and Journal Reviewing Processes

    PubMed Central

    Thiele, H.; Glandorf, J.; Koerting, G.; Reidegeld, K.; Blüggel, M.; Meyer, H.; Stephan, C.

    2007-01-01

    In today’s proteomics research, various techniques and instrumentation bioinformatics tools are necessary to manage the large amount of heterogeneous data with an automatic quality control to produce reliable and comparable results. Therefore a data-processing pipeline is mandatory for data validation and comparison in a data-warehousing system. The proteome bioinformatics platform ProteinScape has been proven to cover these needs. The reprocessing of HUPO BPP participants’ MS data was done within ProteinScape. The reprocessed information was transferred into the global data repository PRIDE. ProteinScape as a data-warehousing system covers two main aspects: archiving relevant data of the proteomics workflow and information extraction functionality (protein identification, quantification and generation of biological knowledge). As a strategy for automatic data validation, different protein search engines are integrated. Result analysis is performed using a decoy database search strategy, which allows the measurement of the false-positive identification rate. Peptide identifications across different workflows, different MS techniques, and different search engines are merged to obtain a quality-controlled protein list. The proteomics identifications database (PRIDE), as a public data repository, is an archiving system where data are finally stored and no longer changed by further processing steps. Data submission to PRIDE is open to proteomics laboratories generating protein and peptide identifications. An export tool has been developed for transferring all relevant HUPO BPP data from ProteinScape into PRIDE using the PRIDE.xml format. The EU-funded ProDac project will coordinate the development of software tools covering international standards for the representation of proteomics data. 
The implementation of data submission pipelines and systematic data collection in public standards–compliant repositories will cover all aspects, from the generation of MS data in each laboratory to the conversion of all the annotating information and identifications to a standardized format. Such datasets can be used in the course of publishing in scientific journals.
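
    The decoy database search strategy mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common simple estimator (decoy hits divided by target hits above a score threshold) and a hypothetical input of (score, is_decoy) pairs; it is not ProteinScape's actual implementation, and the names decoy_fdr and psms are illustrative only.

```python
def decoy_fdr(psms, threshold):
    """Estimate the false discovery rate among target hits at a score threshold.

    psms: iterable of (score, is_decoy) pairs (hypothetical input format).
    Assumes decoy hits above the threshold approximate the number of
    false target hits, i.e. FDR ~ decoys / targets.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score < threshold:
            continue  # hit does not pass the score cutoff
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        return 0.0  # no accepted target hits, so nothing to estimate
    return decoys / targets


# Toy data: four target hits and two decoy hits with made-up scores.
example_psms = [
    (50, False), (45, False), (40, True),
    (35, False), (30, False), (20, True),
]
fdr_at_30 = decoy_fdr(example_psms, 30)
fdr_at_45 = decoy_fdr(example_psms, 45)
```

    Raising the threshold trades identifications for a lower estimated error rate, which is the basis for choosing a score cutoff at a fixed false-positive rate.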

  3. Cell death proteomics database: consolidating proteomics data on cell death.

    PubMed

    Arntzen, Magnus Ø; Bull, Vibeke H; Thiede, Bernd

    2013-05-03

    Programmed cell death is a ubiquitous process of utmost importance for the development and maintenance of multicellular organisms. More than 10 different types of programmed cell death forms have been discovered. Several proteomics analyses have been performed to gain insight in proteins involved in the different forms of programmed cell death. To consolidate these studies, we have developed the cell death proteomics (CDP) database, which comprehends data from apoptosis, autophagy, cytotoxic granule-mediated cell death, excitotoxicity, mitotic catastrophe, paraptosis, pyroptosis, and Wallerian degeneration. The CDP database is available as a web-based database to compare protein identifications and quantitative information across different experimental setups. The proteomics data of 73 publications were integrated and unified with protein annotations from UniProt-KB and gene ontology (GO). Currently, more than 6,500 records of more than 3,700 proteins are included in the CDP. Comparing apoptosis and autophagy using overrepresentation analysis of GO terms, the majority of enriched processes were found in both, but also some clear differences were perceived. Furthermore, the analysis revealed differences and similarities of the proteome between autophagosomal and overall autophagy. The CDP database represents a useful tool to consolidate data from proteome analyses of programmed cell death and is available at http://celldeathproteomics.uio.no.

  4. MAPU: Max-Planck Unified database of organellar, cellular, tissue and body fluid proteomes.

    PubMed

    Zhang, Yanling; Zhang, Yong; Adachi, Jun; Olsen, Jesper V; Shi, Rong; de Souza, Gustavo; Pasini, Erica; Foster, Leonard J; Macek, Boris; Zougman, Alexandre; Kumar, Chanchal; Wisniewski, Jacek R; Jun, Wang; Mann, Matthias

    2007-01-01

    Mass spectrometry (MS)-based proteomics has become a powerful technology to map the protein composition of organelles, cell types and tissues. In our department, a large-scale effort to map these proteomes is complemented by the Max-Planck Unified (MAPU) proteome database. MAPU contains several body fluid proteomes, including plasma, urine, and cerebrospinal fluid. Cell lines have been mapped to a depth of several thousand proteins and the red blood cell proteome has also been analyzed in depth. The liver proteome is represented with 3200 proteins. By employing high resolution MS and stringent validation criteria, false positive identification rates in MAPU are lower than 1:1000. Thus MAPU datasets can serve as reference proteomes in biomarker discovery. MAPU contains the peptides identifying each protein, measured masses, scores and intensities and is freely available at http://www.mapuproteome.com using a clickable interface of cell or body parts. Proteome data can be queried across proteomes by protein name, accession number, sequence similarity, peptide sequence and annotation information. More than 4500 mouse and 2500 human proteins have already been identified in at least one proteome. Basic annotation information and links to other public databases are provided in MAPU and we plan to add further analysis tools.

  5. P2P proteomics -- data sharing for enhanced protein identification

    PubMed Central

    2012-01-01

    Background In order to tackle the important and challenging problem in proteomics of identifying known and new protein sequences using high-throughput methods, we propose a data-sharing platform that uses fully distributed P2P technologies to share specifications of peer-interaction protocols and service components. By using such a platform, information to be searched is no longer centralised in a few repositories but gathered from experiments in peer proteomics laboratories, which can subsequently be searched by fellow researchers. Methods The system distributively runs a data-sharing protocol specified in the Lightweight Communication Calculus underlying the system through which researchers interact via message passing. For this, researchers interact with the system through particular components that link to database querying systems based on BLAST and/or OMSSA and GUI-based visualisation environments. We have tested the proposed platform with data drawn from preexisting MS/MS data reservoirs from the 2006 ABRF (Association of Biomolecular Resource Facilities) test sample, which was extensively tested during the ABRF Proteomics Standards Research Group 2006 worldwide survey. In particular we have taken the data available from a subset of proteomics laboratories of Spain's National Institute for Proteomics, ProteoRed, a network for the coordination, integration and development of the Spanish proteomics facilities. Results and Discussion We performed queries against nine databases including seven ProteoRed proteomics laboratories, the NCBI Swiss-Prot database and the local database of the CSIC/UAB Proteomics Laboratory. A detailed analysis of the results indicated the presence of a protein that was supported by other NCBI matches and highly scored matches in several proteomics labs. The analysis clearly indicated that the protein was a relatively high concentrated contaminant that could be present in the ABRF sample. 
This fact is evident from the information that could be derived from the proposed P2P proteomics system; however, it is not straightforward to arrive at the same conclusion by conventional means, as it is difficult to discard organic contamination of samples. The actual presence of this contaminant was only stated after the ABRF study of all the identifications reported by the laboratories. PMID:22293032

  6. Current algorithmic solutions for peptide-based proteomics data generation and identification.

    PubMed

    Hoopmann, Michael R; Moritz, Robert L

    2013-02-01

    Peptide-based proteomic data sets are ever increasing in size and complexity. These data sets provide computational challenges when attempting to quickly analyze spectra and obtain correct protein identifications. Database search and de novo algorithms must consider high-resolution MS/MS spectra and alternative fragmentation methods. Protein inference is a tricky problem when analyzing large data sets of degenerate peptide identifications. Combining multiple algorithms for improved peptide identification puts significant strain on computational systems when investigating large data sets. This review highlights some of the recent developments in peptide and protein identification algorithms for analyzing shotgun mass spectrometry data when encountering the aforementioned hurdles. Also explored are the roles that analytical pipelines, public spectral libraries, and cloud computing play in the evolution of peptide-based proteomics. Copyright © 2012 Elsevier Ltd. All rights reserved.

  7. A curated gluten protein sequence database to support development of proteomics methods for determination of gluten in gluten-free foods.

    PubMed

    Bromilow, Sophie; Gethings, Lee A; Buckley, Mike; Bromley, Mike; Shewry, Peter R; Langridge, James I; Clare Mills, E N

    2017-06-23

    The unique physiochemical properties of wheat gluten enable a diverse range of food products to be manufactured. However, gluten triggers coeliac disease, a condition which is treated using a gluten-free diet. Analytical methods are required to confirm if foods are gluten-free, but current immunoassay-based methods can be unreliable. Proteomic methods offer an alternative but require comprehensive and well annotated sequence databases, which are lacking for gluten. A manually curated database (GluPro V1.0) of gluten proteins, comprising 630 discrete unique full length protein sequences, has been compiled. It is representative of the different types of gliadin and glutenin components found in gluten. An in silico comparison of their coeliac toxicity was undertaken by analysing the distribution of coeliac toxic motifs. This demonstrated that whilst the α-gliadin proteins contained more toxic motifs, these were distributed across all gluten protein sub-types. Comparison of annotations observed using a discovery proteomics dataset acquired using ion mobility MS/MS showed that more reliable identifications were obtained using the GluPro V1.0 database compared to the complete reviewed Viridiplantae database. This highlights the value of a curated sequence database specifically designed to support the proteomic workflows and the development of methods to detect and quantify gluten. We have constructed the first manually curated open-source wheat gluten protein sequence database (GluPro V1.0) in a FASTA format to support the application of proteomic methods for gluten protein detection and quantification. We have also analysed the manually verified sequences to give the first comprehensive overview of the distribution of sequences able to elicit a reaction in coeliac disease, the prevalent form of gluten intolerance. 
Provision of this database will improve the reliability of gluten protein identification by proteomic analysis, and aid the development of targeted mass spectrometry methods in line with Codex Alimentarius Commission requirements for foods designed to meet the needs of gluten intolerant individuals. Copyright © 2017. Published by Elsevier B.V.

  8. Proteomics data repositories: Providing a safe haven for your data and acting as a springboard for further research

    PubMed Central

    Vizcaíno, Juan Antonio; Foster, Joseph M.; Martens, Lennart

    2010-01-01

    Despite the fact that data deposition is not a generalised fact yet in the field of proteomics, several mass spectrometry (MS) based proteomics repositories are publicly available for the scientific community. The main existing resources are: the Global Proteome Machine Database (GPMDB), PeptideAtlas, the PRoteomics IDEntifications database (PRIDE), Tranche, and NCBI Peptidome. In this review the capabilities of each of these will be described, paying special attention to four key properties: data types stored, applicable data submission strategies, supported formats, and available data mining and visualization tools. Additionally, the data contents from model organisms will be enumerated for each resource. There are other valuable smaller and/or more specialized repositories but they will not be covered in this review. Finally, the concept behind the ProteomeXchange consortium, a collaborative effort among the main resources in the field, will be introduced. PMID:20615486

  9. An integrated metagenome and -proteome analysis of the microbial community residing in a biogas production plant.

    PubMed

    Ortseifen, Vera; Stolze, Yvonne; Maus, Irena; Sczyrba, Alexander; Bremges, Andreas; Albaum, Stefan P; Jaenicke, Sebastian; Fracowiak, Jochen; Pühler, Alfred; Schlüter, Andreas

    2016-08-10

    To study the metaproteome of a biogas-producing microbial community, fermentation samples were taken from an agricultural biogas plant for microbial cell and protein extraction and corresponding metagenome analyses. Based on metagenome sequence data, taxonomic community profiling was performed to elucidate the composition of bacterial and archaeal sub-communities. The community's cytosolic metaproteome was represented in a 2D-PAGE approach. Metaproteome databases for protein identification were compiled based on the assembled metagenome sequence dataset for the biogas plant analyzed and non-corresponding biogas metagenomes. Protein identification results revealed that the corresponding biogas protein database facilitated the highest identification rate followed by other biogas-specific databases, whereas common public databases yielded insufficient identification rates. Proteins of the biogas microbiome identified as highly abundant were assigned to the pathways involved in methanogenesis, transport and carbon metabolism. Moreover, the integrated metagenome/-proteome approach enabled the examination of genetic-context information for genes encoding identified proteins by studying neighboring genes on the corresponding contig. Exemplarily, this approach led to the identification of a Methanoculleus sp. contig encoding 16 methanogenesis-related gene products, three of which were also detected as abundant proteins within the community's metaproteome. Thus, metagenome contigs provide additional information on the genetic environment of identified abundant proteins. Copyright © 2016 Elsevier B.V. All rights reserved.

  10. Separomics applied to the proteomics and peptidomics of low-abundance proteins: Choice of methods and challenges - A review.

    PubMed

    Baracat-Pereira, Maria Cristina; de Oliveira Barbosa, Meire; Magalhães, Marcos Jorge; Carrijo, Lanna Clicia; Games, Patrícia Dias; Almeida, Hebréia Oliveira; Sena Netto, José Fabiano; Pereira, Matheus Rodrigues; de Barros, Everaldo Gonçalves

    2012-06-01

    The enrichment and isolation of proteins are considered limiting steps in proteomic studies. Identification of proteins whose expression is transient, those that are of low-abundance, and of natural peptides not described in databases, is still a great challenge. Plant extracts are in general complex, and contaminants interfere with the identification of proteins involved in important physiological processes, such as plant defense against pathogens. This review discusses the challenges and strategies of separomics applied to the identification of low-abundance proteins and peptides in plants, especially in plants challenged by pathogens. Separomics is described as a group of methodological strategies for the separation of protein molecules for proteomics. Several tools have been used to remove highly abundant proteins from samples and also non-protein contaminants. The use of chromatographic techniques, the partition of the proteome into subproteomes, and an effort to isolate proteins in their native form have allowed the isolation and identification of rare proteins involved in different processes.

  11. Separomics applied to the proteomics and peptidomics of low-abundance proteins: Choice of methods and challenges – A review

    PubMed Central

    Baracat-Pereira, Maria Cristina; de Oliveira Barbosa, Meire; Magalhães, Marcos Jorge; Carrijo, Lanna Clicia; Games, Patrícia Dias; Almeida, Hebréia Oliveira; Sena Netto, José Fabiano; Pereira, Matheus Rodrigues; de Barros, Everaldo Gonçalves

    2012-01-01

    The enrichment and isolation of proteins are considered limiting steps in proteomic studies. Identification of proteins whose expression is transient, those that are of low-abundance, and of natural peptides not described in databases, is still a great challenge. Plant extracts are in general complex, and contaminants interfere with the identification of proteins involved in important physiological processes, such as plant defense against pathogens. This review discusses the challenges and strategies of separomics applied to the identification of low-abundance proteins and peptides in plants, especially in plants challenged by pathogens. Separomics is described as a group of methodological strategies for the separation of protein molecules for proteomics. Several tools have been used to remove highly abundant proteins from samples and also non-protein contaminants. The use of chromatographic techniques, the partition of the proteome into subproteomes, and an effort to isolate proteins in their native form have allowed the isolation and identification of rare proteins involved in different processes. PMID:22802713

  12. Mass spectrometry and animal science: protein identification strategies and particularities of farm animal species.

    PubMed

    Soares, Renata; Franco, Catarina; Pires, Elisabete; Ventosa, Miguel; Palhinhas, Rui; Koci, Kamila; Martinho de Almeida, André; Varela Coelho, Ana

    2012-07-19

    Proteomic approaches are gaining increasing importance in the context of all fields of animal and veterinary sciences, including physiology, productive characterization, and disease/parasite tolerance, among others. Proteomic studies mainly aim at the proteome characterization of a certain organ, tissue, cell type or organism, either in a specific condition or comparing protein differential expression within two or more selected situations. Due to the high complexity of samples, usually total protein extracts, proteomics relies heavily on separation procedures, with 2D-electrophoresis and HPLC being the most common, as well as on protein identification using mass spectrometry (MS) based methodologies. Despite the increasing importance of MS in the context of animal and veterinary science studies, the usefulness of such tools is still poorly perceived by the animal science community. This is primarily due to the limited knowledge of mass spectrometry among animal scientists. Additionally, confidence and success in protein identification are hindered by the lack of information in public databases for most farm animal species and their pathogens, with the exception of cattle (Bos taurus), pig (Sus scrofa) and chicken (Gallus gallus). In this article, we will briefly summarize the main methodologies available for protein identification using mass spectrometry, providing a case study of specific applications in the field of animal science. We will also address the difficulties inherent to protein identification using MS, with particular reference to experiments using animal species poorly described in public databases. Additionally, we will suggest strategies to increase the rate of successful identifications when working with farm animal species. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. PROTICdb: a web-based application to store, track, query, and compare plant proteome data.

    PubMed

    Ferry-Dumazet, Hélène; Houel, Gwenn; Montalent, Pierre; Moreau, Luc; Langella, Olivier; Negroni, Luc; Vincent, Delphine; Lalanne, Céline; de Daruvar, Antoine; Plomion, Christophe; Zivy, Michel; Joets, Johann

    2005-05-01

    PROTICdb is a web-based application, mainly designed to store and analyze plant proteome data obtained by two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) and mass spectrometry (MS). The purposes of PROTICdb are (i) to store, track, and query information related to proteomic experiments, i.e., from tissue sampling to protein identification and quantitative measurements, and (ii) to integrate information from the user's own expertise and other sources into a knowledge base, used to support data interpretation (e.g., for the determination of allelic variants or products of post-translational modifications). Data insertion into the relational database of PROTICdb is achieved either by uploading outputs of image analysis and MS identification software, or by filling web forms. 2-D PAGE annotated maps can be displayed, queried, and compared through a graphical interface. Links to external databases are also available. Quantitative data can be easily exported in a tabulated format for statistical analyses. PROTICdb is based on the Oracle or the PostgreSQL Database Management System and is freely available upon request at the following URL: http://moulon.inra.fr/bioinfo/PROTICdb.

  14. Computer aided manual validation of mass spectrometry-based proteomic data.

    PubMed

    Curran, Timothy G; Bryson, Bryan D; Reigelhaupt, Michael; Johnson, Hannah; White, Forest M

    2013-06-15

    Advances in mass spectrometry-based proteomic technologies have increased the speed of analysis and the depth provided by a single analysis. Computational tools to evaluate the accuracy of peptide identifications from these high-throughput analyses have not kept pace with technological advances; currently the most common quality evaluation methods are based on statistical analysis of the likelihood of false positive identifications in large-scale data sets. While helpful, these calculations do not consider the accuracy of each identification, thus creating a precarious situation for biologists relying on the data to inform experimental design. Manual validation is the gold standard approach to confirm accuracy of database identifications, but is extremely time-intensive. To palliate the increasing time required to manually validate large proteomic datasets, we provide computer aided manual validation software (CAMV) to expedite the process. Relevant spectra are collected, catalogued, and pre-labeled, allowing users to efficiently judge the quality of each identification and summarize applicable quantitative information. CAMV significantly reduces the burden associated with manual validation and will hopefully encourage broader adoption of manual validation in mass spectrometry-based proteomics. Copyright © 2013 Elsevier Inc. All rights reserved.

  15. A comprehensive and scalable database search system for metaproteomics.

    PubMed

    Chatterjee, Sandip; Stupp, Gregory S; Park, Sung Kyu Robin; Ducom, Jean-Christophe; Yates, John R; Su, Andrew I; Wolan, Dennis W

    2016-08-16

    Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations. Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples. The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. 
The protein database, proteomic search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.

  16. A HUPO test sample study reveals common problems in mass spectrometry-based proteomics

    PubMed Central

    Bell, Alexander W.; Deutsch, Eric W.; Au, Catherine E.; Kearney, Robert E.; Beavis, Ron; Sechi, Salvatore; Nilsson, Tommy; Bergeron, John J.M.

    2009-01-01

    We carried out a test sample study to try to identify errors leading to irreproducibility, including incompleteness of peptide sampling, in LC-MS-based proteomics. We distributed a test sample consisting of an equimolar mix of 20 highly purified recombinant human proteins, to 27 laboratories for identification. Each protein contained one or more unique tryptic peptides of 1250 Da to also test for ion selection and sampling in the mass spectrometer. Of the 27 labs, initially only 7 labs reported all 20 proteins correctly, and only 1 lab reported all the tryptic peptides of 1250 Da. Nevertheless, a subsequent centralized analysis of the raw data revealed that all 20 proteins and most of the 1250 Da peptides had in fact been detected by all 27 labs. The centralized analysis allowed us to determine sources of problems encountered in the study, which include missed identifications (false negatives), environmental contamination, database matching, and curation of protein identifications. Improved search engines and databases are likely to increase the fidelity of mass spectrometry-based proteomics. PMID:19448641

  17. Optimizing Algorithm Choice for Metaproteomics: Comparing X!Tandem and Proteome Discoverer for Soil Proteomes

    NASA Astrophysics Data System (ADS)

    Diaz, K. S.; Kim, E. H.; Jones, R. M.; de Leon, K. C.; Woodcroft, B. J.; Tyson, G. W.; Rich, V. I.

    2014-12-01

    The growing field of metaproteomics links microbial communities to their expressed functions by using mass spectrometry methods to characterize community proteins. Comparison of mass spectrometry protein search algorithms and their biases is crucial for maximizing the quality and amount of protein identifications in mass spectral data. Available algorithms employ different approaches when mapping mass spectra to peptides against a database. We compared mass spectra from four microbial proteomes derived from high-organic content soils searched with two search algorithms: 1) Sequest HT as packaged within Proteome Discoverer (v.1.4) and 2) X!Tandem as packaged in TransProteomicPipeline (v.4.7.1). Searches used matched metagenomes, and results were filtered to allow identification of high probability proteins. There was little overlap in proteins identified by both algorithms, on average just ~24% of the total. However, when adjusted for spectral abundance, the overlap improved to ~70%. Proteome Discoverer generally outperformed X!Tandem, identifying an average of 12.5% more proteins than X!Tandem, with X!Tandem identifying more proteins only in the first two proteomes. For spectrally-adjusted results, the algorithms were similar, with X!Tandem marginally outperforming Proteome Discoverer by an average of ~4%. We then assessed differences in heat shock protein (HSP) identification by the two algorithms by BLASTing identified proteins against the Heat Shock Protein Information Resource, because HSP hits typically account for the majority of the signal in proteomes, due to extraction protocols. Total HSP identifications for each of the 4 proteomes were ~15%, ~11%, ~17%, and ~19%, with ~14% for total HSPs with redundancies removed. Of the ~15% average of proteins from the 4 proteomes identified as HSPs, ~10% of proteins and spectra were identified by both algorithms. On average, Proteome Discoverer identified ~9% more HSPs than X!Tandem.
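
    The overlap figures reported in the abstract above can be made concrete with a small sketch. It assumes that "overlap ... of the total" means proteins found by both algorithms divided by proteins found by either one; the abstract does not state the exact definition, and the function name and accession lists below are hypothetical.

```python
def overlap_fraction(ids_a, ids_b):
    """Fraction of all identified proteins that both search algorithms report.

    ids_a, ids_b: iterables of protein identifiers from two searches.
    Assumption: the denominator is the union of both result sets.
    """
    a, b = set(ids_a), set(ids_b)
    union = a | b
    if not union:
        return 0.0  # neither search identified anything
    return len(a & b) / len(union)


# Made-up identifiers standing in for the output of two search engines.
engine_one = ["P1", "P2", "P3", "P4"]
engine_two = ["P3", "P4", "P5"]
shared = overlap_fraction(engine_one, engine_two)
```

    A spectrally adjusted overlap, as described in the abstract, would weight each protein by its spectral count instead of counting every protein once.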

  18. Morphinome Database - The database of proteins altered by morphine administration - An update.

    PubMed

    Bodzon-Kulakowska, Anna; Padrtova, Tereza; Drabik, Anna; Ner-Kluza, Joanna; Antolak, Anna; Kulakowski, Konrad; Suder, Piotr

    2018-04-13

    Morphine is considered a gold standard in pain treatment. Nevertheless, its use could be associated with severe side effects, including drug addiction. Thus, it is very important to understand the molecular mechanism of morphine action in order to develop new methods of pain therapy, or at least to attenuate the side effects of opioid use. Proteomics allows for the indication of proteins involved in certain biological processes, but the number of items identified in a single study is usually overwhelming. Thus, researchers face the difficult problem of choosing the proteins which are really important for the investigated processes and worth further studies. Therefore, based on the 29 published articles, we created a database of proteins regulated by morphine administration - The Morphinome Database (addiction-proteomics.org). This web tool allows for indicating proteins that were identified during different proteomics studies. Moreover, the collection and organization of such a vast amount of data allows us to find the same proteins that were identified in various studies and to create their ranking, based on the frequency of their identification. STRING and KEGG databases indicated metabolic pathways which those molecules are involved in. This means that those molecular pathways seem to be strongly affected by morphine administration and could be important targets for further investigations. The data about proteins identified by different proteomics studies of molecular changes caused by morphine administration (29 published articles) were gathered in the Morphinome Database. Unification of those data allowed for the identification of proteins that were indicated several times by distinct proteomics studies, which means that they seem to be very well verified and important for the entire process. Those proteins might now be considered promising targets for more detailed studies of their role in the molecular mechanism of morphine action. Copyright © 2018. Published by Elsevier B.V.

  19. The Multinational Arabidopsis Steering Subcommittee for Proteomics Assembles the Largest Proteome Database Resource for Plant Systems Biology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weckwerth, Wolfram; Baginsky, Sacha; Van Wijk, Klass

    2009-12-01

    In the past 10 years, we have witnessed remarkable advances in the field of plant molecular biology. The rapid development of proteomic technologies and the speed with which these techniques have been applied to the field have altered our perception of how we can analyze proteins in complex systems. At nearly the same time, the complete genome for the model plant Arabidopsis thaliana was released; this effort provides an unsurpassed resource for the identification of proteins when researchers use MS to analyze plant samples. Recognizing the growth in this area, the Multinational Arabidopsis Steering Committee (MASC) established a subcommittee for A. thaliana proteomics in 2006 with the objective of consolidating databases, technique standards, and experimentally validated candidate genes and functions. Since the establishment of the Multinational Arabidopsis Steering Subcommittee for Proteomics (MASCP), many new approaches and resources have become available. Recently, the subcommittee established a webpage to consolidate this information (www.masc-proteomics.org). It includes links to plant proteomic databases, general information about proteomic techniques, meeting information, a summary of proteomic standards, and other relevant resources. Altogether, this website provides a useful resource for the Arabidopsis proteomics community. In the future, the website will host discussions and investigate the cross-linking of databases. The subcommittee members have extensive experience in Arabidopsis proteomics and collectively have produced some of the most extensive proteomics data sets for this model plant (Table S1 in the Supporting Information has a list of resources). The largest collection of proteomics data from a single study in A. thaliana was assembled into an accessible database (AtProteome; http://fgcz-atproteome.unizh.ch/index.php) and was recently published by the Baginsky lab.1 The database provides links to major Arabidopsis online resources, and raw data have been deposited in PRIDE and PRIDE BioMart. Included in this database is an Arabidopsis proteome map that provides evidence for the expression of ~50% of all predicted gene models, including several alternative gene models that are not represented in The Arabidopsis Information Resource (TAIR) protein database. A set of organ-specific biomarkers is provided, as well as organ-specific proteotypic peptides for 4105 proteins that can be used to facilitate targeted quantitative proteomic surveys. In the future, the AtProteome database will be linked to additional existing resources developed by MASCP members, such as PPDB, ProMEX, and SUBA. The most comprehensive study on the Arabidopsis chloroplast proteome, which includes information on chloroplast sorting signals, posttranslational modifications (PTMs), and protein abundances (analyzed by high-accuracy MS [Orbitrap]), was recently published by the van Wijk lab.2 These and previous data are available via the plant proteome database (PPDB; http://ppdb.tc.cornell.edu) for A. thaliana and maize. PPDB provides genome-wide experimental and functional characterization of the A. thaliana and maize proteomes, including PTMs and subcellular localization information, with an emphasis on leaf and plastid proteins. Maize and Arabidopsis proteome entries are directly linked via internal BLAST alignments within PPDB. Direct links for each protein to TAIR, SUBA, ProMEX, and other resources are also provided.

  20. Proteomics Analysis of Bladder Cancer Exosomes*

    PubMed Central

    Welton, Joanne L.; Khanna, Sanjay; Giles, Peter J.; Brennan, Paul; Brewis, Ian A.; Staffurth, John; Mason, Malcolm D.; Clayton, Aled

    2010-01-01

    Exosomes are nanometer-sized vesicles, secreted by various cell types, present in biological fluids that are particularly rich in membrane proteins. Ex vivo analysis of exosomes may provide biomarker discovery platforms and form non-invasive tools for disease diagnosis and monitoring. These vesicles have never before been studied in the context of bladder cancer, a major malignancy of the urological tract. We present the first proteomics analysis of bladder cancer cell exosomes. Using ultracentrifugation on a sucrose cushion, exosomes were highly purified from cultured HT1376 bladder cancer cells and verified as low in contaminants by Western blotting and flow cytometry of exosome-coated beads. Solubilization in a buffer containing SDS and DTT was essential for achieving proteomics analysis using an LC-MALDI-TOF/TOF MS approach. We report 353 high quality identifications with 72 proteins not previously identified by other human exosome proteomics studies. Overrepresentation analysis to compare this data set with previous exosome proteomics studies (using the ExoCarta database) revealed that the proteome was consistent with that of various exosomes with particular overlap with exosomes of carcinoma origin. Interrogating the Gene Ontology database highlighted a strong association of this proteome with carcinoma of bladder and other sites. The data also highlighted how homology among human leukocyte antigen haplotypes may confound MASCOT designation of major histocompatibility complex Class I nomenclature, requiring data from PCR-based human leukocyte antigen haplotyping to clarify anomalous identifications. Validation of 18 MS protein identifications (including basigin, galectin-3, trophoblast glycoprotein (5T4), and others) was performed by a combination of Western blotting, flotation on linear sucrose gradients, and flow cytometry, confirming their exosomal expression. Some were confirmed positive on urinary exosomes from a bladder cancer patient. In summary, the exosome proteomics data set presented is of unrivaled quality. The data will aid in the development of urine exosome-based clinical tools for monitoring disease and will inform follow-up studies into varied aspects of exosome manufacture and function. PMID:20224111

  1. Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates

    DOE PAGES

    Wang, Dong; Dasari, Surendra; Chambers, Matthew C.; ...

    2013-03-07

    In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
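
    The Naive model described in this abstract can be illustrated with a minimal Python sketch. It is not the Basophile code or any search engine's implementation. The function name naive_fragments is hypothetical, the restriction to b and y ions is an assumption (the abstract does not name the ion series), and singly charged precursors are assumed to yield singly charged fragments.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # mass of H2O (Da), added to y ions


def naive_fragments(peptide, precursor_charge):
    """Enumerate (ion_type, index, charge, m/z) under the naive model.

    Every peptide bond is cleaved, and each fragment is predicted at every
    charge below the precursor charge. Assumption: a singly charged
    precursor still yields singly charged fragments.
    """
    max_charge = max(1, precursor_charge - 1)
    fragments = []
    for i in range(1, len(peptide)):  # one cleavage site per peptide bond
        b_mass = sum(RESIDUE_MASS[a] for a in peptide[:i])
        y_mass = sum(RESIDUE_MASS[a] for a in peptide[i:]) + WATER
        for z in range(1, max_charge + 1):
            fragments.append(("b", i, z, (b_mass + z * PROTON) / z))
            fragments.append(("y", len(peptide) - i, z, (y_mass + z * PROTON) / z))
    return fragments
```

    For a triply charged seven-residue peptide, naive_fragments("PEPTIDE", 3) returns 24 candidates (6 bonds, 2 ion series, 2 charge states). The candidate list grows with every additional precursor charge, which is the growth that a charge-retention model such as the one described above aims to prune.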

  2. ms_lims, a simple yet powerful open source laboratory information management system for MS-driven proteomics.

    PubMed

    Helsens, Kenny; Colaert, Niklaas; Barsnes, Harald; Muth, Thilo; Flikka, Kristian; Staes, An; Timmerman, Evy; Wortelkamp, Steffi; Sickmann, Albert; Vandekerckhove, Joël; Gevaert, Kris; Martens, Lennart

    2010-03-01

    MS-based proteomics produces large amounts of mass spectra that require processing, identification and possibly quantification before interpretation can be undertaken. High-throughput studies require automation of these various steps, and management of the data in association with the results obtained. We here present ms_lims (http://genesis.UGent.be/ms_lims), a freely available, open-source system based on a central database to automate data management and processing in MS-driven proteomics analyses.

  3. Evaluation of "shotgun" proteomics for identification of biological threat agents in complex environmental matrixes: experimental simulations.

    PubMed

    Verberkmoes, Nathan C; Hervey, W Judson; Shah, Manesh; Land, Miriam; Hauser, Loren; Larimer, Frank W; Van Berkel, Gary J; Goeringer, Douglas E

    2005-02-01

    There is currently a great need for rapid detection and positive identification of biological threat agents, as well as microbial species in general, directly from complex environmental samples. This need is most urgent in the area of homeland security, but also extends into medical, environmental, and agricultural sciences. Mass-spectrometry-based analysis is one of the leading technologies in the field with a diversity of different methodologies for biothreat detection. Over the past few years, "shotgun" proteomics has become one method of choice for the rapid analysis of complex protein mixtures by mass spectrometry. Recently, it was demonstrated that this methodology is capable of distinguishing a target species against a large database of background species from a single-component sample or dual-component mixtures with relatively the same concentration. Here, we examine the potential of shotgun proteomics to analyze a target species in a background of four contaminant species. We tested the capability of a common commercial mass-spectrometry-based shotgun proteomics platform for the detection of the target species (Escherichia coli) at four different concentrations and four different time points of analysis. We also tested the effect of database size on positive identification of the four microbes used in this study by testing a small (13-species) database and a large (261-species) database. The results clearly indicated that this technology could easily identify the target species at 20% in the background mixture at a 60, 120, 180, or 240 min analysis time with the small database. The results also indicated that the target species could easily be identified at 20% or 6% but could not be identified at 0.6% or 0.06% in either a 240 min analysis or a 30 h analysis with the small database. The effects of the large database were severe on the target species where detection above the background at any concentration used in this study was impossible, though the three other microbes used in this study were clearly identified above the background when analyzed with the large database. This study points to the potential application of this technology for biological threat agent detection but highlights many areas of needed research before the technology will be useful in real world samples.

  4. A glimpse into the proteome of phototrophic bacterium Rhodobacter capsulatus.

    PubMed

    Onder, Ozlem; Aygun-Sunar, Semra; Selamoglu, Nur; Daldal, Fevzi

    2010-01-01

    A first glimpse into the proteome of Rhodobacter capsulatus revealed more than 450 (with over 210 cytoplasmic and 185 extracytoplasmic known as well as 55 unknown) proteins that are identified with high degree of confidence using nLC-MS/MS analyses. The accumulated data provide a solid platform for ongoing efforts to establish the proteome of this species and the cellular locations of its constituents. They also indicate that at least 40 of the identified proteins, which were annotated in genome databases as unknown hypothetical proteins, correspond to predicted translation products that are indeed present in cells under the growth conditions used in this work. In addition, matching the identification labels of the proteins reported between the two available R. capsulatus genome databases (ERGO-light with RRCxxxxx and NT05 with NT05RCxxxx numbers) indicated that 11 such proteins are listed only in the latter database.

  5. Recent advances in proteomics of cereals.

    PubMed

    Bansal, Monika; Sharma, Madhu; Kanwar, Priyanka; Goyal, Aakash

    Cereals contribute a major part of human nutrition and are considered as an integral source of energy for human diets. With genomic databases already available in cereals such as rice, wheat, barley, and maize, the focus has now moved to proteome analysis. Proteomics studies involve the development of appropriate databases based on developing suitable separation and purification protocols, identification of protein functions, and can confirm their functional networks based on already available data from other sources. Tremendous progress has been made in the past decade in generating huge data-sets for covering interactions among proteins, protein composition of various organs and organelles, quantitative and qualitative analysis of proteins, and to characterize their modulation during plant development, biotic, and abiotic stresses. Proteomics platforms have been used to identify and improve our understanding of various metabolic pathways. This article gives a brief review of efforts made by different research groups on comparative descriptive and functional analysis of proteomics applications achieved in the cereal science so far.

  6. Clinical veterinary proteomics: Techniques and approaches to decipher the animal plasma proteome.

    PubMed

    Ghodasara, P; Sadowski, P; Satake, N; Kopp, S; Mills, P C

    2017-12-01

    Over the last two decades, technological advancements in the field of proteomics have advanced our understanding of the complex biological systems of living organisms. Techniques based on mass spectrometry (MS) have emerged as powerful tools to contextualise existing genomic information and to create quantitative protein profiles from plasma, tissues or cell lines of various species. Proteomic approaches have been used increasingly in veterinary science to investigate biological processes responsible for growth, reproduction and pathological events. However, the adoption of proteomic approaches by veterinary investigators lags behind that of researchers in the human medical field. Furthermore, in contrast to human proteomics studies, interpretation of veterinary proteomic data is difficult due to the limited protein databases available for many animal species. This review article examines the current use of advanced proteomics techniques for evaluation of animal health and welfare and covers the current status of clinical veterinary proteomics research, including successful protein identification and data interpretation studies. It includes a description of an emerging tool, sequential window acquisition of all theoretical fragment ion mass spectra (SWATH-MS), available on selected mass spectrometry instruments. This newly developed data acquisition technique combines advantages of discovery and targeted proteomics approaches, and thus has the potential to advance the veterinary proteomics field by enhancing identification and reproducibility of proteomics data. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Pre-fractionation strategies to resolve pea (Pisum sativum) sub-proteomes

    PubMed Central

    Meisrimler, Claudia-Nicole; Menckhoff, Ljiljana; Kukavica, Biljana M.; Lüthje, Sabine

    2015-01-01

    Legumes are important crop plants and pea (Pisum sativum L.) has been investigated as a model with respect to several physiological aspects. The sequencing of the pea genome has not been completed. Therefore, proteomic approaches are currently limited. Nevertheless, the increasing numbers of available EST-databases as well as the high homology of the pea and medicago genome (Medicago truncatula Gaertner) allow the successful identification of proteins. Due to the un-sequenced pea genome, pre-fractionation approaches have been used in pea proteomic surveys in the past. Aside from a number of selective proteome studies on crude extracts and the chloroplast, few studies have targeted other components such as the pea secretome, an important sub-proteome of interest due to its role in abiotic and biotic stress processes. The secretome itself can be further divided into different sub-proteomes (plasma membrane, apoplast, cell wall proteins). Cell fractionation in combination with different gel-electrophoresis, chromatography methods and protein identification by mass spectrometry are important partners to gain insight into pea sub-proteomes, post-translational modifications and protein functions. Overall, pea proteomics needs to link numerous existing physiological and biochemical data to gain further insight into adaptation processes, which play important roles in field applications. Future developments and directions in pea proteomics are discussed. PMID:26539198

  8. A standardized framing for reporting protein identifications in mzIdentML 1.2

    PubMed Central

    Seymour, Sean L.; Farrah, Terry; Binz, Pierre-Alain; Chalkley, Robert J.; Cottrell, John S.; Searle, Brian C.; Tabb, David L.; Vizcaíno, Juan Antonio; Prieto, Gorka; Uszkoreit, Julian; Eisenacher, Martin; Martínez-Bartolomé, Salvador; Ghali, Fawaz; Jones, Andrew R.

    2015-01-01

    Inferring which protein species have been detected in bottom-up proteomics experiments has been a challenging problem for which solutions have been maturing over the past decade. While many inference approaches now function well in isolation, comparing and reconciling the results generated across different tools remains difficult. It presently stands as one of the greatest barriers in collaborative efforts such as the Human Proteome Project and public repositories like the PRoteomics IDEntifications (PRIDE) database. Here we present a framework for reporting protein identifications that seeks to improve capabilities for comparing results generated by different inference tools. This framework standardizes the terminology for describing protein identification results, associated with the HUPO-Proteomics Standards Initiative (PSI) mzIdentML standard, while still allowing for differing methodologies to reach that final state. It is proposed that developers of software for reporting identification results will adopt this terminology in their outputs. While the new terminology does not require any changes to the core mzIdentML model, it represents a significant change in practice, and, as such, the rules will be released via a new version of the mzIdentML specification (version 1.2) so that consumers of files are able to determine whether the new guidelines have been adopted by export software. PMID:25092112

  9. Anatomy and evolution of database search engines-a central component of mass spectrometry based proteomic workflows.

    PubMed

    Verheggen, Kenneth; Raeder, Helge; Berven, Frode S; Martens, Lennart; Barsnes, Harald; Vaudel, Marc

    2017-09-13

    Sequence database search engines are bioinformatics algorithms that identify peptides from tandem mass spectra using a reference protein sequence database. Two decades of development, notably driven by advances in mass spectrometry, have provided scientists with more than 30 published search engines, each with its own properties. In this review, we present the common paradigm behind the different implementations, and its limitations for modern mass spectrometry datasets. We also detail how the search engines attempt to alleviate these limitations, and provide an overview of the different software frameworks available to the researcher. Finally, we highlight alternative approaches for the identification of proteomic mass spectrometry datasets, either as a replacement for, or as a complement to, sequence database search engines. © 2017 Wiley Periodicals, Inc.

  10. Pepitome: evaluating improved spectral library search for identification complementarity and quality assessment

    PubMed Central

    Dasari, Surendra; Chambers, Matthew C.; Martinez, Misti A.; Carpenter, Kristin L.; Ham, Amy-Joan L.; Vega-Montoto, Lorenzo J.; Tabb, David L.

    2012-01-01

    Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines. PMID:22217208
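
    The dot product score mentioned in this abstract can be sketched as follows. This is a generic normalized dot product over binned peaks, not the scoring code of Pepitome or SpectraST. The function names and the 1.0 m/z bin width are assumptions chosen for illustration.

```python
import math


def binned(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks is an iterable of (mz, intensity) pairs.
    """
    vec = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec


def dot_product_score(query, library, bin_width=1.0):
    """Normalized dot product (cosine similarity) of two binned spectra."""
    q = binned(query, bin_width)
    lib = binned(library, bin_width)
    dot = sum(v * lib.get(k, 0.0) for k, v in q.items())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in lib.values())
    )
    return dot / norm if norm else 0.0
```

    In this sketch, peaks at m/z 100.1 and 100.9 fall into the same bin and score as a perfect match, which illustrates the point made in the abstract that the dot product ignores fragment ion m/z discrepancies.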

  11. Computational approaches to protein inference in shotgun proteomics

    PubMed Central

    2012-01-01

    Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programing and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area. PMID:23176300
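
    Step (2) of the task described above, mapping assigned peptides to proteins, can be illustrated with a small sketch. It uses the classic greedy set-cover heuristic as one simple instance of a parsimony-style approach. It is not a method taken from the review, it does not quantify confidence, and the function name and input layout are hypothetical.

```python
def greedy_parsimony(protein_to_peptides):
    """Pick a small set of proteins that explains every identified peptide.

    protein_to_peptides maps a protein ID to the set of identified peptides
    it could have produced. At each step the protein explaining the most
    still-unexplained peptides is chosen (ties broken alphabetically).
    """
    uncovered = set().union(*protein_to_peptides.values())
    selected = []
    while uncovered:
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & uncovered),
        )
        selected.append(best)
        uncovered -= protein_to_peptides[best]
    return selected
```

    With the input {"P1": {"a", "b", "c"}, "P2": {"b", "c"}, "P3": {"d"}}, the sketch returns ["P1", "P3"]. P2 is dropped because all of its peptides are already explained by P1, which shows how shared peptides make protein-level reporting ambiguous.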

  12. Coupling Capillary Zone Electrophoresis to a Q Exactive HF Mass Spectrometer for Top-down Proteomics: 580 Proteoform Identifications from Yeast.

    PubMed

    Zhao, Yimeng; Sun, Liangliang; Zhu, Guijie; Dovichi, Norman J

    2016-10-07

    We used reversed-phase liquid chromatography to separate the yeast proteome into 23 fractions. These fractions were then analyzed using capillary zone electrophoresis (CZE) coupled to a Q-Exactive HF mass spectrometer using an electrokinetically pumped sheath flow interface. The parameters of the mass spectrometer were first optimized for top-down proteomics using a mixture of seven model proteins; we observed that intact protein mode with a trapping pressure of 0.2 and normalized collision energy of 20% produced the highest intact protein signals and most protein identifications. Then, we applied the optimized parameters for analysis of the fractionated yeast proteome. From this, 580 proteoforms and 180 protein groups were identified via database searching of the MS/MS spectra. This number of proteoform identifications is two times larger than that of previous CZE-MS/MS studies. An additional 3,243 protein species were detected based on the parent ion spectra. Post-translational modifications including N-terminal acetylation, signal peptide removal, and oxidation were identified.

  13. In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

    PubMed

    Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset

    2017-01-06

    In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow using four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependent on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended. Protein inference is one of the major challenges in MS-based proteomics nowadays. Currently, there are a vast number of protein inference algorithms and implementations available for the proteomics community. Protein assembly impacts the final results of the research, the quantitation values and the final claims in the research manuscript. Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. Previously, the Journal of Proteomics has published multiple benchmark studies of other bioinformatics algorithms (PMID: 26585461; PMID: 22728601) in proteomics, making clear the importance of those studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Five different algorithms (ProteinProphet, MSBayesPro, ProteinLP, Fido, and PIA) were evaluated using the highly customizable workflow on four public datasets with varying complexities. Three popular database search engines (Mascot, X!Tandem, and MS-GF+) and combinations thereof were evaluated for every protein inference tool. In total, >186 protein lists were analyzed and carefully compared using three metrics for quality assessment of the protein inference results: 1) the number of reported proteins, 2) the peptides per protein, and 3) the number of uniquely reported proteins per inference method, to address the quality of each inference method. We also examined how many proteins were reported by choosing each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that 1) using PIA or Fido seems to be a good choice when studying the results of the analyzed workflow, regarding not only the reported proteins and the high-quality identifications, but also the required runtime. 2) Merging the identifications of multiple search engines gives almost always more confident results and increases the number of peptides per protein group. 3) The usage of databases containing not only the canonical, but also known isoforms of proteins has a small impact on the number of reported proteins. The detection of specific isoforms could, concerning the question behind the study, compensate for slightly shorter reports using the parsimonious reports. 4) The current workflow can be easily extended to support new algorithms and search engine combinations. Copyright © 2016. Published by Elsevier B.V.

  14. Transcriptome and proteomic analysis of mango (Mangifera indica Linn) fruits.

    PubMed

    Wu, Hong-xia; Jia, Hui-min; Ma, Xiao-wei; Wang, Song-biao; Yao, Quan-sheng; Xu, Wen-tian; Zhou, Yi-gang; Gao, Zhong-shan; Zhan, Ru-lin

    2014-06-13

    Here we used Illumina RNA-seq technology for transcriptome sequencing of a mixed fruit sample from 'Zill' mango (Mangifera indica Linn) fruit pericarp and pulp during the development and ripening stages. RNA-seq generated 68,419,722 sequence reads that were assembled into 54,207 transcripts with a mean length of 858 bp, including 26,413 clusters and 27,794 singletons. A total of 42,515 (78.43%) transcripts were annotated using public protein databases, with a cut-off E-value above 10^-5, of which 35,198 and 14,619 transcripts were assigned to gene ontology terms and clusters of orthologous groups respectively. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes database identified 23,741 (43.79%) transcripts which were mapped to 128 pathways. These pathways revealed many previously unknown transcripts. We also applied mass spectrometry-based transcriptome data to characterize the proteome of ripe fruit. LC-MS/MS analysis of the mango fruit proteome was performed using tandem mass spectrometry (MS/MS) in an LTQ Orbitrap Velos (Thermo) coupled online to the HPLC. This approach enabled the identification of 7536 peptides that matched 2754 proteins. Our study provides a comprehensive sequence for a systemic view of transcriptome during mango fruit development and the most comprehensive fruit proteome to date, which are useful for further genomics research and proteomic studies. Our study provides a comprehensive sequence for a systemic view of both the transcriptome and proteome of mango fruit, and a valuable reference for further research on gene expression and protein identification. This article is part of a Special Issue entitled: Proteomics of non-model organisms. Copyright © 2014 Elsevier B.V. All rights reserved.

  15. Overcoming Species Boundaries in Peptide Identification with Bayesian Information Criterion-driven Error-tolerant Peptide Search (BICEPS)*

    PubMed Central

    Renard, Bernhard Y.; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W.; Tzur, Amit; Hamprecht, Fred A.; Steen, Hanno

    2012-01-01

    Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis. PMID:22493179

  16. Detection of alternative splice variants at the proteome level in Aspergillus flavus.

    PubMed

    Chang, Kung-Yen; Georgianna, D Ryan; Heber, Steffen; Payne, Gary A; Muddiman, David C

    2010-03-05

    Identification of proteins from proteolytic peptides or intact proteins plays an essential role in proteomics. Researchers use search engines to match the acquired peptide sequences to the target proteins. However, search engines depend on protein databases to provide candidates for consideration. Alternative splicing (AS), the mechanism by which the exons of pre-mRNAs can be spliced and rearranged to generate distinct mRNA and therefore protein variants, enables higher eukaryotic organisms, with only a limited number of genes, to have the requisite complexity and diversity at the proteome level. Multiple alternative isoforms from one gene often share common segments of sequences. However, many protein databases only include a limited number of isoforms to keep minimal redundancy. As a result, the database search might not identify a target protein even with high quality tandem MS data and accurate intact precursor ion mass. We computationally predicted an exhaustive list of putative isoforms of Aspergillus flavus proteins from 20 371 expressed sequence tags to investigate whether an alternative splicing protein database can assign a greater proportion of mass spectrometry data. The newly constructed AS database provided 9807 new alternatively spliced variants in addition to 12 832 previously annotated proteins. The searches of the existing tandem MS spectra data set using the AS database identified 29 new proteins encoded by 26 genes. Nine fungal genes appeared to have multiple protein isoforms. In addition to the discovery of splice variants, the AS database also showed potential to improve genome annotation. In summary, the introduction of an alternative splicing database helps identify more proteins and unveils more information about a proteome.

  17. Proteogenomics approaches for studying cancer biology and their potential in the identification of acute myeloid leukemia biomarkers.

    PubMed

    Hernandez-Valladares, Maria; Vaudel, Marc; Selheim, Frode; Berven, Frode; Bruserud, Øystein

    2017-08-01

    Mass spectrometry (MS)-based proteomics has become an indispensable tool for the characterization of the proteome and its post-translational modifications (PTM). In addition to standard protein sequence databases, proteogenomics strategies search the spectral data against the theoretical spectra obtained from customized protein sequence databases. To date, there are no published proteogenomics studies on acute myeloid leukemia (AML) samples. Areas covered: Proteogenomics involves the understanding of genomic and proteomic data. The intersection of both datatypes requires advanced bioinformatics skills. A standard proteogenomics workflow that could be used for the study of AML samples is described. The generation of customized protein sequence databases as well as bioinformatics tools and pipelines commonly used in proteogenomics are discussed in detail. Expert commentary: Drawing on evidence from recent cancer proteogenomics studies and taking into account the public availability of AML genomic data, the interpretation of present and future MS-based AML proteomic data using AML-specific protein sequence databases could discover new biological mechanisms and targets in AML. However, proteogenomics workflows including bioinformatics guidelines can be challenging for the wide AML research community. It is expected that further automation and simplification of the bioinformatics procedures might attract AML investigators to adopt the proteogenomics strategy.

  18. Novel utilization of the outer membrane proteins for the identification and differentiation of pathogenic versus nonpathogenic microbial strains using mass spectrometry-based proteomics approach

    NASA Astrophysics Data System (ADS)

    Jabbour, Rabih E.; Wade, Mary; Deshpande, Samir V.; McCubbin, Patrick; Snyder, A. Peter; Bevilacqua, Vicky

    2012-06-01

    Mass spectrometry based proteomic approaches are showing promising capabilities in addressing various biological and biochemical issues. Outer membrane proteins (OMPs) are often associated with virulence in gram-negative pathogens and could prove to be excellent model biomarkers for strain level differentiation among bacteria. Whole cells and OMP extracts were isolated from pathogenic and non-pathogenic strains of Francisella tularensis, Burkholderia thailandensis, and Burkholderia mallei. OMP extracts were compared for their ability to differentiate and delineate the correct database organism to an experimental sample and for the degree of dissimilarity to the nearest-neighbor database strains. This study addresses the comparative experimental proteome analyses of OMPs vs. whole cell lysates on the strain-level discrimination among gram-negative pathogenic and non-pathogenic strains.

  19. FunRich proteomics software analysis, let the fun begin!

    PubMed

    Benito-Martin, Alberto; Peinado, Héctor

    2015-08-01

    Protein MS analysis is the preferred method for unbiased protein identification. It is normally applied to a large number of both small-scale and high-throughput studies. However, user-friendly computational tools for protein analysis are still needed. In this issue, Mathivanan and colleagues (Proteomics 2015, 15, 2597-2601) report the development of FunRich software, an open-access software that facilitates the analysis of proteomics data, providing tools for functional enrichment and interaction network analysis of genes and proteins. FunRich is a reinterpretation of proteomic software, a standalone tool combining ease of use with customizable databases, free access, and graphical representations. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Integration of deep transcriptome and proteome analyses reveals the components of alkaloid metabolism in opium poppy cell cultures

    PubMed Central

    2010-01-01

    Background Papaver somniferum (opium poppy) is the source for several pharmaceutical benzylisoquinoline alkaloids including morphine, codeine and sanguinarine. In response to treatment with a fungal elicitor, the biosynthesis and accumulation of sanguinarine is induced along with other plant defense responses in opium poppy cell cultures. The transcriptional induction of alkaloid metabolism in cultured cells provides an opportunity to identify components of this process via the integration of deep transcriptome and proteome databases generated using next-generation technologies. Results A cDNA library was prepared for opium poppy cell cultures treated with a fungal elicitor for 10 h. Using 454 GS-FLX Titanium pyrosequencing, 427,369 expressed sequence tags (ESTs) with an average length of 462 bp were generated. Assembly of these sequences yielded 93,723 unigenes, of which 23,753 were assigned Gene Ontology annotations. Transcripts encoding all known sanguinarine biosynthetic enzymes were identified in the EST database, 5 of which were represented among the 50 most abundant transcripts. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) of total protein extracts from cell cultures treated with a fungal elicitor for 50 h facilitated the identification of 1,004 proteins. Proteins were fractionated by one-dimensional SDS-PAGE and digested with trypsin prior to LC-MS/MS analysis. Query of an opium poppy-specific EST database substantially enhanced peptide identification. Eight out of 10 known sanguinarine biosynthetic enzymes and many relevant primary metabolic enzymes were represented in the peptide database. Conclusions The integration of deep transcriptome and proteome analyses provides an effective platform to catalogue the components of secondary metabolism, and to identify genes encoding uncharacterized enzymes. The establishment of corresponding transcript and protein databases generated by next-generation technologies in a system with a well-defined metabolite profile facilitates an improved linkage between genes, enzymes, and pathway components. The proteome database represents the most relevant alkaloid-producing enzymes, compared with the much deeper and more complete transcriptome library. The transcript database contained full-length mRNAs encoding most alkaloid biosynthetic enzymes, which is a key requirement for the functional characterization of novel gene candidates. PMID:21083930

  1. Proteogenomics Dashboard for the Human Proteome Project.

    PubMed

    Tabas-Madrid, Daniel; Alves-Cruzeiro, Joao; Segura, Victor; Guruceaga, Elizabeth; Vialas, Vital; Prieto, Gorka; García, Carlos; Corrales, Fernando J; Albar, Juan Pablo; Pascual-Montano, Alberto

    2015-09-04

    dasHPPboard is a novel proteomics-based dashboard that collects and reports the experiments produced by the Spanish Human Proteome Project consortium (SpHPP) and aims to help HPP to map the entire human proteome. We have followed the strategy of analogous genomics projects like the Encyclopedia of DNA Elements (ENCODE), which provides a vast amount of data on human cell line experiments. The dashboard includes results of shotgun and selected reaction monitoring proteomics experiments, post-translational modifications information, as well as proteogenomics studies. We have also processed the transcriptomics data from the ENCODE and Human Body Map (HBM) projects for the identification of specific gene expression patterns in different cell lines and tissues, taking special interest in those genes having little proteomic evidence available (missing proteins). Peptide databases have been built using single nucleotide variants and novel junctions derived from RNA-Seq data that can be used in search engines for sample-specific protein identifications on the same cell lines or tissues. The dasHPPboard has been designed as a tool that can be used to share and visualize a combination of proteomic and transcriptomic data, providing at the same time easy access to resources for proteogenomics analyses. The dasHPPboard can be freely accessed at: http://sphppdashboard.cnb.csic.es.

  2. PIQMIe: a web server for semi-quantitative proteomics data management and analysis

    PubMed Central

    Kuzniar, Arnold; Kanaar, Roland

    2014-01-01

    We present the Proteomics Identifications and Quantitations Data Management and Integration Service or PIQMIe that aids in reliable and scalable data management, analysis and visualization of semi-quantitative mass spectrometry based proteomics experiments. PIQMIe readily integrates peptide and (non-redundant) protein identifications and quantitations from multiple experiments with additional biological information on the protein entries, and makes the linked data available in the form of a light-weight relational database, which enables dedicated data analyses (e.g. in R) and user-driven queries. Using the web interface, users are presented with a concise summary of their proteomics experiments in numerical and graphical forms, as well as with a searchable protein grid and interactive visualization tools to aid in the rapid assessment of the experiments and in the identification of proteins of interest. The web server not only provides data access through a web interface but also supports programmatic access through RESTful web service. The web server is available at http://piqmie.semiqprot-emc.cloudlet.sara.nl or http://www.bioinformatics.nl/piqmie. This website is free and open to all users and there is no login requirement. PMID:24861615

  3. PIQMIe: a web server for semi-quantitative proteomics data management and analysis.

    PubMed

    Kuzniar, Arnold; Kanaar, Roland

    2014-07-01

    We present the Proteomics Identifications and Quantitations Data Management and Integration Service or PIQMIe that aids in reliable and scalable data management, analysis and visualization of semi-quantitative mass spectrometry based proteomics experiments. PIQMIe readily integrates peptide and (non-redundant) protein identifications and quantitations from multiple experiments with additional biological information on the protein entries, and makes the linked data available in the form of a light-weight relational database, which enables dedicated data analyses (e.g. in R) and user-driven queries. Using the web interface, users are presented with a concise summary of their proteomics experiments in numerical and graphical forms, as well as with a searchable protein grid and interactive visualization tools to aid in the rapid assessment of the experiments and in the identification of proteins of interest. The web server not only provides data access through a web interface but also supports programmatic access through RESTful web service. The web server is available at http://piqmie.semiqprot-emc.cloudlet.sara.nl or http://www.bioinformatics.nl/piqmie. This website is free and open to all users and there is no login requirement. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Proteomic characterization of hempseed (Cannabis sativa L.).

    PubMed

    Aiello, Gilda; Fasoli, Elisa; Boschin, Giovanna; Lammi, Carmen; Zanoni, Chiara; Citterio, Attilio; Arnoldi, Anna

    2016-09-16

    This paper presents an investigation on hempseed proteome. The experimental approach, based on combinatorial peptide ligand libraries (CPLLs), SDS-PAGE separation, nLC-ESI-MS/MS identification, and database search, permitted identifying in total 181 expressed proteins. This very large number of identifications was achieved by searching in two databases: Cannabis sativa L. (56 gene products identified) and Arabidopsis thaliana (125 gene products identified). By performing a protein-protein association network analysis using the STRING software, it was possible to build the first interactomic map of all detected proteins, characterized by 137 nodes and 410 interactions. Finally, a Gene Ontology analysis of the identified species permitted to classify their molecular functions: the great majority is involved in the seed metabolic processes (41%), responses to stimulus (8%), and biological process (7%). Hempseed is an underexploited non-legume protein-rich seed. Although its protein is well known for its digestibility, essential amino acid composition, and useful techno-functional properties, a comprehensive proteome characterization is still lacking. The objective of this work was to fill this knowledge gap and provide information useful for a better exploitation of this seed in different food products. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Introducing the PRIDE Archive RESTful web services.

    PubMed

    Reisinger, Florian; del-Toro, Noemi; Ternent, Tobias; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2015-07-01

    The PRIDE (PRoteomics IDEntifications) database is one of the world-leading public repositories of mass spectrometry (MS)-based proteomics data and it is a founding member of the ProteomeXchange Consortium of proteomics resources. In the original PRIDE database system, users could access data programmatically by accessing the web services provided by the PRIDE BioMart interface. New REST (REpresentational State Transfer) web services have been developed to serve the most popular functionality provided by BioMart (now discontinued due to data scalability issues) and address the data access requirements of the newly developed PRIDE Archive. Using the API (Application Programming Interface) it is now possible to programmatically query for and retrieve peptide and protein identifications, project and assay metadata and the originally submitted files. Searching and filtering is also possible by metadata information, such as sample details (e.g. species and tissues), instrumentation (mass spectrometer), keywords and other provided annotations. The PRIDE Archive web services were first made available in April 2014. The API has already been adopted by a few applications and standalone tools such as PeptideShaker, PRIDE Inspector, the Unipept web application and the Python-based BioServices package. This application is free and open to all users with no login requirement and can be accessed at http://www.ebi.ac.uk/pride/ws/archive/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
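
    As a rough illustration of the metadata-based searching and filtering described in this record, the sketch below only assembles a query URL and makes no network request. The base URL is the one given in the abstract; the `project/list` path and the parameter names (`query`, `show`, `speciesFilter`, `tissueFilter`) are assumptions made for illustration and should be checked against the service documentation before use.

```python
from urllib.parse import urlencode

# Base URL as given in the abstract above.
BASE_URL = "http://www.ebi.ac.uk/pride/ws/archive/"


def build_project_query(keyword, species=None, tissue=None, page_size=10):
    """Build a project-search URL.

    The "project/list" path and all parameter names are hypothetical;
    they are not confirmed by the abstract.
    """
    params = {"query": keyword, "show": page_size}
    if species is not None:
        params["speciesFilter"] = species  # e.g. an NCBI taxonomy ID
    if tissue is not None:
        params["tissueFilter"] = tissue
    return BASE_URL + "project/list?" + urlencode(params)


# 9606 is the NCBI taxonomy ID for Homo sapiens.
url = build_project_query("phosphoproteome", species="9606")
print(url)
```

    Keeping URL construction separate from the HTTP call makes the filter logic easy to check without depending on the availability of the service.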

  6. GenomewidePDB 2.0: A Newly Upgraded Versatile Proteogenomic Database for the Chromosome-Centric Human Proteome Project.

    PubMed

    Jeong, Seul-Ki; Hancock, William S; Paik, Young-Ki

    2015-09-04

    Since the launch of the Chromosome-centric Human Proteome Project (C-HPP) in 2012, the number of "missing" proteins has fallen to 2932, down from ∼5932 since the number was first counted in 2011. We compared the characteristics of missing proteins with those of already annotated proteins with respect to transcriptional expression pattern and the time periods in which newly identified proteins were annotated. We learned that missing proteins commonly exhibit lower levels of transcriptional expression and less tissue-specific expression compared with already annotated proteins. This makes it more difficult to identify missing proteins as time goes on. One of the C-HPP goals is to identify alternative spliced product of proteins (ASPs), which are usually difficult to find by shot-gun proteomic methods due to their sequence similarities with the representative proteins. To resolve this problem, it may be necessary to use a targeted proteomics approach (e.g., selected and multiple reaction monitoring [S/MRM] assays) and an innovative bioinformatics platform that enables the selection of target peptides for rarely expressed missing proteins or ASPs. Given that the success of efforts to identify missing proteins may rely on more informative public databases, it was necessary to upgrade the available integrative databases. To this end, we attempted to improve the features and utility of GenomewidePDB by integrating transcriptomic information (e.g., alternatively spliced transcripts), annotated peptide information, and an advanced search interface that can find proteins of interest when applying a targeted proteomics strategy. This upgraded version of the database, GenomewidePDB 2.0, may not only expedite identification of the remaining missing proteins but also enhance the exchange of information among the proteome community. GenomewidePDB 2.0 is available publicly at http://genomewidepdb.proteomix.org/.

  7. Recognition of the polycistronic nature of human genes is critical to understanding the genotype-phenotype relationship.

    PubMed

    Brunet, Marie A; Levesque, Sébastien A; Hunting, Darel J; Cohen, Alan A; Roucou, Xavier

    2018-05-01

    Technological advances promise unprecedented opportunities for whole exome sequencing and proteomic analyses of populations. Currently, data from genome and exome sequencing or proteomic studies are searched against reference genome annotations. This provides the foundation for research and clinical screening for genetic causes of pathologies. However, current genome annotations substantially underestimate the proteomic information encoded within a gene. Numerous studies have now demonstrated the expression and function of alternative (mainly small, sometimes overlapping) ORFs within mature gene transcripts. This has important consequences for the correlation of phenotypes and genotypes. Most alternative ORFs are not yet annotated because of a lack of evidence, and this absence from databases precludes their detection by standard proteomic methods, such as mass spectrometry. Here, we demonstrate how current approaches tend to overlook alternative ORFs, hindering the discovery of new genetic drivers and fundamental research. We discuss available tools and techniques to improve identification of proteins from alternative ORFs and finally suggest a novel annotation system to permit a more complete representation of the transcriptomic and proteomic information contained within a gene. Given the crucial challenge of distinguishing functional ORFs from random ones, the suggested pipeline emphasizes both experimental data and conservation signatures. The addition of alternative ORFs in databases will render identification less serendipitous and advance the pace of research and genomic knowledge. This review highlights the urgent medical and research need to incorporate alternative ORFs in current genome annotations and thus permit their inclusion in hypotheses and models, which relate phenotypes and genotypes. © 2018 Brunet et al.; Published by Cold Spring Harbor Laboratory Press.

  8. Proteomic analysis of the Theileria annulata schizont

    PubMed Central

    Witschi, M.; Xia, D.; Sanderson, S.; Baumgartner, M.; Wastling, J.M.; Dobbelaere, D.A.E.

    2013-01-01

    The apicomplexan parasite, Theileria annulata, is the causative agent of tropical theileriosis, a devastating lymphoproliferative disease of cattle. The schizont stage transforms bovine leukocytes and provides an intriguing model to study host/pathogen interactions. The genome of T. annulata has been sequenced and transcriptomic data are rapidly accumulating. In contrast, little is known about the proteome of the schizont, the pathogenic, transforming life cycle stage of the parasite. Using one-dimensional (1-D) gel LC-MS/MS, a proteomic analysis of purified T. annulata schizonts was carried out. In whole parasite lysates, 645 proteins were identified. Proteins with transmembrane domains (TMDs) were under-represented and no proteins with more than four TMDs could be detected. To tackle this problem, Triton X-114 treatment was applied, which facilitates the extraction of membrane proteins, followed by 1-D gel LC-MS/MS. This resulted in the identification of an additional 153 proteins. Half of those had one or more TMD and 30 proteins with more than four TMDs were identified. This demonstrates that Triton X-114 treatment can provide a valuable additional tool for the identification of new membrane proteins in proteomic studies. With two exceptions, all proteins involved in glycolysis and the citric acid cycle were identified. For at least 29% of identified proteins, the corresponding transcripts were not present in the existing expressed sequence tag databases. The proteomics data were integrated into the publicly accessible database resource at EuPathDB (www.eupathdb.org) so that mass spectrometry-based protein expression evidence for T. annulata can be queried alongside transcriptional and other genomics data available for these parasites. PMID:23178997

  9. SASD: the Synthetic Alternative Splicing Database for identifying novel isoform from proteomics

    PubMed Central

    2013-01-01

    Background Alternative splicing is an important and widespread mechanism for generating protein diversity and regulating protein expression. High-throughput identification and analysis of alternative splicing at the protein level has more advantages than at the mRNA level. The combination of alternative splicing database and tandem mass spectrometry provides a powerful technique for identification, analysis and characterization of potential novel alternative splicing protein isoforms from proteomics. Therefore, based on the peptidomic database of human protein isoforms for proteomics experiments, our objective is to design a new alternative splicing database to 1) provide more coverage of genes, transcripts and alternative splicing, 2) exclusively focus on the alternative splicing, and 3) perform context-specific alternative splicing analysis. Results We used a three-step pipeline to create a synthetic alternative splicing database (SASD) to identify novel alternative splicing isoforms and interpret them in the context of pathway, disease, drug and organ specificity or custom gene set with maximum coverage and exclusive focus on alternative splicing. First, we extracted information on gene structures of all genes in the Ensembl Genes 71 database and incorporated the Integrated Pathway Analysis Database. Then, we compiled artificial splicing transcripts. Lastly, we translated the artificial transcripts into alternative splicing peptides. The SASD is a comprehensive database containing 56,630 genes (Ensembl gene IDs), 95,260 transcripts (Ensembl transcript IDs), and 11,919,779 Alternative Splicing peptides, and also covering about 1,956 pathways, 6,704 diseases, 5,615 drugs, and 52 organs. The database has a web-based user interface that allows users to search, display and download a single gene/transcript/protein, custom gene set, pathway, disease, drug, organ-related alternative splicing. Moreover, the quality of the database was validated with comparison to other known databases and two case studies: 1) in liver cancer and 2) in breast cancer. Conclusions The SASD provides the scientific community with an efficient means to identify, analyze, and characterize novel Exon Skipping and Intron Retention protein isoforms from mass spectrometry and interpret them in the context of pathway, disease, drug and organ specificity or custom gene set with maximum coverage and exclusive focus on alternative splicing. PMID:24267658
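
    The last two pipeline steps in this record (compiling artificial splicing transcripts, then translating them) can be sketched as follows. This is a minimal illustration, not the SASD implementation: it covers only single-exon skipping, assumes the first exon starts in frame at the start codon, and uses invented toy exon sequences and function names.

```python
from itertools import product

# Standard genetic code in compact form: codons enumerated over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate from the first base until a stop codon or the sequence end."""
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        aa = CODON_TABLE[cdna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def exon_skipping_transcripts(exons):
    """Yield (skipped_index, sequence) with one internal exon left out."""
    for i in range(1, len(exons) - 1):
        yield i, "".join(exons[:i] + exons[i + 1:])


# Toy exons (hypothetical): ATG GCC | AAA GGG | TTT TAA
exons = ["ATGGCC", "AAAGGG", "TTTTAA"]
reference = translate("".join(exons))
variants = {i: translate(seq) for i, seq in exon_skipping_transcripts(exons)}
print(reference, variants)
```

    A production pipeline would also have to track reading-frame shifts at each new junction and digest the translated products into peptides in silico; both are omitted here.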

  10. Unraveling snake venom complexity with 'omics' approaches: challenges and perspectives.

    PubMed

    Zelanis, André; Tashima, Alexandre Keiji

    2014-09-01

    The study of snake venom proteomes (venomics) has been experiencing a burst of reports; however, obtaining comprehensive knowledge of the dynamic range of proteins present within a single venom and of the set of post-translational modifications (PTMs), as well as the lack of a comprehensive database related to venom proteins, are among the main challenges in venomics research. The phenotypic plasticity in snake venom proteomes, together with their inherent toxin proteoform diversity, points to the use of integrative analysis in order to better understand their actual complexity. In this regard, such a systems venomics task should encompass the integration of data from transcriptomic and proteomic studies (especially the venom gland proteome), the identification of biological PTMs, and the estimation of artifactual proteomes and peptidomes generated by sample handling procedures. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Analysis of human serum phosphopeptidome by a focused database searching strategy.

    PubMed

    Zhu, Jun; Wang, Fangjun; Cheng, Kai; Song, Chunxia; Qin, Hongqiang; Hu, Lianghai; Figeys, Daniel; Ye, Mingliang; Zou, Hanfa

    2013-01-14

    As human serum is an important source for early diagnosis of many serious diseases, analysis of serum proteome and peptidome has been extensively performed. However, the serum phosphopeptidome has been less explored, probably because an effective method for database searching is lacking. The conventional database searching strategy always uses the whole proteome database, which is very time-consuming for phosphopeptidome searches because of the huge search space resulting from the high redundancy of the database and the setting of dynamic modifications during searching. In this work, a focused database searching strategy using an in-house collected human serum pro-peptidome target/decoy database (HuSPep) was established. It was found that the searching time was significantly decreased without compromising the identification sensitivity. By combining size-selective Ti (IV)-MCM-41 enrichment, RP-RP off-line separation, and complementary CID and ETD fragmentation with the new searching strategy, 143 unique endogenous phosphopeptides and 133 phosphorylation sites (109 novel sites) were identified from human serum with high reliability. Copyright © 2012 Elsevier B.V. All rights reserved.
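
    A focused target/decoy database of the kind mentioned in this record can be sketched as below. The abstract does not say how the HuSPep decoys were generated; reversing each target sequence is one common convention and is used here purely as an assumption. The accessions, sequences and the `DECOY_` prefix are invented for illustration.

```python
def build_target_decoy(proteins, decoy_prefix="DECOY_"):
    """Return a dict holding each target sequence plus a reversed decoy.

    Reversed-sequence decoys are an assumption, not the documented
    HuSPep procedure.
    """
    entries = {}
    for accession, sequence in proteins.items():
        entries[accession] = sequence
        entries[decoy_prefix + accession] = sequence[::-1]
    return entries


# Hypothetical focused database with two short entries.
focused = {"SERUM_PEP_1": "ADSGEGDFLAEGGGVR", "SERUM_PEP_2": "DAHKSEVAHR"}
database = build_target_decoy(focused)
print(len(database))
```

    Restricting the targets to sequences expected in the sample is what shrinks the search space; the decoys then allow the error rate of that smaller search to be estimated.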

  12. Evaluation of Proteomic Search Engines for the Analysis of Histone Modifications

    PubMed Central

    2015-01-01

    Identification of histone post-translational modifications (PTMs) is challenging for proteomics search engines. Including many histone PTMs in one search increases the number of candidate peptides dramatically, leading to low search speed and fewer identified spectra. To evaluate database search engines on identifying histone PTMs, we present a method in which one kind of modification is searched each time, for example, unmodified, individually modified, and multimodified, each search result is filtered with false discovery rate less than 1%, and the identifications of multiple search engines are combined to obtain confident results. We apply this method for eight search engines on histone data sets. We find that two search engines, pFind and Mascot, identify most of the confident results at a reasonable speed, so we recommend using them to identify histone modifications. During the evaluation, we also find some important aspects for the analysis of histone modifications. Our evaluation of different search engines on identifying histone modifications will hopefully help those who are hoping to enter the histone proteomics field. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the data set identifier PXD001118. PMID:25167464
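
    The filter-then-combine step of the method in this record might look like the sketch below. It assumes target-decoy FDR estimation (decoys divided by targets above a score threshold) and combines engines by taking the union of identifications while recording which engines support each one; the abstract specifies neither detail, and the engine names, scores and peptides are invented.

```python
def filter_at_fdr(psms, max_fdr=0.01):
    """Keep target PSMs above the lowest score cutoff with FDR <= max_fdr.

    psms: list of (spectrum_id, peptide, score, is_decoy); higher is better.
    FDR is estimated as decoys / targets, an assumed convention.
    """
    ranked = sorted(psms, key=lambda p: p[2], reverse=True)
    targets = decoys = 0
    cutoff = 0
    for n, (_, _, _, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = n  # longest ranked prefix still within the FDR limit
    return [p for p in ranked[:cutoff] if not p[3]]


def combine_engines(results_by_engine):
    """Map each (spectrum_id, peptide) to the set of engines reporting it."""
    combined = {}
    for engine, psms in results_by_engine.items():
        for spectrum_id, peptide, _, _ in psms:
            combined.setdefault((spectrum_id, peptide), set()).add(engine)
    return combined


# Toy results; scores are engine-specific, so each engine is filtered alone.
engine_a = [("s1", "PEPA", 9.0, False), ("s2", "PEPB", 8.0, False),
            ("s3", "XXXX", 7.0, True), ("s4", "PEPC", 6.0, False)]
engine_b = [("s1", "PEPA", 50.0, False), ("s4", "PEPC", 40.0, False)]

filtered = {"engine_a": filter_at_fdr(engine_a),
            "engine_b": filter_at_fdr(engine_b)}
combined = combine_engines(filtered)
print(combined)
```

    Filtering each engine before combining matters because raw scores from different engines are not on a common scale.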

  13. Evaluation of proteomic search engines for the analysis of histone modifications.

    PubMed

    Yuan, Zuo-Fei; Lin, Shu; Molden, Rosalynn C; Garcia, Benjamin A

    2014-10-03

    Identification of histone post-translational modifications (PTMs) is challenging for proteomics search engines. Including many histone PTMs in one search increases the number of candidate peptides dramatically, leading to low search speed and fewer identified spectra. To evaluate database search engines on identifying histone PTMs, we present a method in which one kind of modification is searched each time, for example, unmodified, individually modified, and multimodified, each search result is filtered with false discovery rate less than 1%, and the identifications of multiple search engines are combined to obtain confident results. We apply this method for eight search engines on histone data sets. We find that two search engines, pFind and Mascot, identify most of the confident results at a reasonable speed, so we recommend using them to identify histone modifications. During the evaluation, we also find some important aspects for the analysis of histone modifications. Our evaluation of different search engines on identifying histone modifications will hopefully help those who are hoping to enter the histone proteomics field. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the data set identifier PXD001118.

  14. Open source libraries and frameworks for mass spectrometry based proteomics: A developer's perspective☆

    PubMed Central

    Perez-Riverol, Yasset; Wang, Rui; Hermjakob, Henning; Müller, Markus; Vesada, Vladimir; Vizcaíno, Juan Antonio

    2014-01-01

    Data processing, management and visualization are central and critical components of a state of the art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from various proteomics search engines output files, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries, open-source frameworks and also, we give information on some of the freely available applications which make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. PMID:23467006

  15. Open source libraries and frameworks for mass spectrometry based proteomics: a developer's perspective.

    PubMed

    Perez-Riverol, Yasset; Wang, Rui; Hermjakob, Henning; Müller, Markus; Vesada, Vladimir; Vizcaíno, Juan Antonio

    2014-01-01

    Data processing, management and visualization are central and critical components of a state of the art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from various proteomics search engines output files, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries, open-source frameworks and also, we give information on some of the freely available applications which make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. Copyright © 2013 Elsevier B.V. All rights reserved.

  16. Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book.

    PubMed

    Sadygov, Rovshan G; Cociorva, Daniel; Yates, John R

    2004-12-01

    Database searching is an essential element of large-scale proteomics. Because these methods are widely used, it is important to understand the rationale of the algorithms. Most algorithms are based on concepts first developed in SEQUEST and PeptideSearch. Four basic approaches are used to determine a match between a spectrum and sequence: descriptive, interpretative, stochastic and probability-based matching. We review the basic concepts used by most search algorithms, the computational modeling of peptide identification and current challenges and limitations of this approach for protein identification.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Webb-Robertson, Bobbie-Jo M.

    Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are an approach to battle this false positive problem that has led to significant improvements in peptide identification. We have shown that machine learning, specifically support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptide is denoted in its vector representation and the SVM generates a single statistical score that is then used to classify presence or absence in the sample.

  18. Bioinformatics strategies in life sciences: from data processing and data warehousing to biological knowledge extraction.

    PubMed

    Thiele, Herbert; Glandorf, Jörg; Hufnagel, Peter

    2010-05-27

    With the large variety of Proteomics workflows, as well as the large variety of instruments and data-analysis software available, researchers today face major challenges validating and comparing their Proteomics data. Here we present a new generation of the ProteinScape bioinformatics platform, now enabling researchers to manage Proteomics data from the generation and data warehousing to a central data repository with a strong focus on the improved accuracy, reproducibility and comparability demanded by many researchers in the field. It addresses scientists' current needs in proteomics identification, quantification and validation. But producing large protein lists is not the end point in Proteomics, where one ultimately aims to answer specific questions about the biological condition or disease model of the analyzed sample. In this context, a new tool has been developed at the Spanish Centro Nacional de Biotecnologia Proteomics Facility termed PIKE (Protein information and Knowledge Extractor) that allows researchers to control, filter and access specific information from genomics and proteomic databases, to understand the role and relationships of the proteins identified in the experiments. Additionally, an EU funded project, ProDac, has coordinated systematic data collection in public standards-compliant repositories like PRIDE. This will cover all aspects from generating MS data in the laboratory, assembling the whole annotation information and storing it together with identifications in a standardised format.

  19. A novel quantification-driven proteomic strategy identifies an endogenous peptide of pleiotrophin as a new biomarker of Alzheimer's disease.

    PubMed

    Skillbäck, Tobias; Mattsson, Niklas; Hansson, Karl; Mirgorodskaya, Ekaterina; Dahlén, Rahil; van der Flier, Wiesje; Scheltens, Philip; Duits, Floor; Hansson, Oskar; Teunissen, Charlotte; Blennow, Kaj; Zetterberg, Henrik; Gobom, Johan

    2017-10-17

    We present a new, quantification-driven proteomic approach to identifying biomarkers. In contrast to the identification-driven approach, limited in scope to peptides that are identified by database searching in the first step, all MS data are considered to select biomarker candidates. The endopeptidome of cerebrospinal fluid from 40 Alzheimer's disease (AD) patients, 40 subjects with mild cognitive impairment, and 40 controls with subjective cognitive decline was analyzed using multiplex isobaric labeling. Spectral clustering was used to match MS/MS spectra. The top biomarker candidate cluster (215% higher in AD compared to controls, area under ROC curve = 0.96) was identified as a fragment of pleiotrophin located near the protein's C-terminus. Analysis of another cohort (n = 60 over four clinical groups) verified that the biomarker was increased in AD patients while no change in controls, Parkinson's disease or progressive supranuclear palsy was observed. The identification of the novel biomarker pleiotrophin 151-166 demonstrates that our quantification-driven proteomic approach is a promising method for biomarker discovery, which may be universally applicable in clinical proteomics.

  20. MASCOT HTML and XML parser: an implementation of a novel object model for protein identification data.

    PubMed

    Yang, Chunguang G; Granite, Stephen J; Van Eyk, Jennifer E; Winslow, Raimond L

    2006-11-01

    Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from MASCOT MS/MS ions search in peptide summary report or MASCOT PMF search in protein summary report. The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://www.ccbm.jhu.edu and the program uses only free and open-source Java libraries.

  1. SpecOMS: A Full Open Modification Search Method Performing All-to-All Spectra Comparisons within Minutes.

    PubMed

    David, Matthieu; Fertin, Guillaume; Rogniaux, Hélène; Tessier, Dominique

    2017-08-04

    The analysis of discovery proteomics experiments relies on algorithms that identify peptides from their tandem mass spectra. The almost exhaustive interpretation of these spectra remains an unresolved issue. At present, an important number of missing interpretations is probably due to peptides displaying post-translational modifications and variants that yield spectra that are particularly difficult to interpret. However, the emergence of a new generation of mass spectrometers that provide high fragment ion accuracy has paved the way for more efficient algorithms. We present a new software, SpecOMS, that can handle the computational complexity of pairwise comparisons of spectra in the context of large volumes. SpecOMS can compare a whole set of experimental spectra generated by a discovery proteomics experiment to a whole set of theoretical spectra deduced from a protein database in a few minutes on a standard workstation. SpecOMS can ingeniously exploit those capabilities to improve the peptide identification process, allowing strong competition between all possible peptides for spectrum interpretation. Remarkably, this software resolves the drawbacks (i.e., efficiency problems and decreased sensitivity) that usually accompany open modification searches. We highlight this promising approach using results obtained from the analysis of a public human data set downloaded from the PRIDE (PRoteomics IDEntification) database.
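
    One way to picture an all-to-all comparison between experimental and theoretical spectra is a shared-peak count computed through an inverted index of binned fragment masses. The sketch below is only an illustration under stated assumptions: the abstract does not describe the data structure or the similarity measure SpecOMS uses, and the function name, the dictionary layout, and the 0.02 bin width are hypothetical.

```python
from collections import defaultdict


def shared_peak_counts(experimental, theoretical, bin_width=0.02):
    """Count shared fragment-mass bins for every experimental/theoretical pair.

    experimental, theoretical: dicts mapping an identifier to a list of m/z values.
    Peaks are matched by rounding m/z to the nearest bin, so two peaks that sit
    on opposite sides of a bin boundary are missed (a known limit of binning).
    """
    # Inverted index: bin number -> theoretical spectra that have a peak in that bin.
    index = defaultdict(set)
    for name, peaks in theoretical.items():
        for mz in peaks:
            index[round(mz / bin_width)].add(name)

    counts = defaultdict(int)
    for spec_id, peaks in experimental.items():
        # Use a set so several experimental peaks in one bin count only once.
        for b in {round(mz / bin_width) for mz in peaks}:
            for name in index.get(b, ()):
                counts[(spec_id, name)] += 1
    return dict(counts)
```

    Because only pairs that share at least one bin are ever touched, the cost grows with the number of actual matches rather than with the full product of the two spectrum sets, which is the general idea behind making such comparisons tractable.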

  2. PRIDE: new developments and new datasets.

    PubMed

    Jones, Philip; Côté, Richard G; Cho, Sang Yun; Klie, Sebastian; Martens, Lennart; Quinn, Antony F; Thorneycroft, David; Hermjakob, Henning

    2008-01-01

    The PRIDE (http://www.ebi.ac.uk/pride) database of protein and peptide identifications was previously described in the NAR Database Special Edition in 2006. Since this publication, the volume of public data in the PRIDE relational database has increased by more than an order of magnitude. Several significant public datasets have been added, including identifications and processed mass spectra generated by the HUPO Brain Proteome Project and the HUPO Liver Proteome Project. The PRIDE software development team has made several significant changes and additions to the user interface and tool set associated with PRIDE. The focus of these changes has been to facilitate the submission process and to improve the mechanisms by which PRIDE can be queried. The PRIDE team has developed a Microsoft Excel workbook that allows the required data to be collated in a series of relatively simple spreadsheets, with automatic generation of PRIDE XML at the end of the process. The ability to query PRIDE has been augmented by the addition of a BioMart interface allowing complex queries to be constructed. Collaboration with groups outside the EBI has been fruitful in extending PRIDE, including an approach to encode iTRAQ quantitative data in PRIDE XML.

  3. Gaining knowledge from previously unexplained spectra-application of the PTM-Explorer software to detect PTM in HUPO BPP MS/MS data.

    PubMed

    Chamrad, Daniel C; Körting, Gerhard; Schäfer, Heike; Stephan, Christian; Thiele, Herbert; Apweiler, Rolf; Meyer, Helmut E; Marcus, Katrin; Blüggel, Martin

    2006-09-01

    A novel software tool named PTM-Explorer has been applied to LC-MS/MS datasets acquired within the Human Proteome Organisation (HUPO) Brain Proteome Project (BPP). PTM-Explorer enables automatic identification of peptide MS/MS spectra that were not explained in typical sequence database searches. The main focus was detection of PTMs, but PTM-Explorer detects also unspecific peptide cleavage, mass measurement errors, experimental modifications, amino acid substitutions, transpeptidation products and unknown mass shifts. To avoid a combinatorial problem the search is restricted to a set of selected protein sequences, which stem from previous protein identifications using a common sequence database search. Prior to application to the HUPO BPP data, PTM-Explorer was evaluated on excellently manually characterized and evaluated LC-MS/MS data sets from Alpha-A-Crystallin gel spots obtained from mouse eye lens. Besides various PTMs including phosphorylation, a wealth of experimental modifications and unspecific cleavage products were successfully detected, completing the primary structure information of the measured proteins. Our results indicate that a large amount of MS/MS spectra that currently remain unidentified in standard database searches contain valuable information that can only be elucidated using suitable software tools.

  4. 2DB: a Proteomics database for storage, analysis, presentation, and retrieval of information from mass spectrometric experiments.

    PubMed

    Allmer, Jens; Kuhlgert, Sebastian; Hippler, Michael

    2008-07-07

    The amount of information stemming from proteomics experiments involving (multi dimensional) separation techniques, mass spectrometric analysis, and computational analysis is ever-increasing. Data from such an experimental workflow needs to be captured, related and analyzed. Biological experiments within this scope produce heterogenic data ranging from pictures of one or two-dimensional protein maps and spectra recorded by tandem mass spectrometry to text-based identifications made by algorithms which analyze these spectra. Additionally, peptide and corresponding protein information needs to be displayed. In order to handle the large amount of data from computational processing of mass spectrometric experiments, automatic import scripts are available and the necessity for manual input to the database has been minimized. Information is in a generic format which abstracts from specific software tools typically used in such an experimental workflow. The software is therefore capable of storing and cross analysing results from many algorithms. A novel feature and a focus of this database is to facilitate protein identification by using peptides identified from mass spectrometry and link this information directly to respective protein maps. Additionally, our application employs spectral counting for quantitative presentation of the data. All information can be linked to hot spots on images to place the results into an experimental context. A summary of identified proteins, containing all relevant information per hot spot, is automatically generated, usually upon either a change in the underlying protein models or due to newly imported identifications. The supporting information for this report can be accessed in multiple ways using the user interface provided by the application. We present a proteomics database which aims to greatly reduce evaluation time of results from mass spectrometric experiments and enhance result quality by allowing consistent data handling. 
    Import functionality, automatic protein detection, and summary creation act together to facilitate data analysis. In addition, supporting information for these findings is readily accessible via the graphical user interface provided. The database schema and the implementation, which can easily be installed on virtually any server, can be downloaded in the form of a compressed file from our project webpage.

  5. Characterization of Proteoforms with Unknown Post-translational Modifications Using the MIScore

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kou, Qiang; Zhu, Binhai; Wu, Si

    Various proteoforms may be generated from a single gene due to primary structure alterations (PSAs) such as genetic variations, alternative splicing, and post-translational modifications (PTMs). Top-down mass spectrometry is capable of analyzing intact proteins and identifying patterns of multiple PSAs, making it the method of choice for studying complex proteoforms. In top-down proteomics, proteoform identification is often performed by searching tandem mass spectra against a protein sequence database that contains only one reference protein sequence for each gene or transcript variant in a proteome. Because of the incompleteness of the protein database, an identified proteoform may contain unknown PSAs compared with the reference sequence. Proteoform characterization is to identify and localize PSAs in a proteoform. Although many software tools have been proposed for proteoform identification by top-down mass spectrometry, the characterization of proteoforms in identified proteoform-spectrum matches still relies mainly on manual annotation. We propose to use the Modification Identification Score (MIScore), which is based on Bayesian models, to automatically identify and localize PTMs in proteoforms. Experiments showed that the MIScore is accurate in identifying and localizing one or two modifications.

  6. Rapid Detection & Identification of Bacillus Species using MALDI-TOF/TOF and Biomarker Database

    DTIC Science & Technology

    2006-06-01

    rRNA sequence analysis. Multilocus enzyme electrophoresis (MEE) and comparative DNA sequence analysis suggest that they may represent a single species...adaptation of the MEE method [63] but with greater discrimination [64]. All of these new PCR-based subtyping methods are certainly superior and more...Demirev, P.A., Lin, J.S., Pineda, F.J., and Fenselau, C. (2001). Bioinformatics and mass spectrometry for microorganism identification: proteome-wide

  7. Purification and proteomic analysis of plant plasma membranes.

    PubMed

    Alexandersson, Erik; Gustavsson, Niklas; Bernfur, Katja; Karlsson, Adine; Kjellbom, Per; Larsson, Christer

    2008-01-01

    All techniques needed for proteomic analyses of plant plasma membranes are described in detail, from isolation of plasma membranes to protein identification by mass spectrometry (MS). Plasma membranes are isolated by aqueous two-phase partitioning yielding vesicles with a cytoplasmic side-in orientation and a purity of about 95%. These vesicles are turned inside-out by treatment with Brij 58, which removes soluble contaminating proteins enclosed in the vesicles as well as loosely attached proteins. The final plasma membrane preparation thus retains all integral proteins and many peripheral proteins. Proteins are separated by one-dimensional sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE), and protein bands are excised and digested with trypsin. Peptides in tryptic digests are separated by nanoflow liquid chromatography and either fed directly into an ESI-MS or spotted onto matrix-assisted laser desorption ionization (MALDI) plates for analysis with MALDI-MS. Finally, data processing and database searching are used for protein identification to define a plasma membrane proteome.

  8. Proteomic profiling of human pleural effusion using two-dimensional nano liquid chromatography tandem mass spectrometry.

    PubMed

    Tyan, Yu-Chang; Wu, Hsin-Yi; Lai, Wu-Wei; Su, Wu-Chou; Liao, Pao-Chi

    2005-01-01

    Pleural effusion, an accumulation of pleural fluid, contains proteins originated from plasma filtrate and, especially when tissues are damaged, parenchyma interstitial spaces of lungs and/or other organs. This study details protein profiles in human pleural effusion from 43 lung adenocarcinoma patients by a two-dimensional nano-high performance liquid chromatography electrospray ionization tandem mass spectrometry (2D nano-HPLC-ESI-MS/MS) system. The experimental results revealed the identification of 1415 unique proteins from human pleural effusion. Among these 124 proteins identified with higher confidence levels, some proteins have not been reported in plasma and may represent proteins specifically present in pleural effusion. These proteins are valuable for mass identification of differentially expressed proteins involved in proteomics database and screening biomarker to further study in human lung adenocarcinoma. The significance of the use of proteomics analysis of human pleural fluid for the search of new lung cancer marker proteins, and for their simultaneous display and analysis in patients suffering from lung disorders has been examined.

  9. Lipid Identification by Untargeted Tandem Mass Spectrometry Coupled with Ultra-High-Pressure Liquid Chromatography.

    PubMed

    Gugiu, Gabriel B

    2017-01-01

    Lipidomics refers to the large-scale study of lipids in biological systems (Wenk, Nat Rev Drug Discov 4(7):594-610, 2005; Rolim et al., Gene 554(2):131-139, 2015). From a mass spectrometric point of view, by lipidomics we understand targeted or untargeted mass spectrometric analysis of lipids using either liquid chromatography (LC) (Castro-Perez et al., J Proteome Res 9(5):2377-2389, 2010) or shotgun (Han and Gross, Mass Spectrom Rev 24(3):367-412, 2005) approaches coupled with tandem mass spectrometry. This chapter describes the former methodology, which is becoming rapidly the preferred method for lipid identification owing to similarities with established omics workflows, such as proteomics (Washburn et al., Nat Biotechnol 19(3):242-247, 2001) or genomics (Yadav, J Biomol Tech: JBT 18(5):277, 2007). The workflow described consists in lipid extraction using a modified Bligh and Dyer method (Bligh and Dyer, Can J Biochem Physiol 37(8):911-917, 1959), ultra high pressure liquid chromatography fractionation of lipid samples on a reverse phase C18 column, followed by tandem mass spectrometric analysis and in silico database search for lipid identification based on MSMS spectrum matching (Kind et al., Nat Methods 10(8):755-758, 2013; Yamada et al., J Chromatogr A 1292:211-218, 2013; Taguchi and Ishikawa, J Chromatogr A 1217(25):4229-4239, 2010; Peake et al., Thermoscientifices 1-3, 2015) and accurate mass of parent ion (Sud et al., Nucleic Acids Res 35(database issue):D527-D532, 2007; Wishart et al., Nucleic Acids Res 35(database):D521-D526, 2007).

  10. Integration of gel-based and gel-free proteomic data for functional analysis of proteins through Soybean Proteome Database.

    PubMed

    Komatsu, Setsuko; Wang, Xin; Yin, Xiaojian; Nanjo, Yohei; Ohyanagi, Hajime; Sakata, Katsumi

    2017-06-23

    The Soybean Proteome Database (SPD) stores data on soybean proteins obtained with gel-based and gel-free proteomic techniques. The database was constructed to provide information on proteins for functional analyses. The majority of the data is focused on soybean (Glycine max 'Enrei'). The growth and yield of soybean are strongly affected by environmental stresses such as flooding. The database was originally constructed using data on soybean proteins separated by two-dimensional polyacrylamide gel electrophoresis, which is a gel-based proteomic technique. Since 2015, the database has been expanded to incorporate data obtained by label-free mass spectrometry-based quantitative proteomics, which is a gel-free proteomic technique. Here, the portions of the database consisting of gel-free proteomic data are described. The gel-free proteomic database contains 39,212 proteins identified in 63 sample sets, such as temporal and organ-specific samples of soybean plants grown under flooding stress or non-stressed conditions. In addition, data on organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored. Furthermore, the database integrates multiple omics data such as genomics, transcriptomics, metabolomics, and proteomics. The SPD database is accessible at http://proteome.dc.affrc.go.jp/Soybean/. The Soybean Proteome Database stores data obtained from both gel-based and gel-free proteomic techniques. The gel-free proteomic database comprises 39,212 proteins identified in 63 sample sets, such as different organs of soybean plants grown under flooding stress or non-stressed conditions in a time-dependent manner. In addition, organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored in the gel-free proteomics database. A total of 44,704 proteins, including 5490 proteins identified using a gel-based proteomic technique, are stored in the SPD. 
    It accounts for approximately 80% of all predicted proteins from genome sequences, though there are overlapping proteins. Based on the demonstrated application of data stored in the database for functional analyses, it is suggested that these data will be useful for analyses of biological mechanisms in soybean. Furthermore, coupled with recent advances in information and communication technology, the usefulness of this database would increase in the analyses of biological mechanisms. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. The PROTICdb database for 2-DE proteomics.

    PubMed

    Langella, Olivier; Zivy, Michel; Joets, Johann

    2007-01-01

    PROTICdb is a web-based database mainly designed to store and analyze plant proteome data obtained by 2D polyacrylamide gel electrophoresis (2D PAGE) and mass spectrometry (MS). The goals of PROTICdb are (1) to store, track, and query information related to proteomic experiments, i.e., from tissue sampling to protein identification and quantitative measurements; and (2) to integrate information from the user's own expertise and other sources into a knowledge base, used to support data interpretation (e.g., for the determination of allelic variants or products of posttranslational modifications). Data insertion into the relational database of PROTICdb is achieved either by uploading outputs from Mélanie, PDQuest, IM2d, ImageMaster(tm) 2D Platinum v5.0, Progenesis, Sequest, MS-Fit, and Mascot software, or by filling in web forms (experimental design and methods). 2D PAGE-annotated maps can be displayed, queried, and compared through the GelBrowser. Quantitative data can be easily exported in a tabulated format for statistical analyses with any third-party software. PROTICdb is based on the Oracle or the PostgreSQL DataBase Management System (DBMS) and is freely available upon request at http://cms.moulon.inra.fr/content/view/14/44/.

  12. The online Tabloid Proteome: an annotated database of protein associations

    PubMed Central

    Turan, Demet; Tavernier, Jan

    2018-01-01

    A complete knowledge of the proteome can only be attained by determining the associations between proteins, along with the nature of these associations (e.g. physical contact in protein–protein interactions, participation in complex formation or different roles in the same pathway). Despite extensive efforts in elucidating direct protein interactions, our knowledge on the complete spectrum of protein associations remains limited. We therefore developed a new approach that detects protein associations from identifications obtained after re-processing of large-scale, public mass spectrometry-based proteomics data. Our approach infers protein association based on the co-occurrence of proteins across many different proteomics experiments, and provides information that is almost completely complementary to traditional direct protein interaction studies. We here present a web interface to query and explore the associations derived from this method, called the online Tabloid Proteome. The online Tabloid Proteome also integrates biological knowledge from several existing resources to annotate our derived protein associations. The online Tabloid Proteome is freely available through a user-friendly web interface, which provides intuitive navigation and data exploration options for the user at http://iomics.ugent.be/tabloidproteome. PMID:29040688
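
    The idea of inferring associations from co-occurrence across experiments can be illustrated with a short sketch. The abstract does not name the statistic used, so the Jaccard index below, the minimum-support threshold, and the function and parameter names are assumptions made for illustration rather than the method behind the online Tabloid Proteome.

```python
from itertools import combinations


def cooccurrence_scores(experiments, min_experiments=2):
    """Score protein pairs by how consistently they are identified together.

    experiments: dict mapping an experiment id to the set of protein accessions
    identified in it. Returns {(protein_a, protein_b): Jaccard index} for pairs
    seen together in at least min_experiments experiments.
    """
    # Invert the input: protein -> set of experiments in which it was identified.
    seen_in = {}
    for exp_id, proteins in experiments.items():
        for protein in proteins:
            seen_in.setdefault(protein, set()).add(exp_id)

    scores = {}
    for a, b in combinations(sorted(seen_in), 2):
        shared = seen_in[a] & seen_in[b]
        if len(shared) >= min_experiments:
            # Jaccard index: experiments with both / experiments with either.
            scores[(a, b)] = len(shared) / len(seen_in[a] | seen_in[b])
    return scores
```

    A pair that always appears together scores 1.0, while a pair that shares only a few of many experiments scores close to 0; the support threshold keeps single chance co-identifications out of the result.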

  13. Computer aided identification of a Hevein-like antimicrobial peptide of bell pepper leaves for biotechnological use.

    PubMed

    Games, Patrícia Dias; daSilva, Elói Quintas Gonçalves; Barbosa, Meire de Oliveira; Almeida-Souza, Hebréia Oliveira; Fontes, Patrícia Pereira; deMagalhães, Marcos Jorge; Pereira, Paulo Roberto Gomes; Prates, Maura Vianna; Franco, Gloria Regina; Faria-Campos, Alessandra; Campos, Sérgio Vale Aguiar; Baracat-Pereira, Maria Cristina

    2016-12-15

    Antimicrobial peptides from plants present mechanisms of action that are different from those of conventional defense agents. They are under-explored but have potential as commercial antimicrobials. Bell pepper leaves ('Magali R') are discarded after harvesting the fruit and are sources of bioactive peptides. This work reports the isolation by peptidomics tools, and the identification and partial characterization by computational tools, of an antimicrobial peptide from bell pepper leaves, and demonstrates the usefulness of records and in silico analysis for the study of plant peptides aimed at biotechnological uses. Aqueous extracts from leaves were enriched in peptides by salt fractionation and ultrafiltration. An antimicrobial peptide was isolated by tandem chromatographic procedures. Mass spectrometry, automated peptide sequencing and bioinformatics tools were used alternately for identification and partial characterization of the Hevein-like peptide, named HEV-CANN. The computational tools that assisted in the identification of the peptide included BlastP, PSI-Blast, ClustalOmega, PeptideCutter, and ProtParam; conventional protein databases (DB) such as Mascot, Protein-DB, GenBank-DB, RefSeq, Swiss-Prot, and UniProtKB; peptide-specific DB such as Amper, APD2, CAMP, LAMPs, and PhytAMP; other tools included in ExPASy for Proteomics; The Bioactive Peptide Databases, and The Pepper Genome Database. The HEV-CANN sequence presented 40 amino acid residues, 4258.8 Da, theoretical pI-value of 8.78, and four disulfide bonds. It was stable, and it inhibited the growth of phytopathogenic bacteria and a fungus. HEV-CANN presented a chitin-binding domain in its sequence. There was a high identity and a positive alignment of HEV-CANN sequence in various databases, but there was not a complete identity, suggesting that HEV-CANN may be produced by ribosomal synthesis, which is in accordance with its constitutive nature. Computational tools for proteomics and databases are not adjusted for short sequences, which hampered HEV-CANN identification. The adjustment of statistical tests in large databases for proteins is an alternative to promote the significant identification of peptides. The development of specific DB for plant antimicrobial peptides, with information about peptide sequences, functional genomic data, structural motifs and domains of molecules, functional domains, and peptide-biomolecule interactions, is valuable and necessary.

  14. Proteomic analysis of Rhodotorula mucilaginosa: dealing with the issues of a non-conventional yeast.

    PubMed

    Addis, Maria Filippa; Tanca, Alessandro; Landolfo, Sara; Abbondio, Marcello; Cutzu, Raffaela; Biosa, Grazia; Pagnozzi, Daniela; Uzzau, Sergio; Mannazzu, Ilaria

    2016-08-01

    Red yeasts ascribed to the species Rhodotorula mucilaginosa are gaining increasing attention, due to their numerous biotechnological applications, spanning carotenoid production, liquid bioremediation, heavy metal biotransformation and antifungal and plant growth-promoting actions, but also for their role as opportunistic pathogens. Nevertheless, their characterization at the 'omic' level is still scarce. Here, we applied different proteomic workflows to R. mucilaginosa with the aim of assessing their potential in generating information on proteins and functions of biotechnological interest, with a particular focus on the carotenogenic pathway. After optimization of protein extraction, we tested several gel-based (including 2D-DIGE) and gel-free sample preparation techniques, followed by tandem mass spectrometry analysis. Contextually, we evaluated different bioinformatic strategies for protein identification and interpretation of the biological significance of the dataset. When 2D-DIGE analysis was applied, not all spots returned an unambiguous identification and no carotenogenic enzymes were identified, even upon the application of different database search strategies. Then, the application of shotgun proteomic workflows with varying levels of sensitivity provided a picture of the information depth that can be reached with different analytical resources, and resulted in a plethora of information on R. mucilaginosa metabolism. However, also in these cases no proteins related to the carotenogenic pathway were identified, thus indicating that further improvements in sequence databases and functional annotations are strictly needed for increasing the outcome of proteomic analysis of this and other non-conventional yeasts. Copyright © 2016 John Wiley & Sons, Ltd.

  15. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

    PubMed

    Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L

    2016-11-04

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/.
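
    The two-step strategy described in this abstract, searching against a smaller tier and then checking peptide uniqueness against a larger one, can be illustrated with a short Python sketch. This is not the THISP pipeline: the function names, the toy sequences, and the accession labels are all hypothetical, and plain substring matching ignores practical details such as isoleucine/leucine ambiguity.

```python
def proteins_containing(peptide, database):
    """Return the sorted accessions of all proteins whose sequence contains the peptide."""
    return sorted(acc for acc, seq in database.items() if peptide in seq)


def uniqueness_report(peptides, small_db, large_db):
    """For each peptide, list the matching proteins in both databases.

    A peptide that maps to one protein in the small database but to several
    in the large database is no longer unique evidence for that protein.
    """
    report = {}
    for pep in peptides:
        small_hits = proteins_containing(pep, small_db)
        large_hits = proteins_containing(pep, large_db)
        report[pep] = {
            "small": small_hits,
            "large": large_hits,
            "unique_in_large": len(large_hits) == 1,
        }
    return report


# Toy data (hypothetical accessions and sequences, for illustration only).
tier1 = {"P1": "MKTAYIAKQRQISFVK", "P2": "MSHFSRQLEERLGLIEVQ"}
tier4 = dict(tier1, P1_variant="MKTAYIAKQRQLSFVK", P3="GGAYIAKQRGG")

report = uniqueness_report(["AYIAKQR", "QLEER"], tier1, tier4)
```

    In this toy case the peptide AYIAKQR looks unique in the small database but maps to three entries in the large one, whereas QLEER remains unique in both.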

  16. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics

    PubMed Central

    Deutsch, Eric W.; Sun, Zhi; Campbell, David S.; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S.; Moritz, Robert L.

    2016-01-01

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances – a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ~20,000 primary isoforms plus contaminants to a very large database that includes almost all non-redundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/. PMID:27577934

  17. Preprocessing Significantly Improves the Peptide/Protein Identification Sensitivity of High-resolution Isobarically Labeled Tandem Mass Spectrometry Data

    PubMed Central

    Sheng, Quanhu; Li, Rongxia; Dai, Jie; Li, Qingrun; Su, Zhiduan; Guo, Yan; Li, Chen; Shyr, Yu; Zeng, Rong

    2015-01-01

    Isobaric labeling techniques coupled with high-resolution mass spectrometry have been widely employed in proteomic workflows requiring relative quantification. For each high-resolution tandem mass spectrum (MS/MS), isobaric labeling techniques can be used not only to quantify the peptide from different samples by reporter ions, but also to identify the peptide it is derived from. Because the ions related to isobaric labeling may act as noise in database searching, the MS/MS spectrum should be preprocessed before peptide or protein identification. In this article, we demonstrate that there are a lot of high-frequency, high-abundance isobaric related ions in the MS/MS spectrum, and removing isobaric related ions combined with deisotoping and deconvolution in MS/MS preprocessing procedures significantly improves the peptide/protein identification sensitivity. The user-friendly software package TurboRaw2MGF (v2.0) has been implemented for converting raw TIC data files to mascot generic format files and can be downloaded for free from https://github.com/shengqh/RCPA.Tools/releases as part of the software suite ProteomicsTools. The data have been deposited to the ProteomeXchange with identifier PXD000994. PMID:25435543
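
    The idea of removing isobaric-label ions from an MS/MS peak list before database searching can be sketched as follows. This is only an illustration and not the TurboRaw2MGF implementation: the function name, the tolerance, and the example peaks are hypothetical, the reporter m/z values are approximate iTRAQ 4-plex values assumed for the example, and the deisotoping and deconvolution steps mentioned in the abstract are not covered.

```python
# Approximate iTRAQ 4-plex reporter ion m/z values (assumed here for
# illustration; confirm against the reagent documentation before use).
REPORTER_MZ = (114.11, 115.11, 116.11, 117.11)


def remove_reporter_ions(peaks, reporter_mz=REPORTER_MZ, tolerance=0.02):
    """Drop peaks whose m/z lies within `tolerance` of any reporter ion.

    peaks: list of (mz, intensity) tuples.
    """
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if all(abs(mz - r) > tolerance for r in reporter_mz)
    ]


# Hypothetical peak list: two reporter-like peaks and two fragment-like peaks.
spectrum = [(114.111, 5200.0), (117.115, 4800.0), (175.119, 300.0), (244.166, 910.0)]
cleaned = remove_reporter_ions(spectrum)
```

    Only the two peaks away from the reporter region survive the filter; the quantification step would read the reporter intensities from the unfiltered spectrum.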

  18. ProteinInferencer: Confident protein identification and multiple experiment comparison for large scale proteomics projects.

    PubMed

    Zhang, Yaoyang; Xu, Tao; Shan, Bing; Hart, Jonathan; Aslanian, Aaron; Han, Xuemei; Zong, Nobel; Li, Haomin; Choi, Howard; Wang, Dong; Acharya, Lipi; Du, Lisa; Vogt, Peter K; Ping, Peipei; Yates, John R

    2015-11-03

    Shotgun proteomics generates valuable information from large-scale and target protein characterizations, including protein expression, protein quantification, protein post-translational modifications (PTMs), protein localization, and protein-protein interactions. Typically, peptides derived from proteolytic digestion, rather than intact proteins, are analyzed by mass spectrometers because peptides are more readily separated, ionized and fragmented. The amino acid sequences of peptides can be interpreted by matching the observed tandem mass spectra to theoretical spectra derived from a protein sequence database. Identified peptides serve as surrogates for their proteins and are often used to establish what proteins were present in the original mixture and to quantify protein abundance. Two major issues exist for assigning peptides to their originating protein. The first issue is maintaining a desired false discovery rate (FDR) when comparing or combining multiple large datasets generated by shotgun analysis and the second issue is properly assigning peptides to proteins when homologous proteins are present in the database. Herein we demonstrate a new computational tool, ProteinInferencer, which can be used for protein inference with both small- and large-scale data sets to produce a well-controlled protein FDR. In addition, ProteinInferencer introduces confidence scoring for individual proteins, which makes protein identifications evaluable. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015. Published by Elsevier B.V.
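
    Controlling the FDR of a combined identification list is commonly done with a target-decoy estimate, which can be sketched as below. This is not ProteinInferencer's algorithm, which the abstract does not detail; the function name and the example scores are hypothetical, and the sketch assumes higher scores are better and that scores are distinct.

```python
def score_threshold_for_fdr(identifications, max_fdr):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    identifications: list of (score, is_decoy) tuples, higher score = better.
    The FDR at a cutoff is estimated as (#decoys) / (#targets) among hits
    scoring at or above that cutoff. Returns None if no cutoff qualifies.
    """
    ranked = sorted(identifications, key=lambda hit: hit[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep extending the accepted list while the estimate is acceptable.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Hypothetical scored hits; True marks a decoy match.
hits = [(9.1, False), (8.7, False), (8.2, False), (7.9, True),
        (7.5, False), (6.0, True), (5.5, False)]
cutoff = score_threshold_for_fdr(hits, max_fdr=0.25)
```

    When several data sets are merged, decoy hits accumulate faster than new target hits, so the cutoff generally has to be recomputed on the combined list rather than reused from each individual experiment.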

  19. A chromosome-centric human proteome project (C-HPP) to characterize the sets of proteins encoded in chromosome 17.

    PubMed

    Liu, Suli; Im, Hogune; Bairoch, Amos; Cristofanilli, Massimo; Chen, Rui; Deutsch, Eric W; Dalton, Stephen; Fenyo, David; Fanayan, Susan; Gates, Chris; Gaudet, Pascale; Hincapie, Marina; Hanash, Samir; Kim, Hoguen; Jeong, Seul-Ki; Lundberg, Emma; Mias, George; Menon, Rajasree; Mu, Zhaomei; Nice, Edouard; Paik, Young-Ki; Uhlen, Mathias; Wells, Lance; Wu, Shiaw-Lin; Yan, Fangfei; Zhang, Fan; Zhang, Yue; Snyder, Michael; Omenn, Gilbert S; Beavis, Ronald C; Hancock, William S

    2013-01-04

    We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 "missing" proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of "missing" proteins will guide the selection of appropriate samples for discovery studies as well as antibody reagents. We have also illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information. In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused on the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.

  20. Comparative bioinformatics analyses and profiling of lysosome-related organelle proteomes

    NASA Astrophysics Data System (ADS)

    Hu, Zhang-Zhi; Valencia, Julio C.; Huang, Hongzhan; Chi, An; Shabanowitz, Jeffrey; Hearing, Vincent J.; Appella, Ettore; Wu, Cathy

    2007-01-01

    Complete and accurate profiling of cellular organelle proteomes, while challenging, is important for the understanding of detailed cellular processes at the organelle level. Mass spectrometry technologies coupled with bioinformatics analysis provide an effective approach for protein identification and functional interpretation of organelle proteomes. In this study, we have compiled human organelle reference datasets from large-scale proteomic studies and protein databases for seven lysosome-related organelles (LROs), as well as the endoplasmic reticulum and mitochondria, for comparative organelle proteome analysis. Heterogeneous sources of human organelle proteins and rodent homologs are mapped to human UniProtKB protein entries based on ID and/or peptide mappings, followed by functional annotation and categorization using the iProXpress proteomic expression analysis system. Cataloging organelle proteomes allows close examination of both shared and unique proteins among various LROs and reveals their functional relevance. The proteomic comparisons show that LROs are a closely related family of organelles. The shared proteins indicate the dynamic and hybrid nature of LROs, while the unique transmembrane proteins may represent additional candidate marker proteins for LROs. This comparative analysis, therefore, provides a basis for hypothesis formulation and experimental validation of organelle proteins and their functional roles.

  1. Proteomic and Bioinformatic Profile of Primary Human Oral Epithelial Cells

    PubMed Central

    Ghosh, Santosh K.; Yohannes, Elizabeth; Bebek, Gurkan; Weinberg, Aaron; Jiang, Bin; Willard, Belinda; Chance, Mark R.; Kinter, Michael T.; McCormick, Thomas S.

    2012-01-01

    Wounding of the oral mucosa occurs frequently in a highly septic environment. Remarkably, these wounds heal quickly and the oral cavity, for the most part, remains healthy. Deciphering the normal human oral epithelial cell (NHOEC) proteome is critical for understanding the mechanism(s) of protection elicited when the mucosal barrier is intact, as well as when it is breached. Combining 2D gel electrophoresis with shotgun proteomics resulted in identification of 1662 NHOEC proteins. Proteome annotations were performed based on protein classes, molecular functions, disease association and membership in canonical and metabolic signaling pathways. Comparing the NHOEC proteome with a database of innate immunity-relevant interactions (InnateDB) identified 64 common proteins associated with innate immunity. Comparison with published salivary proteomes revealed that 738/1662 NHOEC proteins were common, suggesting that significant numbers of salivary proteins are of epithelial origin. Gene ontology analysis showed similarities in the distributions of NHOEC and saliva proteomes with regard to biological processes, and molecular functions. We also assessed the inter-individual variability of the NHOEC proteome and observed it to be comparable with other primary cells. The baseline proteome described in this study should serve as a resource for proteome studies of the oral mucosa, especially in relation to disease processes. PMID:23035736

  2. PRIDE Inspector Toolsuite: Moving Toward a Universal Visualization Tool for Proteomics Data Standard Formats and Quality Assessment of ProteomeXchange Datasets.

    PubMed

    Perez-Riverol, Yasset; Xu, Qing-Wei; Wang, Rui; Uszkoreit, Julian; Griss, Johannes; Sanchez, Aniel; Reisinger, Florian; Csordas, Attila; Ternent, Tobias; Del-Toro, Noemi; Dianes, Jose A; Eisenacher, Martin; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2016-01-01

    The original PRIDE Inspector tool was developed as an open source standalone tool to enable the visualization and validation of mass-spectrometry (MS)-based proteomics data before data submission or already publicly available in the Proteomics Identifications (PRIDE) database. The initial implementation of the tool focused on visualizing PRIDE data by supporting the PRIDE XML format and a direct access to private (password protected) and public experiments in PRIDE. The ProteomeXchange (PX) Consortium has been set up to enable a better integration of existing public proteomics repositories, maximizing its benefit to the scientific community through the implementation of standard submission and dissemination pipelines. Within the Consortium, PRIDE is focused on supporting submissions of tandem MS data. The increasing use and popularity of the new Proteomics Standards Initiative (PSI) data standards such as mzIdentML and mzTab, and the diversity of workflows supported by the PX resources, prompted us to design and implement a new suite of algorithms and libraries that would build upon the success of the original PRIDE Inspector and would enable users to visualize and validate PX "complete" submissions. The PRIDE Inspector Toolsuite supports the handling and visualization of different experimental output files, ranging from spectra (mzML, mzXML, and the most popular peak lists formats) and peptide and protein identification results (mzIdentML, PRIDE XML, mzTab) to quantification data (mzTab, PRIDE XML), using a modular and extensible set of open-source, cross-platform libraries. We believe that the PRIDE Inspector Toolsuite represents a milestone in the visualization and quality assessment of proteomics data. It is freely available at http://github.com/PRIDE-Toolsuite/. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  3. PRIDE Inspector Toolsuite: Moving Toward a Universal Visualization Tool for Proteomics Data Standard Formats and Quality Assessment of ProteomeXchange Datasets

    PubMed Central

    Perez-Riverol, Yasset; Xu, Qing-Wei; Wang, Rui; Uszkoreit, Julian; Griss, Johannes; Sanchez, Aniel; Reisinger, Florian; Csordas, Attila; Ternent, Tobias; del-Toro, Noemi; Dianes, Jose A.; Eisenacher, Martin; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2016-01-01

    The original PRIDE Inspector tool was developed as an open source standalone tool to enable the visualization and validation of mass-spectrometry (MS)-based proteomics data before data submission or already publicly available in the Proteomics Identifications (PRIDE) database. The initial implementation of the tool focused on visualizing PRIDE data by supporting the PRIDE XML format and a direct access to private (password protected) and public experiments in PRIDE. The ProteomeXchange (PX) Consortium has been set up to enable a better integration of existing public proteomics repositories, maximizing its benefit to the scientific community through the implementation of standard submission and dissemination pipelines. Within the Consortium, PRIDE is focused on supporting submissions of tandem MS data. The increasing use and popularity of the new Proteomics Standards Initiative (PSI) data standards such as mzIdentML and mzTab, and the diversity of workflows supported by the PX resources, prompted us to design and implement a new suite of algorithms and libraries that would build upon the success of the original PRIDE Inspector and would enable users to visualize and validate PX “complete” submissions. The PRIDE Inspector Toolsuite supports the handling and visualization of different experimental output files, ranging from spectra (mzML, mzXML, and the most popular peak lists formats) and peptide and protein identification results (mzIdentML, PRIDE XML, mzTab) to quantification data (mzTab, PRIDE XML), using a modular and extensible set of open-source, cross-platform libraries. We believe that the PRIDE Inspector Toolsuite represents a milestone in the visualization and quality assessment of proteomics data. It is freely available at http://github.com/PRIDE-Toolsuite/. PMID:26545397

  4. A tutorial for software development in quantitative proteomics using PSI standard formats

    PubMed Central

    Gonzalez-Galarza, Faviel F.; Qi, Da; Fan, Jun; Bessant, Conrad; Jones, Andrew R.

    2014-01-01

    The Human Proteome Organisation — Proteomics Standards Initiative (HUPO-PSI) has been working for ten years on the development of standardised formats that facilitate data sharing and public database deposition. In this article, we review three HUPO-PSI data standards — mzML, mzIdentML and mzQuantML, which can be used to design a complete quantitative analysis pipeline in mass spectrometry (MS)-based proteomics. In this tutorial, we briefly describe the content of each data model, sufficient for bioinformaticians to devise proteomics software. We also provide guidance on the use of recently released application programming interfaces (APIs) developed in Java for each of these standards, which makes it straightforward to read and write files of any size. We have produced a set of example Java classes and a basic graphical user interface to demonstrate how to use the most important parts of the PSI standards, available from http://code.google.com/p/psi-standard-formats-tutorial. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. PMID:23584085

  5. A Proteomic Workflow Using High-Throughput De Novo Sequencing Towards Complementation of Genome Information for Improved Comparative Crop Science.

    PubMed

    Turetschek, Reinhard; Lyon, David; Desalegn, Getinet; Kaul, Hans-Peter; Wienkoop, Stefanie

    2016-01-01

    The proteomic study of non-model organisms, such as many crop plants, is challenging due to the lack of comprehensive genome information. Changing environmental conditions require the study and selection of adapted cultivars. Mutations, inherent to cultivars, hamper protein identification and thus considerably complicate the qualitative and quantitative comparison in large-scale systems biology approaches. With this workflow, cultivar-specific mutations are detected from high-throughput comparative MS analyses, by extracting sequence polymorphisms with de novo sequencing. Stringent criteria are suggested to filter for confident mutations. Subsequently, these polymorphisms complement the initially used database, which is ready to use with any preferred database search algorithm. In our example, we thereby identified 26 specific mutations in two cultivars of Pisum sativum and achieved an increased number (17%) of peptide spectrum matches.

  6. Practical and Efficient Searching in Proteomics: A Cross Engine Comparison

    PubMed Central

    Paulo, Joao A.

    2014-01-01

    Background: Analysis of large datasets produced by mass spectrometry-based proteomics relies on database search algorithms to sequence peptides and identify proteins. Several such scoring methods are available, each based on different statistical foundations and thereby not producing identical results. Here, the aim is to compare peptide and protein identifications using multiple search engines and examine the additional proteins gained by increasing the number of technical replicate analyses. Methods: A HeLa whole cell lysate was analyzed on an Orbitrap mass spectrometer for 10 technical replicates. The data were combined and searched using Mascot, SEQUEST, and Andromeda. Comparisons were made of peptide and protein identifications among the search engines. In addition, searches using each engine were performed with incrementing number of technical replicates. Results: The number and identity of peptides and proteins differed across search engines. For all three search engines, the differences in protein identifications were greater than the differences in peptide identifications, indicating that the major source of the disparity may be at the protein inference grouping level. The data also revealed that analysis of 2 technical replicates can increase protein identifications by up to 10-15%, while a third replicate results in an additional 4-5%. Conclusions: The data emphasize two practical methods of increasing the robustness of mass spectrometry data analysis. The data show that 1) using multiple search engines can expand the number of identified proteins (union) and validate protein identifications (intersection), and 2) analysis of 2 or 3 technical replicates can substantially expand protein identifications. Moreover, information can be extracted from a dataset by performing database searching with different engines and performing technical repeats, which requires no additional sample preparation and effectively utilizes research time and effort. PMID:25346847
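
    The union and intersection of protein identifications across search engines, as described in this abstract, can be computed with ordinary set operations. The sketch below is illustrative only: the function name and the single-letter accessions are hypothetical, and real comparisons would first need protein groups from each engine mapped to a common set of accessions.

```python
def combine_engine_results(results):
    """Combine per-engine protein identifications.

    results: dict mapping engine name -> set of protein accessions.
    Returns (union, intersection): the union expands the list of identified
    proteins, the intersection keeps those confirmed by every engine.
    """
    protein_sets = list(results.values())
    union = set().union(*protein_sets)
    intersection = set.intersection(*protein_sets) if protein_sets else set()
    return union, intersection


# Hypothetical identifications from three engines.
results = {
    "Mascot": {"A", "B", "C"},
    "SEQUEST": {"B", "C", "D"},
    "Andromeda": {"C", "D", "E"},
}
union, intersection = combine_engine_results(results)
```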

  7. Practical and Efficient Searching in Proteomics: A Cross Engine Comparison.

    PubMed

    Paulo, Joao A

    2013-10-01

    Analysis of large datasets produced by mass spectrometry-based proteomics relies on database search algorithms to sequence peptides and identify proteins. Several such scoring methods are available, each based on different statistical foundations and thereby not producing identical results. Here, the aim is to compare peptide and protein identifications using multiple search engines and examine the additional proteins gained by increasing the number of technical replicate analyses. A HeLa whole cell lysate was analyzed on an Orbitrap mass spectrometer for 10 technical replicates. The data were combined and searched using Mascot, SEQUEST, and Andromeda. Comparisons were made of peptide and protein identifications among the search engines. In addition, searches using each engine were performed with incrementing number of technical replicates. The number and identity of peptides and proteins differed across search engines. For all three search engines, the differences in protein identifications were greater than the differences in peptide identifications, indicating that the major source of the disparity may be at the protein inference grouping level. The data also revealed that analysis of 2 technical replicates can increase protein identifications by up to 10-15%, while a third replicate results in an additional 4-5%. The data emphasize two practical methods of increasing the robustness of mass spectrometry data analysis. The data show that 1) using multiple search engines can expand the number of identified proteins (union) and validate protein identifications (intersection), and 2) analysis of 2 or 3 technical replicates can substantially expand protein identifications. Moreover, information can be extracted from a dataset by performing database searching with different engines and performing technical repeats, which requires no additional sample preparation and effectively utilizes research time and effort.

  8. Combination of Bottom-up 2D-LC-MS and Semi-top-down GelFree-LC-MS Enhances Coverage of Proteome and Low Molecular Weight Short Open Reading Frame Encoded Peptides of the Archaeon Methanosarcina mazei.

    PubMed

    Cassidy, Liam; Prasse, Daniela; Linke, Dennis; Schmitz, Ruth A; Tholey, Andreas

    2016-10-07

    The recent discovery of an increasing number of small open reading frames (sORF) creates the need for suitable analytical technologies for the comprehensive identification of the corresponding gene products. For biological and functional studies the knowledge of the entire set of proteins and sORF gene products is essential. Consequently, in the present study we evaluated analytical approaches that allow simultaneous analysis of the widest possible part of the proteome together with the predicted sORF. We performed a full proteome analysis of the cytosolic proteome of the methane-producing archaeon Methanosarcina mazei strain Gö1 using a high/low pH reversed phase LC-MS bottom-up approach. The second analytical approach was based on a semi-top-down strategy, encompassing a separation at the intact protein level using a GelFree system, followed by digestion and LC-MS analysis. A high overlap in identified proteins was found for both approaches, yielding the most comprehensive coverage of the cytosolic proteome of this organism achieved so far. The application of the second approach in combination with an adjustment of the search criteria for database searches further led to a significant increase of sORF peptide identifications, ultimately allowing the detection and identification of 28 sORF gene products.

  9. P185-M Protein Identification and Validation of Results in Workflows that Integrate over Various Instruments, Datasets, Search Engines

    PubMed Central

    Hufnagel, P.; Glandorf, J.; Körting, G.; Jabs, W.; Schweiger-Hufnagel, U.; Hahner, S.; Lubeck, M.; Suckau, D.

    2007-01-01

    Analysis of complex proteomes often results in long protein lists, but falls short in measuring the validity of identification and quantification results on a greater number of proteins. Biological and technical replicates are mandatory, as is the combination of the MS data from various workflows (gels, 1D-LC, 2D-LC), instruments (TOF/TOF, trap, qTOF or FTMS), and search engines. We describe a database-driven study that combines two workflows, two mass spectrometers, and four search engines with protein identification following a decoy database strategy. The sample was a tryptically digested lysate (10,000 cells) of a human colorectal cancer cell line. Data from two LC-MALDI-TOF/TOF runs and a 2D-LC-ESI-trap run using capillary and nano-LC columns were submitted to the proteomics software platform ProteinScape. The combined MALDI data and the ESI data were searched using Mascot (Matrix Science), Phenyx (GeneBio), ProteinSolver (Bruker and Protagen), and Sequest (Thermo) against a decoy database generated from IPI-human in order to obtain one protein list across all workflows and search engines at a defined maximum false-positive rate of 5%. ProteinScape combined the data to one LC-MALDI and one LC-ESI dataset. The initial separate searches from the two combined datasets generated eight independent peptide lists. These were compiled into an integrated protein list using the ProteinExtractor algorithm. An initial evaluation of the generated data led to the identification of approximately 1200 proteins. Result integration on a peptide level allowed discrimination of protein isoforms that would not have been possible with a mere combination of protein lists.
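
    A decoy database of the kind used in such studies is often built by appending a reversed copy of every target sequence. The abstract does not say how its decoy database was generated from IPI-human, so the reversal approach, the function name, the `DECOY_` prefix, and the example entry below are all assumptions made for illustration.

```python
def make_decoy_database(targets, prefix="DECOY_"):
    """Return the target entries plus one reversed-sequence decoy per target.

    targets: dict mapping accession -> protein sequence.
    Decoy accessions carry `prefix` so that decoy hits can be counted later
    when estimating the false-positive rate.
    """
    combined = dict(targets)
    for accession, sequence in targets.items():
        combined[prefix + accession] = sequence[::-1]
    return combined


# Hypothetical single-entry target database.
database = make_decoy_database({"IPI_X": "MKTAYR"})
```

    Searching all engines against the same concatenated target-decoy database gives each result list a comparable error estimate before the lists are merged.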

  10. Comprehensive proteomic analysis of Penicillium verrucosum.

    PubMed

    Nöbauer, Katharina; Hummel, Karin; Mayrhofer, Corina; Ahrens, Maike; Setyabudi, Francis M C; Schmidt-Heydt, Markus; Eisenacher, Martin; Razzazi-Fazeli, Ebrahim

    2017-05-01

    Mass spectrometric identification of proteins in species lacking validated sequence information is a major problem in veterinary science. In the present study, we used ochratoxin A producing Penicillium verrucosum to identify and quantitatively analyze proteins of an organism with yet no protein information available. The work presented here aimed to provide a comprehensive protein identification of P. verrucosum using shotgun proteomics. We were able to identify 3631 proteins in an "ab initio" translated database from DNA sequences of P. verrucosum. Additionally, a sequential window acquisition of all theoretical fragment-ion spectra analysis was done to find differentially regulated proteins at two different time points of the growth curve. We compared the proteins at the beginning (day 3) and at the end of the log phase (day 12). © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. The role of targeted chemical proteomics in pharmacology

    PubMed Central

    Sutton, Chris W

    2012-01-01

    Traditionally, proteomics is the high-throughput characterization of the global complement of proteins in a biological system using cutting-edge technologies (robotics and mass spectrometry) and bioinformatics tools (Internet-based search engines and databases). As the field of proteomics has matured, a diverse range of strategies have evolved to answer specific problems. Chemical proteomics is one such direction that provides the means to enrich and detect less abundant proteins (the ‘hidden’ proteome) from complex mixtures of wide dynamic range (the ‘deep’ proteome). In pharmacology, chemical proteomics has been utilized to determine the specificity of drugs and their analogues, for anticipated known targets, only to discover other proteins that bind and could account for side effects observed in preclinical and clinical trials. As a consequence, chemical proteomics provides a valuable accessory in refinement of second- and third-generation drug design for treatment of many diseases. However, determining definitive affinity capture of proteins by a drug immobilized on soft gel chromatography matrices has highlighted some of the challenges that remain to be addressed. Examples of the different strategies that have emerged using well-established drugs against pharmaceutically important enzymes, such as protein kinases, metalloproteases, PDEs, cytochrome P450s, etc., indicate the potential opportunity to employ chemical proteomics as an early-stage screening approach in the identification of new targets. PMID:22074351

  12. sapFinder: an R/Bioconductor package for detection of variant peptides in shotgun proteomics experiments.

    PubMed

    Wen, Bo; Xu, Shaohang; Sheynkman, Gloria M; Feng, Qiang; Lin, Liang; Wang, Quanhui; Xu, Xun; Wang, Jun; Liu, Siqi

    2014-11-01

    Single nucleotide variations (SNVs) located within a reading frame can result in single amino acid polymorphisms (SAPs), leading to alteration of the corresponding amino acid sequence as well as function of a protein. Accurate detection of SAPs is an important issue in proteomic analysis at the experimental and bioinformatic level. Herein, we present sapFinder, an R software package, for detection of the variant peptides based on tandem mass spectrometry (MS/MS)-based proteomics data. This package automates the construction of variation-associated databases from public SNV repositories or sample-specific next-generation sequencing (NGS) data and the identification of SAPs through database searching, post-processing and generation of HTML-based report with visualized interface. sapFinder is implemented as a Bioconductor package in R. The package and the vignette can be downloaded at http://bioconductor.org/packages/devel/bioc/html/sapFinder.html and are provided under a GPL-2 license. © The Author 2014. Published by Oxford University Press.

  13. Andromeda: a peptide search engine integrated into the MaxQuant environment.

    PubMed

    Cox, Jürgen; Neuhauser, Nadin; Michalski, Annette; Scheltema, Richard A; Olsen, Jesper V; Mann, Matthias

    2011-04-01

    A key step in mass spectrometry (MS)-based proteomics is the identification of peptides in sequence databases by their fragmentation spectra. Here we describe Andromeda, a novel peptide search engine using a probabilistic scoring model. On proteome data, Andromeda performs as well as Mascot, a widely used commercial search engine, as judged by sensitivity and specificity analysis based on target decoy searches. Furthermore, it can handle data with arbitrarily high fragment mass accuracy, is able to assign and score complex patterns of post-translational modifications, such as highly phosphorylated peptides, and accommodates extremely large databases. The algorithms of Andromeda are provided. Andromeda can function independently or as an integrated search engine of the widely used MaxQuant computational proteomics platform and both are freely available at www.maxquant.org. The combination enables analysis of large data sets in a simple analysis workflow on a desktop computer. For searching individual spectra Andromeda is also accessible via a web server. We demonstrate the flexibility of the system by implementing the capability to identify cofragmented peptides, significantly improving the total number of identified peptides.

  14. Comprehensive Identification of Proteins from MALDI Imaging*

    PubMed Central

    Maier, Stefan K.; Hahne, Hannes; Gholami, Amin Moghaddas; Balluff, Benjamin; Meding, Stephan; Schoene, Cédrik; Walch, Axel K.; Kuster, Bernhard

    2013-01-01

    Matrix-assisted laser desorption/ionization imaging mass spectrometry (MALDI IMS) is a powerful tool for the visualization of proteins in tissues and has demonstrated considerable diagnostic and prognostic value. One main challenge is that the molecular identity of such potential biomarkers mostly remains unknown. We introduce a generic method that removes this issue by systematically identifying the proteins embedded in the MALDI matrix using a combination of bottom-up and top-down proteomics. The analyses of ten human tissues led to the identification of 1400 abundant and soluble proteins constituting the set of proteins detectable by MALDI IMS, including >90% of all IMS biomarkers reported in the literature. Top-down analysis of the matrix proteome identified 124 mostly N- and C-terminally fragmented proteins, indicating considerable protein processing activity in tissues. All protein identification data from this study as well as the IMS literature has been deposited into MaTisse, a new publicly available database, which we anticipate will become a valuable resource for the IMS community. PMID:23782541

  15. Detection of Missing Proteins Using the PRIDE Database as a Source of Mass Spectrometry Evidence.

    PubMed

    Garin-Muga, Alba; Odriozola, Leticia; Martínez-Val, Ana; Del Toro, Noemí; Martínez, Rocío; Molina, Manuela; Cantero, Laura; Rivera, Rocío; Garrido, Nicolás; Dominguez, Francisco; Sanchez Del Pino, Manuel M; Vizcaíno, Juan Antonio; Corrales, Fernando J; Segura, Victor

    2016-11-04

    The current catalogue of the human proteome is not yet complete, as experimental proteomics evidence is still elusive for a group of proteins known as the missing proteins. The Human Proteome Project (HPP) has been successfully using technology and bioinformatic resources to improve the characterization of such challenging proteins. In this manuscript, we propose a pipeline starting with the mining of the PRIDE database to select a group of data sets potentially enriched in missing proteins that are subsequently analyzed for protein identification with a method based on the statistical analysis of proteotypic peptides. Spermatozoa and the HEK293 cell line were found to be a promising source of missing proteins and clearly merit further attention in future studies. After the analysis of the selected samples, we found 342 PSMs, suggesting the presence of 97 missing proteins in human spermatozoa or the HEK293 cell line, while only 36 missing proteins were potentially detected in the retina, frontal cortex, aorta thoracica, or placenta. The functional analysis of the missing proteins detected confirmed their tissue specificity, and the validation of a selected set of peptides using targeted proteomics (SRM/MRM assays) further supports the utility of the proposed pipeline. As illustrative examples, DNAH3 and TEPP in spermatozoa, and UNCX and ATAD3C in HEK293 cells were some of the more robust and remarkable identifications in this study. We provide evidence indicating the relevance to carefully analyze the ever-increasing MS/MS data available from PRIDE and other repositories as sources for missing proteins detection in specific biological matrices as revealed for HEK293 cells.

  16. Detection of Missing Proteins Using the PRIDE Database as a Source of Mass Spectrometry Evidence

    PubMed Central

    2016-01-01

    The current catalogue of the human proteome is not yet complete, as experimental proteomics evidence is still elusive for a group of proteins known as the missing proteins. The Human Proteome Project (HPP) has been successfully using technology and bioinformatic resources to improve the characterization of such challenging proteins. In this manuscript, we propose a pipeline starting with the mining of the PRIDE database to select a group of data sets potentially enriched in missing proteins that are subsequently analyzed for protein identification with a method based on the statistical analysis of proteotypic peptides. Spermatozoa and the HEK293 cell line were found to be a promising source of missing proteins and clearly merit further attention in future studies. After the analysis of the selected samples, we found 342 PSMs, suggesting the presence of 97 missing proteins in human spermatozoa or the HEK293 cell line, while only 36 missing proteins were potentially detected in the retina, frontal cortex, aorta thoracica, or placenta. The functional analysis of the missing proteins detected confirmed their tissue specificity, and the validation of a selected set of peptides using targeted proteomics (SRM/MRM assays) further supports the utility of the proposed pipeline. As illustrative examples, DNAH3 and TEPP in spermatozoa, and UNCX and ATAD3C in HEK293 cells were some of the more robust and remarkable identifications in this study. We provide evidence indicating the relevance to carefully analyze the ever-increasing MS/MS data available from PRIDE and other repositories as sources for missing proteins detection in specific biological matrices as revealed for HEK293 cells. PMID:27581094

  17. MASPECTRAS: a platform for management and analysis of proteomics LC-MS/MS data

    PubMed Central

    Hartler, Jürgen; Thallinger, Gerhard G; Stocker, Gernot; Sturn, Alexander; Burkard, Thomas R; Körner, Erik; Rader, Robert; Schmidt, Andreas; Mechtler, Karl; Trajanoski, Zlatko

    2007-01-01

    Background: The advancements of proteomics technologies have led to a rapid increase in the number, size and rate at which datasets are generated. Managing and extracting valuable information from such datasets requires the use of data management platforms and computational approaches. Results: We have developed the MAss SPECTRometry Analysis System (MASPECTRAS), a platform for management and analysis of proteomics LC-MS/MS data. MASPECTRAS is based on the Proteome Experimental Data Repository (PEDRo) relational database schema and follows the guidelines of the Proteomics Standards Initiative (PSI). Analysis modules include: 1) import and parsing of the results from the search engines SEQUEST, Mascot, Spectrum Mill, X! Tandem, and OMSSA; 2) peptide validation; 3) clustering of proteins based on Markov Clustering and multiple alignments; and 4) quantification using the Automated Statistical Analysis of Protein Abundance Ratios algorithm (ASAPRatio). The system provides customizable data retrieval and visualization tools, as well as export to the PRoteomics IDEntifications public repository (PRIDE). MASPECTRAS is freely available. Conclusion: Given the unique features and the flexibility due to the use of standard software technology, our platform represents a significant advance and could be of great interest to the proteomics community. PMID:17567892

  18. PeroxisomeDB: a database for the peroxisomal proteome, functional genomics and disease

    PubMed Central

    Schlüter, Agatha; Fourcade, Stéphane; Domènech-Estévez, Enric; Gabaldón, Toni; Huerta-Cepas, Jaime; Berthommier, Guillaume; Ripp, Raymond; Wanders, Ronald J. A.; Poch, Olivier; Pujol, Aurora

    2007-01-01

    Peroxisomes are essential organelles of eukaryotic origin, ubiquitously distributed in cells and organisms, playing key roles in lipid and antioxidant metabolism. Loss or malfunction of peroxisomes causes more than 20 fatal inherited conditions. We have created a peroxisomal database that includes the complete peroxisomal proteome of Homo sapiens and Saccharomyces cerevisiae, by gathering, updating and integrating the available genetic and functional information on peroxisomal genes. PeroxisomeDB is structured in interrelated sections ‘Genes’, ‘Functions’, ‘Metabolic pathways’ and ‘Diseases’, that include hyperlinks to selected features of NCBI, ENSEMBL and UCSC databases. We have designed graphical depictions of the main peroxisomal metabolic routes and have included updated flow charts for diagnosis. Precomputed BLAST, PSI-BLAST, multiple sequence alignment (MUSCLE) and phylogenetic trees are provided to assist in direct multispecies comparison to study evolutionary conserved functions and pathways. Highlights of the PeroxisomeDB include new tools developed for facilitating (i) identification of novel peroxisomal proteins, by means of identifying proteins carrying peroxisome targeting signal (PTS) motifs, (ii) detection of peroxisomes in silico, particularly useful for screening the deluge of newly sequenced genomes. PeroxisomeDB should contribute to the systematic characterization of the peroxisomal proteome and facilitate system biology approaches on the organelle. PMID:17135190
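
Screening proteins for peroxisome targeting signal motifs, as described in this abstract, can be sketched in a few lines of Python. The pattern below is an assumption: it encodes the widely cited C-terminal PTS1 consensus (S/A/C)(K/R/H)(L/M), of which SKL is the classic example, and it is not taken from PeroxisomeDB, whose actual motif definitions are not given here. The function name is hypothetical.

```python
import re

# Assumed PTS1 consensus: a tripeptide at the extreme C-terminus.
# Position 1: S, A or C; position 2: K, R or H; position 3: L or M.
PTS1_PATTERN = re.compile(r"[SAC][KRH][LM]$")


def has_pts1(sequence: str) -> bool:
    """Return True if the protein sequence ends with the assumed PTS1 motif.

    A trailing '*' (stop-codon marker used in some translated FASTA files)
    is ignored, and the check is case-insensitive.
    """
    cleaned = sequence.strip().upper().rstrip("*")
    return bool(PTS1_PATTERN.search(cleaned))


if __name__ == "__main__":
    for seq in ("MKTAYSKL", "MKTAYSKLG", "maaaarl*"):
        print(seq, has_pts1(seq))
```

A motif match of this kind only nominates candidates; real predictors usually also weigh the residues upstream of the tripeptide, which this sketch does not attempt.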

  19. Identification and Validation of Human Missing Proteins and Peptides in Public Proteome Databases: Data Mining Strategy.

    PubMed

    Elguoshy, Amr; Hirao, Yoshitoshi; Xu, Bo; Saito, Suguru; Quadery, Ali F; Yamamoto, Keiko; Mitsui, Toshiaki; Yamamoto, Tadashi

    2017-12-01

    In an attempt to complete the Human Proteome Project (HPP), the Chromosome-Centric Human Proteome Project (C-HPP) launched the investigation of missing proteins (MPs) in 2012. However, 2579 and 572 protein entries in neXtProt (2017-1) are still considered missing and uncertain proteins, respectively. Thus, in this study, we proposed a pipeline to analyze, identify, and validate human missing and uncertain proteins in open-access transcriptomics and proteomics databases. Analysis of the RNA expression pattern for missing proteins in the Human Protein Atlas showed that 28% of them, such as Olfactory receptor 1I1 (O60431), had no RNA expression, suggesting the necessity to consider uncommon tissues for transcriptomic and proteomic studies. Interestingly, 21% had an elevated expression level in a particular tissue (tissue-enriched proteins), indicating the importance of targeting such proteins in the tissues where they are elevated. Additionally, the analysis of RNA expression level for missing proteins showed that 95% had no or low expression level (0-10 transcripts per million), indicating that low abundance is one of the major obstacles facing the detection of missing proteins. Moreover, missing proteins are predicted to generate fewer unique tryptic peptides than the identified proteins. Searching for these predicted unique tryptic peptides that correspond to missing and uncertain proteins in the experimental peptide list of open-access MS-based databases (PA, GPM) resulted in the detection of 402 missing and 19 uncertain proteins with at least two unique peptides (≥9 aa) at <(5 × 10⁻⁴)% FDR. Finally, matching the native spectra for the experimentally detected peptides with their SRMAtlas synthetic counterparts at three transition sources (QQQ, QTOF, QTRAP) gave us an opportunity to validate 41 missing proteins by ≥2 proteotypic peptides.
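
The prediction of unique tryptic peptides of at least 9 residues, which this abstract relies on, can be approximated with a short Python sketch. It assumes the common trypsin rule (cleave after K or R unless the next residue is P) with no missed cleavages, and it treats a peptide as unique when it occurs in exactly one protein of the supplied set. The function names and the toy accessions are hypothetical, and the authors' actual digestion settings are not stated in the abstract. Splitting on a zero-width pattern requires Python 3.7 or later.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def tryptic_peptides(sequence: str, min_len: int = 9) -> Set[str]:
    """In silico tryptic digest: cleave after K/R, but not before P.

    No missed cleavages are generated; peptides shorter than min_len are dropped.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_len}


def unique_peptides(proteome: Dict[str, str], min_len: int = 9) -> Dict[str, Set[str]]:
    """Map each protein accession to the peptides found in no other protein."""
    owners = defaultdict(set)
    for accession, sequence in proteome.items():
        for peptide in tryptic_peptides(sequence, min_len):
            owners[peptide].add(accession)
    result: Dict[str, Set[str]] = defaultdict(set)
    for peptide, accessions in owners.items():
        if len(accessions) == 1:
            result[next(iter(accessions))].add(peptide)
    return dict(result)


if __name__ == "__main__":
    toy = {
        "P1": "M" + "A" * 8 + "K" + "SHAR" + "EDPEPTIDEK",
        "P2": "G" * 9 + "K" + "SHAR" + "EDPEPTIDEK",
    }
    print(unique_peptides(toy))
```

In this toy input the shared peptide EDPEPTIDEK is excluded for both proteins, which mirrors why proteins with few distinguishing peptides are harder to confirm.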

  20. Construction of a nasopharyngeal carcinoma 2D/MS repository with Open Source XML database--Xindice.

    PubMed

    Li, Feng; Li, Maoyu; Xiao, Zhiqiang; Zhang, Pengfei; Li, Jianling; Chen, Zhuchu

    2006-01-11

    Many proteomics initiatives require integration of all information with uniform criteria, from collection of samples and data display to publication of experimental results. The integration and exchange of these data, which differ in format and structure, poses a great challenge. XML technology shows promise in handling this task due to its simplicity and flexibility. Nasopharyngeal carcinoma (NPC) is one of the most common cancers in southern China and Southeast Asia, and has marked geographic and racial differences in incidence. Although there are some cancer proteome databases now, there is still no NPC proteome database. The raw NPC proteome experiment data were captured into one XML document with the Human Proteome Markup Language (HUP-ML) editor and imported into the native XML database Xindice. The 2D/MS repository of the NPC proteome was constructed with Apache, PHP and Xindice to provide access to the database via the Internet. On our website, two methods, keyword query and click query, were provided at the same time to access the entries of the NPC proteome database. Our 2D/MS repository can be used to share the raw NPC proteomics data that are generated from gel-based proteomics experiments. The database, as well as the PHP source codes for constructing users' own proteome repository, can be accessed at http://www.xyproteomics.org/.

  1. Proteomic analysis of tardigrades: towards a better understanding of molecular mechanisms by anhydrobiotic organisms.

    PubMed

    Schokraie, Elham; Hotz-Wagenblatt, Agnes; Warnken, Uwe; Mali, Brahim; Frohme, Marcus; Förster, Frank; Dandekar, Thomas; Hengherr, Steffen; Schill, Ralph O; Schnölzer, Martina

    2010-03-03

    Tardigrades are small, multicellular invertebrates which are able to survive times of unfavourable environmental conditions using their well-known capability to undergo cryptobiosis at any stage of their life cycle. Milnesium tardigradum has become a powerful model system for the analysis of cryptobiosis. While some genetic information is already available for Milnesium tardigradum, the proteome is still to be discovered. Here we present, to the best of our knowledge, the first comprehensive study of Milnesium tardigradum on the protein level. To establish a proteome reference map we developed optimized protocols for protein extraction from tardigrades in the active state and for separation of proteins by high resolution two-dimensional gel electrophoresis. Since only limited sequence information of M. tardigradum on the genome and gene expression level is available to date in public databases, we initiated in parallel a tardigrade EST sequencing project to allow for protein identification by electrospray ionization tandem mass spectrometry. 271 out of 606 analyzed protein spots could be identified by searching against the publicly available NCBInr database as well as our newly established tardigrade protein database, corresponding to 144 unique proteins. Another 150 spots could be identified in the tardigrade clustered EST database, corresponding to 36 unique contigs and ESTs. Proteins with annotated function were further categorized in more detail by their molecular function, biological process and cellular component. For the proteins of unknown function more information could be obtained by performing a protein domain annotation analysis. Our results include proteins such as members of different heat shock protein families and LEA group 3, which might play important roles in surviving extreme conditions.
    The proteome reference map of Milnesium tardigradum provides the basis for further studies in order to identify and characterize the biochemical mechanisms of tolerance to extreme desiccation. The optimized proteomics workflow will enable application of sensitive quantification techniques to detect differences in protein expression, which are characteristic of the active and anhydrobiotic states of tardigrades.

  2. Proteomic Analysis of Tardigrades: Towards a Better Understanding of Molecular Mechanisms by Anhydrobiotic Organisms

    PubMed Central

    Schokraie, Elham; Hotz-Wagenblatt, Agnes; Warnken, Uwe; Mali, Brahim; Frohme, Marcus; Förster, Frank; Dandekar, Thomas; Hengherr, Steffen; Schill, Ralph O.; Schnölzer, Martina

    2010-01-01

    Background: Tardigrades are small, multicellular invertebrates which are able to survive times of unfavourable environmental conditions using their well-known capability to undergo cryptobiosis at any stage of their life cycle. Milnesium tardigradum has become a powerful model system for the analysis of cryptobiosis. While some genetic information is already available for Milnesium tardigradum, the proteome is still to be discovered. Principal Findings: Here we present, to the best of our knowledge, the first comprehensive study of Milnesium tardigradum on the protein level. To establish a proteome reference map we developed optimized protocols for protein extraction from tardigrades in the active state and for separation of proteins by high resolution two-dimensional gel electrophoresis. Since only limited sequence information of M. tardigradum on the genome and gene expression level is available to date in public databases, we initiated in parallel a tardigrade EST sequencing project to allow for protein identification by electrospray ionization tandem mass spectrometry. 271 out of 606 analyzed protein spots could be identified by searching against the publicly available NCBInr database as well as our newly established tardigrade protein database, corresponding to 144 unique proteins. Another 150 spots could be identified in the tardigrade clustered EST database, corresponding to 36 unique contigs and ESTs. Proteins with annotated function were further categorized in more detail by their molecular function, biological process and cellular component. For the proteins of unknown function more information could be obtained by performing a protein domain annotation analysis. Our results include proteins such as members of different heat shock protein families and LEA group 3, which might play important roles in surviving extreme conditions.
    Conclusions: The proteome reference map of Milnesium tardigradum provides the basis for further studies in order to identify and characterize the biochemical mechanisms of tolerance to extreme desiccation. The optimized proteomics workflow will enable application of sensitive quantification techniques to detect differences in protein expression, which are characteristic of the active and anhydrobiotic states of tardigrades. PMID:20224743

  3. Proteomic Screening of Antigenic Proteins from the Hard Tick, Haemaphysalis longicornis (Acari: Ixodidae)

    PubMed Central

    Kim, Young-Ha; Islam, Mohammad Saiful; You, Myung-Jo

    2015-01-01

    Proteomic tools allow large-scale, high-throughput analyses for the detection, identification, and functional investigation of the proteome. For detection of antigens from Haemaphysalis longicornis, a 1-dimensional electrophoresis (1-DE) quantitative immunoblotting technique combined with 2-dimensional electrophoresis (2-DE) immunoblotting was used for whole body proteins from unfed and partially fed female ticks. Reactivity bands and 2-DE immunoblotting were performed following 2-DE electrophoresis to identify protein spots. The proteome of the partially fed female had a larger number of lower molecular weight proteins than that of the unfed female tick. The total number of detected spots was 818 for unfed and 670 for partially fed female ticks. The 2-DE immunoblotting identified 10 antigenic spots from unfed females and 8 antigenic spots from partially fed females. Matrix Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF) of relevant spots identified calreticulin, putative secreted WC salivary protein, and a conserved hypothetical protein from the National Center for Biotechnology Information and Swiss-Prot protein sequence databases. These findings indicate that most of the whole body components of these ticks are non-immunogenic. The data reported here will provide guidance in the identification of antigenic proteins to prevent infestation and diseases transmitted by H. longicornis. PMID:25748713

  4. Peptide de novo sequencing of mixture tandem mass spectra

    PubMed Central

    Hotta, Stéphanie Yuki Kolbeck; Verano‐Braga, Thiago; Kjeldsen, Frank

    2016-01-01

    The impact of mixture spectra deconvolution on the performance of four popular de novo sequencing programs was tested using artificially constructed mixture spectra as well as experimental proteomics data. Mixture fragmentation spectra are recognized as a limitation in proteomics because they decrease the identification performance using database search engines. De novo sequencing approaches are expected to be even more sensitive to the reduction in mass spectrum quality resulting from peptide precursor co‐isolation and thus prone to false identifications. The deconvolution approach matched complementary b‐, y‐ions to each precursor peptide mass, which allowed the creation of virtual spectra containing sequence specific fragment ions of each co‐isolated peptide. Deconvolution processing resulted in equally efficient identification rates but increased the absolute number of correctly sequenced peptides. The improvement was in the range of 20–35% additional peptide identifications for a HeLa lysate sample. Some correct sequences were identified only using unprocessed spectra; however, the number of these was lower than those where improvement was obtained by mass spectral deconvolution. Tight candidate peptide score distribution and high sensitivity to small changes in the mass spectrum introduced by the employed deconvolution method could explain some of the missing peptide identifications. PMID:27329701
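
The idea of matching complementary b- and y-ions to each precursor mass can be sketched in Python as follows. For singly charged fragments from the same backbone cleavage, the b-ion and y-ion m/z values sum to the neutral peptide mass plus two proton masses, so pairs in a mixture spectrum can be assigned to whichever co-isolated precursor satisfies that sum. This is a simplified illustration under stated assumptions (singly charged fragments only, a fixed tolerance in Da, hypothetical function names and peak values), not the deconvolution procedure used in the paper.

```python
from typing import Dict, List, Tuple

PROTON_MASS = 1.007276  # mass of a proton in Da


def complementary_pairs(peaks: List[float], precursor_mass: float,
                        tol: float = 0.02) -> List[Tuple[float, float]]:
    """Find fragment pairs whose m/z values sum to precursor_mass + 2 protons.

    peaks are assumed to be singly charged fragment m/z values; precursor_mass
    is the neutral monoisotopic peptide mass. A two-pointer scan over the
    sorted peak list pairs each peak at most once.
    """
    target = precursor_mass + 2 * PROTON_MASS
    ordered = sorted(peaks)
    pairs = []
    low, high = 0, len(ordered) - 1
    while low < high:
        total = ordered[low] + ordered[high]
        if abs(total - target) <= tol:
            pairs.append((ordered[low], ordered[high]))
            low += 1
            high -= 1
        elif total < target:
            low += 1
        else:
            high -= 1
    return pairs


def split_mixture(peaks: List[float], precursor_masses: Dict[str, float],
                  tol: float = 0.02) -> Dict[str, List[Tuple[float, float]]]:
    """Build one 'virtual spectrum' of complementary pairs per precursor."""
    return {name: complementary_pairs(peaks, mass, tol)
            for name, mass in precursor_masses.items()}


if __name__ == "__main__":
    mixture = [300.1, 400.2, 555.5, 701.914552, 802.314552]
    print(split_mixture(mixture, {"A": 1000.0, "B": 1200.5}))
```

Peaks that pair with neither precursor (555.5 in the toy input) stay unassigned, which is one way a deconvolution step could lose fragments that an unprocessed spectrum would still present to the sequencing program.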

  5. Guidelines for reporting quantitative mass spectrometry based experiments in proteomics.

    PubMed

    Martínez-Bartolomé, Salvador; Deutsch, Eric W; Binz, Pierre-Alain; Jones, Andrew R; Eisenacher, Martin; Mayer, Gerhard; Campos, Alex; Canals, Francesc; Bech-Serra, Joan-Josep; Carrascal, Montserrat; Gay, Marina; Paradela, Alberto; Navajas, Rosana; Marcilla, Miguel; Hernáez, María Luisa; Gutiérrez-Blázquez, María Dolores; Velarde, Luis Felipe Clemente; Aloria, Kerman; Beaskoetxea, Jabier; Medina-Aunon, J Alberto; Albar, Juan P

    2013-12-16

    Mass spectrometry is already a well-established protein identification tool and recent methodological and technological developments have also made possible the extraction of quantitative data of protein abundance in large-scale studies. Several strategies for absolute and relative quantitative proteomics and the statistical assessment of quantifications are possible, each having specific measurements and therefore, different data analysis workflows. The guidelines for Mass Spectrometry Quantification allow the description of a wide range of quantitative approaches, including labeled and label-free techniques and also targeted approaches such as Selected Reaction Monitoring (SRM). The HUPO Proteomics Standards Initiative (HUPO-PSI) has invested considerable efforts to improve the standardization of proteomics data handling, representation and sharing through the development of data standards, reporting guidelines, controlled vocabularies and tooling. In this manuscript, we describe a key output from the HUPO-PSI, namely the MIAPE Quant guidelines, which have been developed in parallel with the corresponding data exchange format mzQuantML [1]. The MIAPE Quant guidelines describe the HUPO-PSI proposal concerning the minimum information to be reported when a quantitative data set, derived from mass spectrometry (MS), is submitted to a database or as supplementary information to a journal. The guidelines have been developed with input from a broad spectrum of stakeholders in the proteomics field to represent a true consensus view of the most important data types and metadata required for a quantitative experiment to be analyzed critically or a data analysis pipeline to be reproduced. It is anticipated that they will influence or be directly adopted as part of journal guidelines for publication and by public proteomics databases and thus may have an impact on proteomics laboratories across the world.
    This article is part of a Special Issue entitled: Standardization and Quality Control. Copyright © 2013 Elsevier B.V. All rights reserved.

  6. Uridine monophosphate kinase as potential target for tuberculosis: from target to lead identification.

    PubMed

    Arvind, Akanksha; Jain, Vaibhav; Saravanan, Parameswaran; Mohan, C Gopi

    2013-12-01

    Mycobacterium tuberculosis (Mtb) is a causative agent of tuberculosis (TB) disease, which has affected approximately 2 billion people worldwide. Due to the emergence of resistance towards the existing drugs, discovery of new anti-TB drugs is an important global healthcare challenge. To address this problem, there is an urgent need to identify new drug targets in Mtb. In the present study, the subtractive genomics approach has been employed for the identification of new drug targets against TB. Screening the Mtb proteome using the Database of Essential Genes (DEG) and human proteome resulted in the identification of 60 key proteins which have no eukaryotic counterparts. Critical analysis of these proteins using Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways database revealed uridine monophosphate kinase (UMPK) enzyme as a potential drug target for developing novel anti-TB drugs. Homology model of Mtb-UMPK was constructed for the first time on the basis of the crystal structure of E. coli-UMPK, in order to understand its structure-function relationships, and which would in turn facilitate to perform structure-based inhibitor design. Furthermore, the structural similarity search was carried out using physiological inhibitor UTP of Mtb-UMPK to virtually screen ZINC database. Retrieved hits were further screened by implementing several filters like ADME and toxicity followed by molecular docking. Finally, on the basis of the Glide docking score and the mode of binding, 6 putative leads were identified as inhibitors of this enzyme which can potentially emerge as future drugs for the treatment of TB.

  7. Identification of species- and tissue-specific proteins using proteomic strategy

    NASA Astrophysics Data System (ADS)

    Chernukha, I. M.; Vostrikova, N. L.; Kovalev, L. I.; Shishkin, S. S.; Kovaleva, M. A.; Manukhin, Y. S.

    2017-09-01

    Proteomic technologies have proven to be very effective for detecting biochemical changes in meat products, such as changes in tissue- and species-specific proteins. In the tissues of cattle, pig, horse and camel M. longissimus dorsi both tissue- and species specific proteins were detected using two dimensional electrophoresis. Species-specific isoforms of several muscle proteins were also identified. The identified and described proteins of cattle, pig, horse and camel skeletal muscles (including mass spectra of the tryptic peptides) were added to the national free access database “Muscle organ proteomics”. This research has enabled the development of new highly sensitive technologies for meat product quality control against food fraud.

  8. Identification of novel peptides for horse meat speciation in highly processed foodstuffs.

    PubMed

    Claydon, Amy J; Grundy, Helen H; Charlton, Adrian J; Romero, M Rosario

    2015-01-01

    There is a need for robust analytical methods to support enforcement of food labelling legislation. Proteomics is emerging as a complementary methodology to existing tools such as DNA and antibody-based techniques. Here we describe the development of a proteomics strategy for the determination of meat species in highly processed foods. A database of specific peptides for nine relevant animal species was used to enable semi-targeted species determination. This principle was tested for horse meat speciation, and a range of horse-specific peptides were identified as heat stable marker peptides for the detection of low levels of horse meat in mixtures with other species.

  9. Quantitative Proteomics Identifies Activation of Hallmark Pathways of Cancer in Patient Melanoma.

    PubMed

    Byrum, Stephanie D; Larson, Signe K; Avaritt, Nathan L; Moreland, Linley E; Mackintosh, Samuel G; Cheung, Wang L; Tackett, Alan J

    2013-03-01

    Molecular pathways regulating melanoma initiation and progression are potential targets of therapeutic development for this aggressive cancer. Identification and molecular analysis of these pathways in patients has been primarily restricted to targeted studies on individual proteins. Here, we report the most comprehensive analysis of formalin-fixed paraffin-embedded human melanoma tissues using quantitative proteomics. From 61 patient samples, we identified 171 proteins varying in abundance among benign nevi, primary melanoma, and metastatic melanoma. Seventy-three percent of these proteins were validated by immunohistochemistry staining of malignant melanoma tissues from the Human Protein Atlas database. Our results reveal that molecular pathways involved with tumor cell proliferation, motility, and apoptosis are mis-regulated in melanoma. These data provide the most comprehensive proteome resource on patient melanoma and reveal insight into the molecular mechanisms driving melanoma progression.

  10. Megadalton Complexes in the Chloroplast Stroma of Arabidopsis thaliana Characterized by Size Exclusion Chromatography, Mass Spectrometry, and Hierarchical Clustering

    PubMed Central

    Olinares, Paul Dominic B.; Ponnala, Lalit; van Wijk, Klaas J.

    2010-01-01

    To characterize MDa-sized macromolecular chloroplast stroma protein assemblies and to extend coverage of the chloroplast stroma proteome, we fractionated soluble chloroplast stroma in the non-denatured state by size exclusion chromatography with a size separation range up to ∼5 MDa. To maximize protein complex stability and resolution of megadalton complexes, ionic strength and composition were optimized. Subsequent high accuracy tandem mass spectrometry analysis (LTQ-Orbitrap) identified 1081 proteins across the complete native mass range. Protein complexes and assembly states above 0.8 MDa were resolved using hierarchical clustering, and protein heat maps were generated from normalized protein spectral counts for each of the size exclusion chromatography fractions; this complemented previous analysis of stromal complexes up to 0.8 MDa (Peltier, J. B., Cai, Y., Sun, Q., Zabrouskov, V., Giacomelli, L., Rudella, A., Ytterberg, A. J., Rutschow, H., and van Wijk, K. J. (2006) The oligomeric stromal proteome of Arabidopsis thaliana chloroplasts. Mol. Cell. Proteomics 5, 114–133). These combined experimental and bioinformatics analyses resolved chloroplast ribosomes in different assembly and functional states (e.g. 30, 50, and 70 S), which enabled the identification of plastid homologues of prokaryotic ribosome assembly factors as well as proteins involved in co-translational modifications, targeting, and folding. The roles of these ribosome-associating proteins will be discussed. Known RNA splice factors (e.g. CAF1/WTF1/RNC1) as well as uncharacterized proteins with RNA-binding domains (pentatricopeptide repeat, RNA recognition motif, and chloroplast ribosome maturation), RNases, and DEAD box helicases were found in various sized complexes. Chloroplast DNA (>3 MDa) was found in association with the complete heteromeric plastid-encoded DNA polymerase complex, and a dozen other DNA-binding proteins, e.g. DNA gyrase, topoisomerase, and various DNA repair enzymes. The heteromeric ≥5-MDa pyruvate dehydrogenase complex and the 0.8–1-MDa acetyl-CoA carboxylase complex associated with uncharacterized biotin carboxyl carrier domain proteins constitute the entry point to fatty acid metabolism in leaves; we suggest that their large size relates to the need for metabolic channeling. Protein annotations and identification data are available through the Plant Proteomics Database, and mass spectrometry data are available through Proteomics Identifications database. PMID:20423899

  11. Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets

    PubMed Central

    Griss, Johannes; Perez-Riverol, Yasset; Lewis, Steve; Tabb, David L.; Dianes, José A.; del-Toro, Noemi; Rurik, Marc; Walzer, Mathias W.; Kohlbacher, Oliver; Hermjakob, Henning; Wang, Rui; Vizcaíno, Juan Antonio

    2016-01-01

    Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average 75% of spectra analysed in an MS experiment remain unidentified. We propose to use spectrum clustering at a large-scale to shed a light on these unidentified spectra. PRoteomics IDEntifications database (PRIDE) Archive is one of the largest MS proteomics public data repositories worldwide. By clustering all tandem MS spectra publicly available in PRIDE Archive, coming from hundreds of datasets, we were able to consistently characterize three distinct groups of spectra: 1) incorrectly identified spectra, 2) spectra correctly identified but below the set scoring threshold, and 3) truly unidentified spectra. Using a multitude of complementary analysis approaches, we were able to identify less than 20% of the consistently unidentified spectra. The complete spectrum clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra. PMID:27493588

  12. Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets.

    PubMed

    Griss, Johannes; Perez-Riverol, Yasset; Lewis, Steve; Tabb, David L; Dianes, José A; Del-Toro, Noemi; Rurik, Marc; Walzer, Mathias W; Kohlbacher, Oliver; Hermjakob, Henning; Wang, Rui; Vizcaíno, Juan Antonio

    2016-08-01

    Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average 75% of spectra analysed in an MS experiment remain unidentified. We propose to use spectrum clustering at a large-scale to shed a light on these unidentified spectra. PRoteomics IDEntifications database (PRIDE) Archive is one of the largest MS proteomics public data repositories worldwide. By clustering all tandem MS spectra publicly available in PRIDE Archive, coming from hundreds of datasets, we were able to consistently characterize three distinct groups of spectra: 1) incorrectly identified spectra, 2) spectra correctly identified but below the set scoring threshold, and 3) truly unidentified spectra. Using a multitude of complementary analysis approaches, we were able to identify less than 20% of the consistently unidentified spectra. The complete spectrum clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra.

  13. Using PeptideAtlas, SRMAtlas and PASSEL – Comprehensive Resources for discovery and targeted proteomics

    PubMed Central

    Kusebauch, Ulrike; Deutsch, Eric W.; Campbell, David S.; Sun, Zhi; Farrah, Terry; Moritz, Robert L.

    2014-01-01

    PeptideAtlas, SRMAtlas and PASSEL are web-accessible resources to support discovery and targeted proteomics research. PeptideAtlas is a multi-species compendium of shotgun proteomic data provided by the scientific community, SRMAtlas is a resource of high-quality, complete proteome SRM assays generated in a consistent manner for the targeted identification and quantification of proteins, and PASSEL is a repository that compiles and represents selected reaction monitoring data, all in an easy to use interface. The databases are generated from native mass spectrometry data files that are analyzed in a standardized manner including statistical validation of the results. Each resource offers search functionalities and can be queried by user defined constraints; the query results are provided in tables or are graphically displayed. PeptideAtlas, SRMAtlas and PASSEL are publicly available freely via the website http://www.peptideatlas.org. In this protocol, we describe the use of these resources, we highlight how to submit, search, collate and download data. PMID:24939129

  14. MALDI-TOF mass spectrometry proteomic phenotyping of clinically relevant fungi.

    PubMed

    Putignani, Lorenza; Del Chierico, Federica; Onori, Manuela; Mancinelli, Livia; Argentieri, Marta; Bernaschi, Paola; Coltella, Luana; Lucignano, Barbara; Pansani, Laura; Ranno, Stefania; Russo, Cristina; Urbani, Andrea; Federici, Giorgio; Menichella, Donato

    2011-03-01

    Proteomics is particularly suitable for characterising human pathogens with high life cycle complexity, such as fungi. Protein content and expression levels may be affected by growth states and life cycle morphs and correlate to species and strain variation. Identification and typing of fungi by conventional methods are often difficult, time-consuming and frequently, for unusual species, inconclusive. Proteomic phenotypes from MALDI-TOF MS were employed as analytical and typing expression profiling of yeast, yeast-like species and strain variants in order to achieve a microbial proteomics population study. Spectra from 303 clinical isolates were generated and processed by standard pattern matching with a MALDI-TOF Biotyper (MT). Identifications (IDs) were compared to a reference biochemical-based system (Vitek-2) and, when discordant, MT IDs were verified with genotyping IDs, obtained by sequencing the 25-28S rRNA hypervariable D2 region. Spectra were converted into virtual gel-like formats, and hierarchical clustering analysis was performed for 274 Candida profiles to investigate species and strain typing correlation. MT provided 257/303 IDs consistent with Vitek-2 ones. However, amongst 26/303 discordant MT IDs, only 5 appeared "true". No MT identification was achieved for 20/303 isolates for incompleteness of database species variants. Candida spectra clustering agreed with identified species and topology of Candida albicans and Candida parapsilosis specific dendrograms. MT IDs show a high analytical performance and profiling heterogeneity which seems to complement or even outclass existing typing tools. This variability reflects the high biological complexity of yeasts and may be properly exploited to provide epidemiological tracing and infection dispersion patterns.

  15. Exploiting genomic data to identify proteins involved in abalone reproduction.

    PubMed

    Mendoza-Porras, Omar; Botwright, Natasha A; McWilliam, Sean M; Cook, Mathew T; Harris, James O; Wijffels, Gene; Colgrave, Michelle L

    2014-08-28

    Aside from their critical role in reproduction, abalone gonads serve as an indicator of sexual maturity and energy balance, two key considerations for effective abalone culture. Temperate abalone farmers face issues with tank restocking with highly marketable abalone owing to inefficient spawning induction methods. The identification of key proteins in sexually mature abalone will serve as the foundation for a greater understanding of reproductive biology. Addressing this knowledge gap is the first step towards improving abalone aquaculture methods. Proteomic profiling of female and male gonads of greenlip abalone, Haliotis laevigata, was undertaken using liquid chromatography-mass spectrometry. Owing to the incomplete nature of abalone protein databases, in addition to searching against two publicly available databases, a custom database comprising genomic data was used. Overall, 162 and 110 proteins were identified in females and males respectively with 40 proteins common to both sexes. For proteins involved in sexual maturation, sperm and egg structure, motility, acrosomal reaction and fertilization, 23 were identified only in females, 18 only in males and 6 were common. Gene ontology analysis revealed clear differences between the female and male protein profiles reflecting a higher rate of protein synthesis in the ovary and higher metabolic activity in the testis. A comprehensive mass spectrometry-based analysis was performed to profile the abalone gonad proteome providing the foundation for future studies of reproduction in abalone. Key proteins involved in both reproduction and energy balance were identified. Genomic resources were utilised to build a database of molluscan proteins yielding >60% more protein identifications than in a standard workflow employing public protein databases. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Combining De Novo Peptide Sequencing Algorithms, A Synergistic Approach to Boost Both Identifications and Confidence in Bottom-up Proteomics.

    PubMed

    Blank-Landeshammer, Bernhard; Kollipara, Laxmikanth; Biß, Karsten; Pfenninger, Markus; Malchow, Sebastian; Shuvaev, Konstantin; Zahedi, René P; Sickmann, Albert

    2017-09-01

    Complex mass spectrometry based proteomics data sets are mostly analyzed by protein database searches. While this approach performs considerably well for sequenced organisms, direct inference of peptide sequences from tandem mass spectra, i.e., de novo peptide sequencing, oftentimes is the only way to obtain information when protein databases are absent. However, available algorithms suffer from drawbacks such as lack of validation and often high rates of false positive hits (FP). Here we present a simple method of combining results from commonly available de novo peptide sequencing algorithms, which in conjunction with minor tweaks in data acquisition yields a lower empirical FDR compared to the analysis using single algorithms. Results were validated using state-of-the-art database search algorithms as well as specifically synthesized reference peptides. Thus, we could increase the number of PSMs meeting a stringent FDR of 5% more than 3-fold compared to the single best de novo sequencing algorithm alone, accounting for an average of 11,120 PSMs (combined) instead of 3476 PSMs (alone) in triplicate 2 h LC-MS runs of tryptic HeLa digestion.
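The abstract above does not say how the results of the individual algorithms are combined. As a minimal sketch of one plausible combination rule, the Python below keeps a sequence for a spectrum only when at least `min_agree` algorithms report it. The voting rule, the function names, and the algorithm labels are all assumptions for illustration, not the authors' method. Isoleucine and leucine are treated as identical because they have the same mass and cannot be told apart by mass alone.

```python
from collections import Counter


def normalize(seq):
    """Treat I and L as the same residue (they are isobaric)."""
    return seq.replace("I", "L")


def consensus_psms(calls, min_agree=2):
    """Return {scan_id: sequence} for scans where enough algorithms agree.

    calls: {algorithm_name: {scan_id: peptide_sequence}}  (hypothetical layout)
    """
    votes = {}
    for per_scan in calls.values():
        for scan, seq in per_scan.items():
            votes.setdefault(scan, Counter())[normalize(seq)] += 1

    accepted = {}
    for scan, counter in votes.items():
        seq, n = counter.most_common(1)[0]
        if n >= min_agree:  # keep only sequences supported by several tools
            accepted[scan] = seq
    return accepted
```

A stricter `min_agree` trades identifications for confidence; the empirical FDR of any such rule would still need to be checked against a reference, as the abstract describes.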

  17. Cloud parallel processing of tandem mass spectrometry based proteomics data.

    PubMed

    Mohammed, Yassene; Mostovenko, Ekaterina; Henneman, Alex A; Marissen, Rob J; Deelder, André M; Palmblad, Magnus

    2012-10-05

    Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
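The data decomposition strategy described above can be sketched in a few lines of Python: split the spectra into chunks, run an unmodified search on each chunk in parallel, then merge the per-chunk results. This is only an illustration under stated assumptions. The spectra are plain dictionaries rather than mzXML records, and `search_chunk` is a hypothetical placeholder for a real search engine call, not the authors' programs.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(spectra, n_chunks):
    """Split a list of spectra into at most n_chunks contiguous parts."""
    size = max(1, -(-len(spectra) // n_chunks))  # ceiling division
    return [spectra[i:i + size] for i in range(0, len(spectra), size)]


def search_chunk(chunk):
    """Hypothetical stand-in for one run of an unmodified search engine."""
    return [{"scan": s["scan"], "peptide": None} for s in chunk]


def recompose(results):
    """Merge per-chunk result lists and restore scan order."""
    merged = [hit for part in results for hit in part]
    return sorted(merged, key=lambda h: h["scan"])


def parallel_search(spectra, n_chunks=4, search=search_chunk):
    """Wrap parallelism around the search instead of changing the search."""
    chunks = decompose(spectra, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = list(pool.map(search, chunks))
    return recompose(results)
```

Because the search function is passed in as a parameter, the same wrapper applies whether the engine searches a sequence database or a spectral library, which is the property the abstract emphasizes.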

  18. Proteomic profiling of mature leaves from oil palm (Elaeis guineensis Jacq.).

    PubMed

    Tan, Hooi Sin; Jacoby, Richard P; Ong-Abdullah, Meilina; Taylor, Nicolas L; Liddell, Susan; Chee, Wong Wei; Chin, Chiew Foan

    2017-04-01

    Oil palm is one of the most productive oil bearing crops grown in Southeast Asia. Due to the dwindling availability of agricultural land and increasing demand for high yielding oil palm seedlings, clonal propagation is vital to the oil palm industry. Most commonly, leaf explants are used for in vitro micropropagation of oil palm and to optimize this process it is important to unravel the physiological and molecular mechanisms underlying somatic embryo production from leaves. In this study, a proteomic approach was used to determine protein abundance of mature oil palm leaves. To do this, leaf proteins were extracted using a TCA/acetone precipitation protocol and separated by 2DE. A total of 191 protein spots were observed on the 2D gels and 67 of the most abundant protein spots that were consistently observed were selected for further analysis, with 35 successfully identified using MALDI TOF/TOF MS. The majority of proteins were classified as being involved in photosynthesis, metabolism, cellular biogenesis, stress response, and transport. This study provides the first proteomic assessment of oil palm leaves in this important oil crop and demonstrates the successful identification of selected protein spots using the Malaysian Palm Oil Board (MPOB) Elaeis guineensis EST and NCBI-protein databases. The MS data have been deposited in the ProteomeXchange Consortium database with the data set identifier PXD001307. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Size-Sorting Combined with Improved Nanocapillary-LC-MS for Identification of Intact Proteins up to 80 kDa

    PubMed Central

    Vellaichamy, Adaikkalam; Tran, John C.; Catherman, Adam D.; Lee, Ji Eun; Kellie, John F.; Sweet, Steve M.M.; Zamdborg, Leonid; Thomas, Paul M.; Ahlf, Dorothy R.; Durbin, Kenneth R.; Valaskovic, Gary A.; Kelleher, Neil L.

    2010-01-01

    Despite the availability of ultra-high resolution mass spectrometers, methods for separation and detection of intact proteins for proteome-scale analyses are still in a developmental phase. Here we report robust protocols for on-line LC-MS to drive high-throughput top-down proteomics in a fashion similar to bottom-up. Comparative work on protein standards showed that a polymeric stationary phase led to superior sensitivity over a silica-based medium in reversed-phase nanocapillary-LC, with detection of proteins >50 kDa routinely accomplished in the linear ion trap of a hybrid Fourier-Transform mass spectrometer. Protein identification was enabled by nozzle-skimmer dissociation (NSD) and detection of fragment ions with <5 ppm mass accuracy for highly-specific database searching using custom software. This overall approach led to identification of proteins up to 80 kDa, with 10-60 proteins identified in single LC-MS runs of samples from yeast and human cell lines pre-fractionated by their molecular weight using a gel-based sieving system. PMID:20073486

  20. A Novel Proteomics Approach to Identify SUMOylated Proteins and Their Modification Sites in Human Cells

    PubMed Central

    Galisson, Frederic; Mahrouche, Louiza; Courcelles, Mathieu; Bonneil, Eric; Meloche, Sylvain; Chelbi-Alix, Mounira K.; Thibault, Pierre

    2011-01-01

    The small ubiquitin-related modifier (SUMO) is a small group of proteins that are reversibly attached to protein substrates to modify their functions. The large scale identification of protein SUMOylation and their modification sites in mammalian cells represents a significant challenge because of the relatively small number of in vivo substrates and the dynamic nature of this modification. We report here a novel proteomics approach to selectively enrich and identify SUMO conjugates from human cells. We stably expressed different SUMO paralogs in HEK293 cells, each containing a His6 tag and a strategically located tryptic cleavage site at the C terminus to facilitate the recovery and identification of SUMOylated peptides by affinity enrichment and mass spectrometry. Tryptic peptides with short SUMO remnants offer significant advantages in large scale SUMOylome experiments including the generation of paralog-specific fragment ions following CID and ETD activation, and the identification of modified peptides using conventional database search engines such as Mascot. We identified 205 unique protein substrates together with 17 precise SUMOylation sites present in 12 SUMO protein conjugates including three new sites (Lys-380, Lys-400, and Lys-497) on the protein promyelocytic leukemia. Label-free quantitative proteomics analyses on purified nuclear extracts from untreated and arsenic trioxide-treated cells revealed that all identified SUMOylated sites of promyelocytic leukemia were differentially SUMOylated upon stimulation. PMID:21098080

  1. Proteomics study of extracellular fibrinolytic proteases from Bacillus licheniformis RO3 and Bacillus pumilus 2.g isolated from Indonesian fermented food

    NASA Astrophysics Data System (ADS)

    Nur Afifah, Diana; Rustanti, Ninik; Anjani, Gemala; Syah, Dahrul; Yanti; Suhartono, Maggy T.

    2017-02-01

    This paper presents a proteomics study comprising the separation, identification and characterization of proteins. Extracellular fibrinolytic proteases from Bacillus licheniformis RO3 and Bacillus pumilus 2.g, isolated from the Indonesian fermented foods red oncom and tempeh gembus, were examined. The experimental work comprised the following steps: (1) a combination of one- and two-dimensional electrophoresis analysis, (2) mass spectrometry analysis using MALDI-TOF-MS and (3) investigation using a protein database. The results suggested that there were two new protein fractions of B. licheniformis RO3 and three of B. pumilus 2.g. These results have not been previously reported.

  2. Proteomic analysis of Pinus radiata needles: 2-DE map and protein identification by LC/MS/MS and substitution-tolerant database searching.

    PubMed

    Valledor, Luis; Castillejo, Maria A; Lenz, Christof; Rodríguez, Roberto; Cañal, Maria J; Jorrín, Jesús

    2008-07-01

    Pinus radiata is one of the most economically important forest tree species, with a worldwide production of around 370 million m³ of wood per year. Current selection of elite trees to be used in conservation and breeding programmes requires the physiological and molecular characterization of available populations. To identify key proteins related to tree growth, productivity and responses to environmental factors, a proteomic approach is being utilized. In this paper, we present the first report of the 2-DE protein reference map of physiologically mature P. radiata needles, as a basis for subsequent differential expression proteomic studies related to growth, development, biomass production and responses to stresses. After TCA/acetone protein extraction of needle tissue, 549 ± 21 well-resolved spots were detected in Coomassie-stained gels within the 5-8 pH and 10-100 kDa Mr ranges. The analytical and biological variances determined for 450 spots were 31 and 42%, respectively. After LC/MS/MS analysis of in-gel tryptic digested spots, proteins were identified by using the novel Paragon algorithm that tolerates amino acid substitution in the first-pass search. It allowed the confident identification of 115 out of the 150 protein spots subjected to MS, an unusually high percentage for a species with a poor sequence database, as is the case for P. radiata. Proteins were classified into 12 or 18 groups based on their corresponding cell component or biological process/pathway categories, respectively. Carbohydrate metabolism and photosynthetic enzymes predominate in the 2-DE protein profile of P. radiata needles.

  3. Peptide de novo sequencing of mixture tandem mass spectra.

    PubMed

    Gorshkov, Vladimir; Hotta, Stéphanie Yuki Kolbeck; Verano-Braga, Thiago; Kjeldsen, Frank

    2016-09-01

    The impact of mixture spectra deconvolution on the performance of four popular de novo sequencing programs was tested using artificially constructed mixture spectra as well as experimental proteomics data. Mixture fragmentation spectra are recognized as a limitation in proteomics because they decrease the identification performance using database search engines. De novo sequencing approaches are expected to be even more sensitive to the reduction in mass spectrum quality resulting from peptide precursor co-isolation and thus prone to false identifications. The deconvolution approach matched complementary b-, y-ions to each precursor peptide mass, which allowed the creation of virtual spectra containing sequence specific fragment ions of each co-isolated peptide. Deconvolution processing resulted in equally efficient identification rates but increased the absolute number of correctly sequenced peptides. The improvement was in the range of 20-35% additional peptide identifications for a HeLa lysate sample. Some correct sequences were identified only using unprocessed spectra; however, the number of these was lower than those where improvement was obtained by mass spectral deconvolution. Tight candidate peptide score distribution and high sensitivity to small changes in the mass spectrum introduced by the employed deconvolution method could explain some of the missing peptide identifications. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
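The complementary-ion matching described above rests on a simple mass relation: for singly charged b- and y-ions produced by cleavage of the same bond, m/z(b) + m/z(y) = M + 2 × (proton mass), where M is the neutral peptide mass. The Python sketch below uses that relation to sort the peaks of a mixture spectrum into one "virtual" peak list per co-isolated precursor. It assumes singly charged fragments, an absolute tolerance in Da, and a two-pointer pairing strategy; the function name and these choices are illustrative assumptions, not the authors' implementation.

```python
PROTON = 1.007276  # proton mass in Da


def assign_complementary_pairs(fragment_mz, precursor_masses, tol=0.01):
    """Group complementary fragment pairs by the precursor they sum to.

    fragment_mz:      m/z values of singly charged fragment peaks
    precursor_masses: neutral masses of the co-isolated peptides
    Returns {precursor_mass: [paired fragment m/z values]}.
    """
    peaks = sorted(fragment_mz)
    virtual = {m: [] for m in precursor_masses}
    for m in precursor_masses:
        target = m + 2 * PROTON  # expected sum of a b/y pair
        lo, hi = 0, len(peaks) - 1
        while lo < hi:  # two-pointer scan over the sorted peaks
            total = peaks[lo] + peaks[hi]
            if abs(total - target) <= tol:
                virtual[m].extend([peaks[lo], peaks[hi]])
                lo += 1
                hi -= 1
            elif total < target:
                lo += 1
            else:
                hi -= 1
    return virtual
```

Peaks that pair with no precursor are left out of every virtual spectrum, which is one way unprocessed spectra can still yield sequences that the deconvolved ones miss.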

  4. Content Is King: Databases Preserve the Collective Information of Science.

    PubMed

    Yates, John R

    2018-04-01

    Databases store sequence information experimentally gathered to create resources that further science. In the last 20 years databases have become critical components of fields like proteomics where they provide the basis for large-scale and high-throughput proteomic informatics. Amos Bairoch, winner of the Association of Biomolecular Resource Facilities Frederick Sanger Award, has created some of the important databases proteomic research depends upon for accurate interpretation of data.

  5. The Proteome of Seed Development in the Model Legume Lotus japonicus

    PubMed Central

    Dam, Svend; Laursen, Brian S.; Ørnfelt, Jane H.; Jochimsen, Bjarne; Stærfeldt, Hans Henrik; Friis, Carsten; Nielsen, Kasper; Goffard, Nicolas; Besenbacher, Søren; Krusell, Lene; Sato, Shusei; Tabata, Satoshi; Thøgersen, Ida B.; Enghild, Jan J.; Stougaard, Jens

    2009-01-01

    We have characterized the development of seeds in the model legume Lotus japonicus. Like soybean (Glycine max) and pea (Pisum sativum), Lotus develops straight seed pods and each pod contains approximately 20 seeds that reach maturity within 40 days. Histological sections show the characteristic three developmental phases of legume seeds and the presence of embryo, endosperm, and seed coat in desiccated seeds. Furthermore, protein, oil, starch, phytic acid, and ash contents were determined, and this indicates that the composition of mature Lotus seed is more similar to soybean than to pea. In a first attempt to determine the seed proteome, both a two-dimensional polyacrylamide gel electrophoresis approach and a gel-based liquid chromatography-mass spectrometry approach were used. Globulins were analyzed by two-dimensional polyacrylamide gel electrophoresis, and five legumins, LLP1 to LLP5, and two convicilins, LCP1 and LCP2, were identified by matrix-assisted laser desorption ionization quadrupole/time-of-flight mass spectrometry. For two distinct developmental phases, seed filling and desiccation, a gel-based liquid chromatography-mass spectrometry approach was used, and 665 and 181 unique proteins corresponding to gene accession numbers were identified for the two phases, respectively. All of the proteome data, including the experimental data and mass spectrometry spectra peaks, were collected in a database that is available to the scientific community via a Web interface (http://www.cbs.dtu.dk/cgi-bin/lotus/db.cgi). This database establishes the basis for relating physiology, biochemistry, and regulation of seed development in Lotus. Together with a new Web interface (http://bioinfoserver.rsbs.anu.edu.au/utils/PathExpress4legumes/) collecting all protein identifications for Lotus, Medicago, and soybean seed proteomes, this database is a valuable resource for comparative seed proteomics and pathway analysis within and beyond the legume family. PMID:19129418

  6. Two-dimensional electrophoretic profiling of normal human kidney glomerulus proteome and construction of an extensible markup language (XML)-based database.

    PubMed

    Yoshida, Yutaka; Miyazaki, Kenji; Kamiie, Junichi; Sato, Masao; Okuizumi, Seiji; Kenmochi, Akihisa; Kamijo, Ken'ichi; Nabetani, Takuji; Tsugita, Akira; Xu, Bo; Zhang, Ying; Yaoita, Eishin; Osawa, Tetsuo; Yamamoto, Tadashi

    2005-03-01

    To contribute to the understanding of the physiology and pathophysiology of the glomerulus of the human kidney, we have launched a proteomic study of the human glomerulus, and compiled a profile of proteins expressed in the glomerulus of normal human kidney by two-dimensional gel electrophoresis (2-DE) and identification with matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS) and/or liquid chromatography-tandem mass spectrometry (LC-MS/MS). Kidney cortices with normal appearance were obtained from patients undergoing surgical nephrectomy due to renal tumor, and glomeruli were highly purified by a standard sieving method followed by picking-up under a phase-contrast microscope. The glomerular proteins were separated by 2-DE with 24 cm immobilized pH gradient strips in the 3-10 range in the first dimension and 26 x 20 cm sodium dodecyl sulfate polyacrylamide electrophoresis gels of 12.5% in the second dimension. Gels were silver-stained, and valid spots were processed for identification through an integrated robotic system that consisted of a spot picker, an in-gel digester, and a MALDI-TOF MS and/or an LC-MS/MS. From 2-DE gel images of glomeruli of four subjects with no apparent pathologic manifestations, a synthetic gel image of normal glomerular proteins was created. The synthetic gel image contained 1713 valid spots, of which 1559 spots were commonly observed in the respective 2-DE gels. Among the 1559 spots, 347 protein spots, representing 212 proteins, have so far been identified, and used for the construction of an extensible markup language (XML)-based database. The database is deposited on a web site (http://www.sw.nec.co.jp/bio/rd/hgldb/index.html) in a form accessible to researchers to contribute to proteomic studies of human glomerulus in health and disease.

  7. Making proteomics data accessible and reusable: Current state of proteomics databases and repositories

    PubMed Central

    Perez-Riverol, Yasset; Alpi, Emanuele; Wang, Rui; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2015-01-01

    Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. In order to address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has been recently developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we will review each of the major proteomics resources independently and some tools that enable the integration, mining and reuse of the data. We will also discuss some of the major challenges and current pitfalls in the integration and sharing of the data. PMID:25158685

  8. Cardiovascular Redox and Ox Stress Proteomics

    PubMed Central

    Kumar, Vikas; Calamaras, Timothy Dean; Haeussler, Dagmar; Colucci, Wilson Steven; Cohen, Richard Alan; McComb, Mark Errol; Pimentel, David

    2012-01-01

    Significance: Oxidative post-translational modifications (OPTMs) have been demonstrated as contributing to cardiovascular physiology and pathophysiology. These modifications have been identified using antibodies as well as advanced proteomic methods, and the functional importance of each is beginning to be understood using transgenic and gene deletion animal models. Given that OPTMs are involved in cardiovascular pathology, the use of these modifications as biomarkers and predictors of disease has significant therapeutic potential. Adequate understanding of the chemistry of the OPTMs is necessary to determine what may occur in vivo and which modifications would best serve as biomarkers. Recent Advances: By using mass spectrometry, advanced labeling techniques, and antibody identification, OPTMs have become accessible to a larger proportion of the scientific community. Advancements in instrumentation, database search algorithms, and processing speed have allowed MS to fully expand on the proteome of OPTMs. In addition, the role of enzymatically reversible OPTMs has been further clarified in preclinical models. Critical Issues: The identification of OPTMs suffers from limitations in analytic detection based on the methodology, instrumentation, sample complexity, and bioinformatics. Currently, each type of OPTM requires a specific strategy for identification, and generalized approaches result in an incomplete assessment. Future Directions: Novel types of highly sensitive MS instrumentation that allow for improved separation and detection of modified proteins and peptides have been crucial in the discovery of OPTMs and biomarkers. To further advance the identification of relevant OPTMs in advanced search algorithms, standardized methods for sample processing and depository of MS data will be required. Antioxid. Redox Signal. 17, 1528–1559. PMID:22607061

  9. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines.

    PubMed

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I; Marcotte, Edward M

    2011-07-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses.

  10. MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines

    PubMed Central

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I.; Marcotte, Edward M.

    2011-01-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses. PMID:21488652

  11. Core Proteomic Analysis of Unique Metabolic Pathways of Salmonella enterica for the Identification of Potential Drug Targets.

    PubMed

    Uddin, Reaz; Sufian, Muhammad

    2016-01-01

    Infections caused by Salmonella enterica, a Gram-negative facultative anaerobic bacterium belonging to the family Enterobacteriaceae, are major threats to the health of humans and animals. The recent availability of complete genome data of pathogenic strains of S. enterica gives new avenues for the identification of drug targets and drug candidates. We have used the genomic and metabolic pathway data to identify pathways and proteins essential to the pathogen and absent from the host. We took the whole proteome sequence data of 42 strains of S. enterica and Homo sapiens along with KEGG-annotated metabolic pathway data, clustered protein sequences using CD-HIT, identified essential genes using the DEG database, discarded S. enterica homologs of human proteins in unique metabolic pathways (UMPs), and characterized hypothetical proteins with SVM-prot and InterProScan. Through this core proteomic analysis we have identified enzymes essential to the pathogen. The identification of 73 enzymes common in 42 strains of S. enterica is the real strength of the current study. We proposed all 73 unexplored enzymes as potential drug targets against the infections caused by S. enterica. The study is comprehensive around S. enterica and simultaneously considered every possible pathogenic strain of S. enterica. This comprehensiveness makes the current study significant since, to the best of our knowledge, it is the first subtractive core proteomic analysis of the unique metabolic pathways applied to any pathogen for the identification of drug targets. We applied extensive computational methods to shortlist a few potential drug targets considering the druggability criteria, e.g. non-homologous to the human host, essential to the pathogen, and playing a significant role in essential metabolic pathways of the pathogen (i.e. S. enterica). In the current study, subtractive proteomics was applied through a novel approach, i.e. by considering only proteins of the unique metabolic pathways of the pathogens and mining the proteomic data of all completely sequenced strains of the pathogen, thus improving the quality and application of the results. We believe that the sharing of the knowledge from this study would eventually bring about novel and unique therapeutic regimens against the infections caused by S. enterica.

  12. Making proteomics data accessible and reusable: current state of proteomics databases and repositories.

    PubMed

    Perez-Riverol, Yasset; Alpi, Emanuele; Wang, Rui; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2015-03-01

    Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. In order to address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has been recently developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we will review each of the major proteomics resources independently and some tools that enable the integration, mining and reuse of the data. We will also discuss some of the major challenges and current pitfalls in the integration and sharing of the data. © 2014 The Authors. PROTEOMICS published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Identification of serum biomarkers in dogs naturally infected with Babesia canis canis using a proteomic approach

    PubMed Central

    2014-01-01

    Background: Canine babesiosis is a tick-borne disease that is caused by the haemoprotozoan parasites of the genus Babesia. There are limited data on serum proteomics in dogs, and none on the effect of babesiosis on the serum proteome. The aim of this study was to identify the potential serum biomarkers of babesiosis using proteomic techniques in order to increase our understanding about disease pathogenesis. Results: Serum samples were collected from 25 dogs of various breeds and sex with naturally occurring babesiosis caused by B. canis canis. Blood was collected on the day of admission (day 0), and subsequently on the 1st and 6th day of treatment. Two-dimensional electrophoresis (2DE) of pooled serum samples of dogs with naturally occurring babesiosis (day 0, day 1 and day 6) and healthy dogs were run in triplicate. 2DE image analysis showed 64 differentially expressed spots with p ≤ 0.05 and 49 spots with fold change ≥2. Six selected spots were excised manually and subjected to trypsin digest prior to identification by electrospray ionisation mass spectrometry on an Amazon ion trap tandem mass spectrometer (MS/MS). Mass spectrometry data was processed using Data Analysis software and the automated Matrix Science Mascot Daemon server. Protein identifications were assigned using the Mascot search engine to interrogate protein sequences in the NCBI Genbank database. A number of differentially expressed serum proteins involved in inflammation mediated acute phase response, complement and coagulation cascades, apolipoproteins and vitamin D metabolism pathway were identified in dogs with babesiosis. Conclusions: Our findings confirmed two dominant pathogenic mechanisms of babesiosis, haemolysis and acute phase response. These results may provide possible serum biomarker candidates for clinical monitoring of babesiosis and this study could serve as the basis for further proteomic investigations in canine babesiosis. PMID:24885808
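
    The spot-selection thresholds quoted in this abstract (p ≤ 0.05, fold change ≥ 2) can be illustrated with a minimal sketch. The abstract reports the two counts separately and does not say how they were combined, so requiring both criteria at once is an assumption here, as are the `Spot` class, its field names, the symmetric handling of down-regulated spots, and the example values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Spot:
    spot_id: str        # hypothetical gel spot label
    p_value: float      # significance of the intensity difference
    fold_change: float  # diseased / healthy intensity ratio, must be > 0


def is_candidate(spot: Spot, max_p: float = 0.05, min_fold: float = 2.0) -> bool:
    """True if the spot passes both thresholds (assumed to be combined)."""
    # Treat a 2-fold decrease (ratio 0.5) like a 2-fold increase (ratio 2.0).
    magnitude = max(spot.fold_change, 1.0 / spot.fold_change)
    return spot.p_value <= max_p and magnitude >= min_fold


def select_candidates(spots: List[Spot]) -> List[str]:
    """Return the IDs of spots passing both thresholds, in input order."""
    return [s.spot_id for s in spots if is_candidate(s)]


# Invented example values, not data from the study.
spots = [
    Spot("a", 0.01, 2.5),  # significant, up-regulated
    Spot("b", 0.04, 0.4),  # significant, down-regulated 2.5-fold
    Spot("c", 0.20, 3.0),  # large change but not significant
    Spot("d", 0.03, 1.5),  # significant but small change
]
print(select_candidates(spots))
```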

  14. Data Independent Acquisition analysis in ProHits 4.0.

    PubMed

    Liu, Guomin; Knight, James D R; Zhang, Jian Ping; Tsou, Chih-Chiang; Wang, Jian; Lambert, Jean-Philippe; Larsen, Brett; Tyers, Mike; Raught, Brian; Bandeira, Nuno; Nesvizhskii, Alexey I; Choi, Hyungwon; Gingras, Anne-Claude

    2016-10-21

    Affinity purification coupled with mass spectrometry (AP-MS) is a powerful technique for the identification and quantification of physical interactions. AP-MS requires careful experimental design, appropriate control selection and quantitative workflows to successfully identify bona fide interactors amongst a large background of contaminants. We previously introduced ProHits, a Laboratory Information Management System for interaction proteomics, which tracks all samples in a mass spectrometry facility, initiates database searches and provides visualization tools for spectral counting-based AP-MS approaches. More recently, we implemented Significance Analysis of INTeractome (SAINT) within ProHits to provide scoring of interactions based on spectral counts. Here, we provide an update to ProHits to support Data Independent Acquisition (DIA) with identification software (DIA-Umpire and MSPLIT-DIA), quantification tools (through DIA-Umpire, or externally via targeted extraction), and assessment of quantitative enrichment (through mapDIA) and scoring of interactions (through SAINT-intensity). With additional improvements, notably support of the iProphet pipeline, facilitated deposition into ProteomeXchange repositories and enhanced export and viewing functions, ProHits 4.0 offers a comprehensive suite of tools to facilitate affinity proteomics studies. It remains challenging to score, annotate and analyze proteomics data in a transparent manner. ProHits was previously introduced as a LIMS to enable storing, tracking and analysis of standard AP-MS data. In this revised version, we expand ProHits to include integration with a number of identification and quantification tools based on Data-Independent Acquisition (DIA). ProHits 4.0 also facilitates data deposition into public repositories, and the transfer of data to new visualization tools. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. MPA Portable: A Stand-Alone Software Package for Analyzing Metaproteome Samples on the Go.

    PubMed

    Muth, Thilo; Kohrs, Fabian; Heyer, Robert; Benndorf, Dirk; Rapp, Erdmann; Reichl, Udo; Martens, Lennart; Renard, Bernhard Y

    2018-01-02

    Metaproteomics, the mass spectrometry-based analysis of proteins from multispecies samples, faces severe challenges concerning data analysis and results interpretation. To overcome these shortcomings, we here introduce the MetaProteomeAnalyzer (MPA) Portable software. In contrast to the original server-based MPA application, this newly developed tool no longer requires computational expertise for installation and is now independent of any relational database system. In addition, MPA Portable now supports state-of-the-art database search engines and a convenient command line interface for high-performance data processing tasks. While search engine results can easily be combined to increase the protein identification yield, an additional two-step workflow is implemented to provide sufficient analysis resolution for further postprocessing steps, such as protein grouping as well as taxonomic and functional annotation. Our new application has been developed with a focus on intuitive usability, adherence to data standards, and adaptation to Web-based workflow platforms. The open source software package can be found at https://github.com/compomics/meta-proteome-analyzer.

  16. Proteomics and Metabolomics: Two Emerging Areas for Legume Improvement

    PubMed Central

    Ramalingam, Abirami; Kudapa, Himabindu; Pazhamala, Lekha T.; Weckwerth, Wolfram; Varshney, Rajeev K.

    2015-01-01

    The crop legumes such as chickpea, common bean, cowpea, peanut, pigeonpea, soybean, etc. are important sources of nutrition and contribute to a significant amount of biological nitrogen fixation (>20 million tons of fixed nitrogen) in agriculture. However, the production of legumes is constrained due to abiotic and biotic stresses. It is therefore imperative to understand the molecular mechanisms of plant response to different stresses and identify key candidate genes regulating tolerance which can be deployed in breeding programs. The information obtained from transcriptomics has facilitated the identification of candidate genes for the given trait of interest and utilizing them in crop breeding programs to improve stress tolerance. However, the mechanisms of stress tolerance are complex due to the influence of multi-genes and post-transcriptional regulations. Furthermore, stress conditions greatly affect gene expression which in turn causes modifications in the composition of plant proteomes and metabolomes. Therefore, functional genomics involving various proteomics and metabolomics approaches have been obligatory for understanding plant stress tolerance. These approaches have also been found useful to unravel different pathways related to plant and seed development as well as symbiosis. Proteome and metabolome profiling using high-throughput based systems have been extensively applied in the model legume species, Medicago truncatula and Lotus japonicus, as well as in the model crop legume, soybean, to examine stress signaling pathways, cellular and developmental processes and nodule symbiosis. Moreover, the availability of protein reference maps as well as proteomics and metabolomics databases greatly support research and understanding of various biological processes in legumes. Protein-protein interaction techniques, particularly the yeast two-hybrid system have been advantageous for studying symbiosis and stress signaling in legumes. 
In this review, several studies on proteomics and metabolomics in model and crop legumes have been discussed. Additionally, applications of advanced proteomics and metabolomics approaches have also been included in this review for future applications in legume research. The integration of these “omics” approaches will greatly support the identification of accurate biomarkers in legume smart breeding programs. PMID:26734026

  17. Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines.

    PubMed

    Jones, Andrew R; Siepen, Jennifer A; Hubbard, Simon J; Paton, Norman W

    2009-03-01

    LC-MS experiments can generate large quantities of data, for which a variety of database search engines are available to make peptide and protein identifications. Decoy databases are becoming widely used to place statistical confidence in result sets, allowing the false discovery rate (FDR) to be estimated. Different search engines produce different identification sets so employing more than one search engine could result in an increased number of peptides (and proteins) being identified, if an appropriate mechanism for combining data can be defined. We have developed a search engine independent score, based on FDR, which allows peptide identifications from different search engines to be combined, called the FDR Score. The results demonstrate that the observed FDR is significantly different when analysing the set of identifications made by all three search engines, by each pair of search engines or by a single search engine. Our algorithm assigns identifications to groups according to the set of search engines that have made the identification, and re-assigns the score (combined FDR Score). The combined FDR Score can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine.
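
    The decoy-based FDR estimate mentioned in this abstract can be illustrated with a short sketch. It is not the FDR Score or combined FDR Score algorithm of the paper. It only shows the generic target-decoy estimate (decoy hits divided by target hits above a score cutoff), and the function name and example scores are hypothetical.

```python
from typing import Iterable, Tuple


def accepted_targets(psms: Iterable[Tuple[float, bool]], max_fdr: float) -> int:
    """Count target hits at the most permissive score cutoff whose
    estimated FDR (decoys / targets above the cutoff) is <= max_fdr.

    Each item is (score, is_decoy); a higher score means a better match.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the cutoff were placed just below this hit.
        if targets and decoys / targets <= max_fdr:
            best = targets
    return best


# Hypothetical search results: (score, is_decoy)
example = [(10.0, False), (9.0, False), (8.0, True),
           (7.0, False), (6.0, False), (5.0, True)]
print(accepted_targets(example, max_fdr=0.25))
```

    Running the same function on the output of several search engines at one fixed FDR gives the kind of per-engine comparison the abstract describes, although the paper's own scoring is more involved.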

  18. OralCard: a bioinformatic tool for the study of oral proteome.

    PubMed

    Arrais, Joel P; Rosa, Nuno; Melo, José; Coelho, Edgar D; Amaral, Diana; Correia, Maria José; Barros, Marlene; Oliveira, José Luís

    2013-07-01

    The molecular complexity of the human oral cavity can only be clarified through identification of components that participate within it. However, current proteomic techniques produce high volumes of information that are dispersed over several online databases. Collecting all of this data and using an integrative approach capable of identifying unknown associations is still an unsolved problem. This is the main motivation for this work. We present the online bioinformatic tool OralCard, which comprises results from 55 manually curated articles reflecting the oral molecular ecosystem (OralPhysiOme). It comprises experimental information available from the oral proteome both of human (OralOme) and microbial origin (MicroOralOme) structured in protein, disease and organism. This tool is a key resource for researchers to understand the molecular foundations implicated in biology and disease mechanisms of the oral cavity. The usefulness of this tool is illustrated with the analysis of the oral proteome associated with diabetes mellitus type 2. OralCard is available at http://bioinformatics.ua.pt/oralcard. Copyright © 2013 Elsevier Ltd. All rights reserved.

  19. Metaproteomics as a Complementary Approach to Gut Microbiota in Health and Disease

    NASA Astrophysics Data System (ADS)

    Petriz, Bernardo A.; Franco, Octávio L.

    2017-01-01

    Classic studies on phylotype profiling are limited to the identification of microbial constituents, where information is lacking about the molecular interaction of these bacterial communities with the host genome and the possible outcomes in host biology. A range of OMICs approaches have provided great progress linking the microbiota to health and disease. However, the investigation of this context through proteomic mass spectrometry-based tools is still being improved. Therefore, metaproteomics or community proteogenomics has emerged as a complementary approach to metagenomic data, as a field in proteomics aiming to perform large-scale characterization of proteins from environmental microbiota such as the human gut. The advances in molecular separation methods coupled with mass spectrometry (e.g. LC-MS/MS) and proteome bioinformatics have been fundamental in these novel large-scale metaproteomic studies, which have further been performed in a wide range of samples including soil, plant and human environments. Metaproteomic studies will make major progress if a comprehensive database covering the genes and expressed proteins from all gut microbial species is developed. To this end, we here present some of the main limitations of metaproteomic studies in complex microbiota environments such as the gut, also addressing the up-to-date pipelines in sample preparation prior to fractionation/separation and mass spectrometry analysis. In addition, a novel approach to the limitations of metagenomic databases is also discussed. Finally, prospects are addressed regarding the application of metaproteomic analysis using a unified host-microbiome gene database and other meta-OMICs platforms.

  20. Proteogenomic Analysis Greatly Expands the Identification of Proteins Related to Reproduction in the Apogamous Fern Dryopteris affinis ssp. affinis.

    PubMed

    Grossmann, Jonas; Fernández, Helena; Chaubey, Pururawa M; Valdés, Ana E; Gagliardini, Valeria; Cañal, María J; Russo, Giancarlo; Grossniklaus, Ueli

    2017-01-01

    Performing proteomic studies on non-model organisms with little or no genomic information is still difficult. However, many specific processes and biochemical pathways occur only in species that are poorly characterized at the genomic level. For example, many plants can reproduce both sexually and asexually, the first one allowing the generation of new genotypes and the latter their fixation. Thus, both modes of reproduction are of great agronomic value. However, the molecular basis of asexual reproduction is not well understood in any plant. In ferns, it combines the production of unreduced spores (diplospory) and the formation of sporophytes from somatic cells (apogamy). To set the basis to study these processes, we performed transcriptomics by next-generation sequencing (NGS) and shotgun proteomics by tandem mass spectrometry in the apogamous fern D. affinis ssp. affinis. For protein identification we used the public viridiplantae database (VPDB) to identify orthologous proteins from other plant species and new transcriptomics data to generate a "species-specific transcriptome database" (SSTDB). In total 1,397 protein clusters with 5,865 unique peptide sequences were identified (13 decoy proteins out of 1,410, protFDR 0.93% on protein cluster level). We show that using the SSTDB for protein identification increases the number of identified peptides almost four times compared to using only the publicly available VPDB. We identified homologs of proteins involved in reproduction of higher plants, including proteins with a potential role in apogamy. With the increasing availability of genomic data from non-model species, similar proteogenomics approaches will improve the sensitivity in protein identification for species only distantly related to models.
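
    The protFDR figure quoted in this abstract is consistent with dividing the decoy count by the number of non-decoy protein clusters. The abstract does not state the formula, so the calculation below is an assumption checked only against the reported numbers.

```python
# Figures taken from the abstract.
decoy_clusters = 13
all_clusters = 1410

# Assumed definition: protein-level FDR = decoys / non-decoy (target) clusters.
target_clusters = all_clusters - decoy_clusters
prot_fdr = decoy_clusters / target_clusters

print(target_clusters)     # number of identified protein clusters
print(f"{prot_fdr:.2%}")   # protein-level FDR as a percentage
```

    With these inputs the target count is 1,397 and the ratio rounds to 0.93%, matching the abstract. Dividing by all 1,410 clusters instead would give about 0.92%.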

  1. RAId_DbS: Peptide Identification using Database Searches with Realistic Statistics

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo

    2007-01-01

    Background: The key to mass-spectrometry-based proteomics is peptide identification. A major challenge in peptide identification is to obtain realistic E-values when assigning statistical significance to candidate peptides. Results: Using a simple scoring scheme, we propose a database search method with theoretically characterized statistics. Taking into account possible skewness in the random variable distribution and the effect of finite sampling, we provide a theoretical derivation for the tail of the score distribution. For every experimental spectrum examined, we collect the scores of peptides in the database, and find good agreement between the collected score statistics and our theoretical distribution. Using Student's t-tests, we quantify the degree of agreement between the theoretical distribution and the score statistics collected. The t-tests may be used to measure the reliability of reported statistics. When combined with the reported P-value for a peptide hit using a score distribution model, this new measure prevents exaggerated statistics. Another feature of RAId_DbS is its capability of detecting multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with several other search tools. The executables and data related to RAId_DbS are freely available upon request. PMID:17961253

  2. Mass spectrometry-based identification of allergens from Curvularia pallescens, a prevalent aerospore in India.

    PubMed

    Dey, Debarati; Saha, Bodhisattwa; Sircar, Gaurab; Ghosal, Kavita; Bhattacharya, Swati Gupta

    2016-07-01

    The worldwide prevalence of fungal allergy in recent years has spurred the mining of allergens from as yet unexplored fungi. Curvularia pallescens (CP), being a dominant aerospore in India and a major sensitiser of a wide range of the allergic population, poses a serious threat to human health. Therefore, we aimed to identify novel allergens from CP in our present study. A cohort of 22 CP-sensitised patients was selected by positive skin prick grade. Individual sera exhibited elevated specific IgE level and significant histamine release on a challenge with antigenic extract of CP. First, gel-based profiling of the CP proteome was done by 1- and 2-dimensional gels. Parallel 1- and 2-dimensional immunoblots were performed applying individual as well as pooled patient sera. Identification of the sero-reactive spots from the 2-dimensional gel was found to be challenging as CP was not previously sequenced. Hence, a mass spectrometry-based proteomic workflow consisting of a conventional database search alone was not sufficient. Therefore, de novo sequencing followed by homology search was implemented for further identification. Altogether 11 allergenic proteins including Brn-1, vacuolar protease, and fructose-bis-phosphate aldolase were identified with high statistical confidence (p<0.05). This is the first study to report on any allergens from CP. This kind of proteome-based analysis provided a catalogue of CP allergens that would lead to improved diagnosis and therapy of CP-related allergy. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    PubMed

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-02-04

    Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

  4. A novel algorithm for validating peptide identification from a shotgun proteomics search engine.

    PubMed

    Jian, Ling; Niu, Xinnan; Xia, Zhonghang; Samir, Parimal; Sumanasekera, Chiranthani; Mu, Zheng; Jennings, Jennifer L; Hoek, Kristen L; Allos, Tara; Howard, Leigh M; Edwards, Kathryn M; Weil, P Anthony; Link, Andrew J

    2013-03-01

    Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has revolutionized the proteomics analysis of complexes, cells, and tissues. In a typical proteomic analysis, the tandem mass spectra from an LC-MS/MS experiment are assigned to a peptide by a search engine that compares the experimental MS/MS peptide data to theoretical peptide sequences in a protein database. The peptide-spectrum matches are then used to infer a list of identified proteins in the original sample. However, the search engines often fail to distinguish between correct and incorrect peptide assignments. In this study, we designed and implemented a novel algorithm called De-Noise to reduce the number of incorrect peptide matches and maximize the number of correct peptides at a fixed false discovery rate using a minimal number of scoring outputs from the SEQUEST search engine. The novel algorithm uses a three-step process: data cleaning, data refining through an SVM-based decision function, and a final data refining step based on proteolytic peptide patterns. Using proteomics data generated on different types of mass spectrometers, we optimized the De-Noise algorithm on the basis of the resolution and mass accuracy of the mass spectrometer employed in the LC-MS/MS experiment. Our results demonstrate De-Noise improves peptide identification compared to other methods used to process the peptide sequence matches assigned by SEQUEST. Because De-Noise uses a limited number of scoring attributes, it can be easily implemented with other search engines.

  5. HTAPP: High-Throughput Autonomous Proteomic Pipeline

    PubMed Central

    Yu, Kebing; Salomon, Arthur R.

    2011-01-01

    Recent advances in the speed and sensitivity of mass spectrometers and in analytical methods, the exponential acceleration of computer processing speeds, and the availability of genomic databases from an array of species and protein information databases have led to a deluge of proteomic data. The development of a lab-based automated proteomic software platform for the automated collection, processing, storage, and visualization of expansive proteomic datasets is critically important. The high-throughput autonomous proteomic pipeline (HTAPP) described here is designed from the ground up to provide critically important flexibility for diverse proteomic workflows and to streamline the total analysis of a complex proteomic sample. This tool is comprised of software that controls the acquisition of mass spectral data along with automation of post-acquisition tasks such as peptide quantification, clustered MS/MS spectral database searching, statistical validation, and data exploration within a user-configurable lab-based relational database. The software design of HTAPP focuses on accommodating diverse workflows and providing missing software functionality to a wide range of proteomic researchers to accelerate the extraction of biological meaning from immense proteomic data sets. Although individual software modules in our integrated technology platform may have some similarities to existing tools, the true novelty of the approach described here is in the synergistic and flexible combination of these tools to provide an integrated and efficient analysis of proteomic samples. PMID:20336676

  6. PeptideDepot: flexible relational database for visual analysis of quantitative proteomic data and integration of existing protein information.

    PubMed

    Yu, Kebing; Salomon, Arthur R

    2009-12-01

    Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through MS/MS. Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to various experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our high throughput autonomous proteomic pipeline used in the automated acquisition and post-acquisition analysis of proteomic data.

  7. FDRAnalysis: a tool for the integrated analysis of tandem mass spectrometry identification results from multiple search engines.

    PubMed

    Wedge, David C; Krishna, Ritesh; Blackhurst, Paul; Siepen, Jennifer A; Jones, Andrew R; Hubbard, Simon J

    2011-04-01

    Confident identification of peptides via tandem mass spectrometry underpins modern high-throughput proteomics. This has motivated considerable recent interest in the postprocessing of search engine results to increase confidence and calculate robust statistical measures, for example through the use of decoy databases to calculate false discovery rates (FDR). FDR-based analyses allow for multiple testing and can assign a single confidence value for both sets and individual peptide spectrum matches (PSMs). We recently developed an algorithm for combining the results from multiple search engines, integrating FDRs for sets of PSMs made by different search engine combinations. Here we describe a web-server and a downloadable application that makes this routinely available to the proteomics community. The web server offers a range of outputs including informative graphics to assess the confidence of the PSMs and any potential biases. The underlying pipeline also provides a basic protein inference step, integrating PSMs into protein ambiguity groups where peptides can be matched to more than one protein. Importantly, we have also implemented full support for the mzIdentML data standard, recently released by the Proteomics Standards Initiative, providing users with the ability to convert native formats to mzIdentML files, which are available to download.
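
    The decoy-based FDR estimate mentioned in this abstract can be sketched in a few lines of Python. This is only the basic single-search-engine target-decoy calculation, not the multi-engine combination algorithm of FDRAnalysis; the function name, the (score, is_decoy) input layout, and the example scores are assumptions for illustration.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for PSMs from a target-decoy search.

    psms -- list of (score, is_decoy) tuples; a higher score means a better match.
    Returns a list of (score, is_decoy, q_value) sorted by descending score.
    """
    ordered = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR at each score threshold: decoys accepted / targets accepted.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of the FDR taken from the worst score upward.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()

    return [(s, d, q) for (s, d), q in zip(ordered, qvals)]


# Invented example: six PSMs, two of which hit the decoy database.
psms = [(9.1, False), (8.7, False), (8.0, False),
        (7.5, True), (7.1, False), (6.0, True)]
results = target_decoy_qvalues(psms)
accepted = [r for r in results if not r[1] and r[2] <= 0.01]
print(len(accepted), "target PSMs accepted at q <= 0.01")
```

    The q-value gives each individual PSM a confidence value derived from set-level FDRs, which is the sense in which FDR-based analyses can describe both sets and individual PSMs.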

  8. FDRAnalysis: A tool for the integrated analysis of tandem mass spectrometry identification results from multiple search engines

    PubMed Central

    Wedge, David C; Krishna, Ritesh; Blackhurst, Paul; Siepen, Jennifer A; Jones, Andrew R.; Hubbard, Simon J.

    2013-01-01

    Confident identification of peptides via tandem mass spectrometry underpins modern high-throughput proteomics. This has motivated considerable recent interest in the post-processing of search engine results to increase confidence and calculate robust statistical measures, for example through the use of decoy databases to calculate false discovery rates (FDR). FDR-based analyses allow for multiple testing and can assign a single confidence value for both sets and individual peptide spectrum matches (PSMs). We recently developed an algorithm for combining the results from multiple search engines, integrating FDRs for sets of PSMs made by different search engine combinations. Here we describe a web-server, and a downloadable application, which makes this routinely available to the proteomics community. The web server offers a range of outputs including informative graphics to assess the confidence of the PSMs and any potential biases. The underlying pipeline provides a basic protein inference step, integrating PSMs into protein ambiguity groups where peptides can be matched to more than one protein. Importantly, we have also implemented full support for the mzIdentML data standard, recently released by the Proteomics Standards Initiative, providing users with the ability to convert native formats to mzIdentML files, which are available to download. PMID:21222473

  9. DeMix Workflow for Efficient Identification of Cofragmented Peptides in High Resolution Data-dependent Tandem Mass Spectrometry*

    PubMed Central

    Zhang, Bo; Pirmoradian, Mohammad; Chernobrovkin, Alexey; Zubarev, Roman A.

    2014-01-01

    Based on the conventional data-dependent acquisition strategy of shotgun proteomics, we present a new workflow, DeMix, which significantly increases the efficiency of peptide identification for in-depth shotgun analysis of complex proteomes. Capitalizing on the high resolution and mass accuracy of Orbitrap-based tandem mass spectrometry, we developed a simple deconvolution method of “cloning” chimeric tandem spectra for cofragmented peptides. In addition to a database search, a simple rescoring scheme utilizes mass accuracy and converts the unwanted cofragmenting events into a surprising advantage of multiplexing. With the combination of cloning and rescoring, we obtained on average nine peptide-spectrum matches per second on a Q-Exactive workbench, whereas the actual MS/MS acquisition rate was close to seven spectra per second. This efficiency boost to 1.24 identified peptides per MS/MS spectrum enabled analysis of over 5000 human proteins in single-dimensional LC-MS/MS shotgun experiments with only a two-hour gradient. These findings suggest a change in the dominant “one MS/MS spectrum - one peptide” paradigm for data acquisition and analysis in shotgun data-dependent proteomics. DeMix also demonstrated higher robustness than conventional approaches in terms of lower variation among the results of consecutive LC-MS/MS runs. PMID:25100859

  10. In-depth analysis of the thylakoid membrane proteome of Arabidopsis thaliana chloroplasts: new proteins, new functions, and a plastid proteome database.

    PubMed

    Friso, Giulia; Giacomelli, Lisa; Ytterberg, A Jimmy; Peltier, Jean-Benoit; Rudella, Andrea; Sun, Qi; Wijk, Klaas J van

    2004-02-01

    An extensive analysis of the Arabidopsis thaliana peripheral and integral thylakoid membrane proteome was performed by sequential extractions with salt, detergent, and organic solvents, followed by multidimensional protein separation steps (reverse-phase HPLC and one- and two-dimensional electrophoresis gels), different enzymatic and nonenzymatic protein cleavage techniques, mass spectrometry, and bioinformatics. Altogether, 154 proteins were identified, of which 76 (49%) were alpha-helical integral membrane proteins. Twenty-seven new proteins without known function but with predicted chloroplast transit peptides were identified, of which 17 (63%) are integral membrane proteins. These new proteins, likely important in thylakoid biogenesis, include two rubredoxins, a potential metallochaperone, and a new DnaJ-like protein. The data were integrated with our analysis of the lumenal-enriched proteome. We identified 83 out of 100 known proteins of the thylakoid localized photosynthetic apparatus, including several new paralogues and some 20 proteins involved in protein insertion, assembly, folding, or proteolysis. An additional 16 proteins are involved in translation, demonstrating that the thylakoid membrane surface is an important site for protein synthesis. The high coverage of the photosynthetic apparatus and the identification of known hydrophobic proteins with low expression levels, such as cpSecE, Ohp1, and Ohp2, indicate an excellent dynamic resolution of the analysis. The sequential extraction process proved very helpful to validate transmembrane prediction. Our data also were cross-correlated to chloroplast subproteome analyses by other laboratories. All data are deposited in a new curated plastid proteome database (PPDB) with multiple search functions (http://cbsusrv01.tc.cornell.edu/users/ppdb/). This PPDB will serve as an expandable resource for the plant community.

  11. Multi-species Identification of Polymorphic Peptide Variants via Propagation in Spectral Networks*

    PubMed Central

    Bandeira, Nuno

    2016-01-01

    Peptide and protein identification remains challenging in organisms with poorly annotated or rapidly evolving genomes, as are commonly encountered in environmental or biofuels research. Such limitations render tandem mass spectrometry (MS/MS) database search algorithms ineffective as they lack corresponding sequences required for peptide-spectrum matching. We address this challenge with the spectral networks approach to (1) match spectra of orthologous peptides across multiple related species and then (2) propagate peptide annotations from identified to unidentified spectra. We here present algorithms to assess the statistical significance of spectral alignments (Align-GF), reduce the impurity in spectral networks, and accurately estimate the error rate in propagated identifications. Analyzing three related Cyanothece species, a model organism for biohydrogen production, spectral networks identified peptides from highly divergent sequences from networks with dozens of variant peptides, including thousands of peptides in species lacking a sequenced genome. Our analysis further detected the presence of many novel putative peptides even in genomically characterized species, thus suggesting the possibility of gaps in our understanding of their proteomic and genomic expression. A web-based pipeline for spectral networks analysis is available at http://proteomics.ucsd.edu/software. PMID:27609420

  12. Mining biological databases for candidate disease genes

    NASA Astrophysics Data System (ADS)

    Braun, Terry A.; Scheetz, Todd; Webster, Gregg L.; Casavant, Thomas L.

    2001-07-01

    The publicly-funded effort to determine the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), has so far produced more than 93% of the 3 billion nucleotides of the human genome in a preliminary `draft' format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies such as expression analysis and resources (micro-arrays or gene chips). These resources are invaluable for the researchers identifying the functional genes of the genome that transcribe and translate into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of this data identified approximately 30,000 - 40,000 human `genes.' However, the bulk of the effort still remains -- to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome to genes. A fortuitous consequence of the HGP is the existence of hundreds of databases containing biological information that may contain relevant data pertaining to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high-speed cluster of Linux workstations is used to analyze sequence and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl Syndrome (BBS).

  13. Proteogenomic strategies for identification of aberrant cancer peptides using large-scale Next Generation Sequencing data

    DOE PAGES

    Woo, Sunghee; Cha, Seong Won; Na, Seungjin; ...

    2014-11-17

    Cancer is driven by the acquisition of somatic DNA lesions. Distinguishing the early driver mutations from subsequent passenger mutations is key to molecular sub-typing of cancers, and the discovery of novel biomarkers. The availability of genomics technologies (mainly whole-genome and exome sequencing, and transcript sampling via RNA-seq, collectively referred to as NGS) has fueled recent studies on somatic mutation discovery. However, the vision is challenged by the complexity, redundancy, and errors in genomic data, and the difficulty of investigating the proteome using only genomic approaches. Recently, combinations of proteomic and genomic technologies have been increasingly employed. However, the complexity and redundancy of NGS data remain a challenge for proteogenomics, and various trade-offs must be made to allow for the searches to take place. This paper provides a discussion of two such trade-offs, relating to large database search and FDR calculations, and their implications for cancer proteogenomics. Moreover, it extends and develops the idea of a unified genomic variant database that can be searched by any mass spectrometry sample. A total of 879 BAM files downloaded from the TCGA repository were used to create a 4.34 GB unified FASTA database which contained 2,787,062 novel splice junctions, 38,464 deletions, 1105 insertions, and 182,302 substitutions. Proteomic data from a single ovarian carcinoma sample (439,858 spectra) was searched against the database. By applying the most conservative FDR measure, we have identified 524 novel peptides and 65,578 known peptides at a 1% FDR threshold. The novel peptides include interesting examples of doubly mutated peptides, frame-shifts, and non-sample-recruited mutations, which emphasize the strength of our approach.

  14. Extensive characterization of Tupaia belangeri neuropeptidome using an integrated mass spectrometric approach.

    PubMed

    Petruzziello, Filomena; Fouillen, Laetitia; Wadensten, Henrik; Kretz, Robert; Andren, Per E; Rainer, Gregor; Zhang, Xiaozhe

    2012-02-03

    Neuropeptidomics is used to characterize endogenous peptides in the brain of tree shrews (Tupaia belangeri). Tree shrews are small animals similar to rodents in size but close relatives of primates, and are excellent models for brain research. Currently, no complete proteome information is available for tree shrews to support a direct database search for neuropeptide identification. To improve the identification of neuropeptides in tree shrews, we developed an integrated mass spectrometry (MS)-based approach that combines methods including data-dependent, directed, and targeted liquid chromatography (LC)-Fourier transform (FT)-tandem MS (MS/MS) analysis, database construction, de novo sequencing, precursor protein search, and homology analysis. Using this integrated approach, we identified 107 endogenous peptides that have sequences identical or similar to those from other mammalian species. High accuracy MS and tandem MS information, together with BLAST analysis and chromatographic characteristics, were used to confirm the sequences of all the identified peptides. Interestingly, further sequence homology analysis demonstrated that tree shrew peptides have a significantly higher degree of homology to equivalent sequences in humans than those in mice or rats, consistent with the close phylogenetic relationship between tree shrews and primates. Our results provide the first extensive characterization of the peptidome in tree shrews, which now permits characterization of their function in the nervous and endocrine systems. Because the approach exploits the evolutionary conservation of neuropeptides and the advantages of high accuracy MS, it can be ported to the identification of neuropeptides in other species for which fully sequenced genomes or proteomes are not available.

  15. Rice proteome database: a step toward functional analysis of the rice genome.

    PubMed

    Komatsu, Setsuko

    2005-09-01

    The technique of proteome analysis using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) has the power to monitor global changes that occur in the protein complement of tissues and subcellular compartments. In this study, the proteins of rice were cataloged, a rice proteome database was constructed, and a functional characterization of some of the identified proteins was undertaken. Proteins extracted from various tissues and subcellular compartments in rice were separated by 2D-PAGE and an image analyzer was used to construct a display of the proteins. The Rice Proteome Database contains 23 reference maps based on 2D-PAGE of proteins from various rice tissues and subcellular compartments. These reference maps comprise 13129 identified proteins, and the amino acid sequences of 5092 proteins are entered in the database. Major proteins involved in growth or stress responses were identified using the proteome approach. Some of these proteins, including a beta-tubulin, calreticulin, and ribulose-1,5-bisphosphate carboxylase/oxygenase activase in rice, have unexpected functions. The information obtained from the Rice Proteome Database will aid in cloning the genes for and predicting the function of unknown proteins.

  16. ApoptoProteomics, an integrated database for analysis of proteomics data obtained from apoptotic cells.

    PubMed

    Arntzen, Magnus Ø; Thiede, Bernd

    2012-02-01

    Apoptosis is the most commonly described form of programmed cell death, and dysfunction is implicated in a large number of human diseases. Many quantitative proteome analyses of apoptosis have been performed to gain insight into the proteins involved in the process. This resulted in large and complex data sets that are difficult to evaluate. Therefore, we developed the ApoptoProteomics database for storage, browsing, and analysis of the outcome of large scale proteome analyses of apoptosis derived from human, mouse, and rat. The proteomics data of 52 publications were integrated and unified with protein annotations from UniProt-KB, the caspase substrate database homepage (CASBAH), and gene ontology. Currently, more than 2300 records of more than 1500 unique proteins were included, covering a large proportion of the core signaling pathways of apoptosis. Analysis of the data set revealed a high level of agreement between the changes in directionality reported in proteomics studies and expected apoptosis-related function and may disclose proteins without a current recognized involvement in apoptosis based on gene ontology. Comparison between induction of apoptosis by the intrinsic and the extrinsic apoptotic signaling pathway revealed slight differences. Furthermore, proteomics has significantly contributed to the field of apoptosis in identifying hundreds of caspase substrates. The database is available at http://apoptoproteomics.uio.no.

  17. ApoptoProteomics, an Integrated Database for Analysis of Proteomics Data Obtained from Apoptotic Cells*

    PubMed Central

    Arntzen, Magnus Ø.; Thiede, Bernd

    2012-01-01

    Apoptosis is the most commonly described form of programmed cell death, and dysfunction is implicated in a large number of human diseases. Many quantitative proteome analyses of apoptosis have been performed to gain insight into the proteins involved in the process. This resulted in large and complex data sets that are difficult to evaluate. Therefore, we developed the ApoptoProteomics database for storage, browsing, and analysis of the outcome of large scale proteome analyses of apoptosis derived from human, mouse, and rat. The proteomics data of 52 publications were integrated and unified with protein annotations from UniProt-KB, the caspase substrate database homepage (CASBAH), and gene ontology. Currently, more than 2300 records of more than 1500 unique proteins were included, covering a large proportion of the core signaling pathways of apoptosis. Analysis of the data set revealed a high level of agreement between the changes in directionality reported in proteomics studies and expected apoptosis-related function and may disclose proteins without a current recognized involvement in apoptosis based on gene ontology. Comparison between induction of apoptosis by the intrinsic and the extrinsic apoptotic signaling pathway revealed slight differences. Furthermore, proteomics has significantly contributed to the field of apoptosis in identifying hundreds of caspase substrates. The database is available at http://apoptoproteomics.uio.no. PMID:22067098

  18. hEIDI: An Intuitive Application Tool To Organize and Treat Large-Scale Proteomics Data.

    PubMed

    Hesse, Anne-Marie; Dupierris, Véronique; Adam, Claire; Court, Magali; Barthe, Damien; Emadali, Anouk; Masselon, Christophe; Ferro, Myriam; Bruley, Christophe

    2016-10-07

    Advances in high-throughput proteomics have led to a rapid increase in the number, size, and complexity of the associated data sets. Managing and extracting reliable information from such large series of data sets require the use of dedicated software organized in a consistent pipeline to reduce, validate, exploit, and ultimately export data. The compilation of multiple mass-spectrometry-based identification and quantification results obtained in the context of a large-scale project represents a real challenge for developers of bioinformatics solutions. In response to this challenge, we developed a dedicated software suite called hEIDI to manage and combine both identifications and semiquantitative data related to multiple LC-MS/MS analyses. This paper describes how, through a user-friendly interface, hEIDI can be used to compile analyses and retrieve lists of nonredundant protein groups. Moreover, hEIDI allows direct comparison of series of analyses, on the basis of protein groups, while ensuring consistent protein inference and also computing spectral counts. hEIDI ensures that validated results are compliant with MIAPE guidelines as all information related to samples and results is stored in appropriate databases. Thanks to the database structure, validated results generated within hEIDI can be easily exported in the PRIDE XML format for subsequent publication. hEIDI can be downloaded from http://biodev.extra.cea.fr/docs/heidi.
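
    Spectral counting per protein group across several analyses, as described in this abstract, can be sketched as follows. This is a generic illustration rather than hEIDI's implementation: the tuple layout (spectrum_id, peptide, protein_group), the function names, and the example data are assumptions, and protein grouping is taken as already done.

```python
from collections import Counter, defaultdict


def spectral_counts(psms):
    """Count distinct identified spectra per protein group in one analysis.

    psms -- iterable of (spectrum_id, peptide, protein_group) tuples.
    """
    seen = set()
    counts = Counter()
    for spectrum_id, _peptide, group in psms:
        # Count each spectrum at most once per protein group.
        if (spectrum_id, group) not in seen:
            seen.add((spectrum_id, group))
            counts[group] += 1
    return counts


def compare_analyses(analyses):
    """Build a {protein_group: {analysis_name: spectral_count}} table."""
    table = defaultdict(dict)
    for name, psms in analyses.items():
        for group, n in spectral_counts(psms).items():
            table[group][name] = n
    return dict(table)


# Invented example: two LC-MS/MS analyses sharing protein group G2.
analyses = {
    "run1": [("s1", "PEPA", "G1"), ("s2", "PEPA", "G1"),
             ("s3", "PEPB", "G2"), ("s1", "PEPA", "G1")],  # last row is a duplicate
    "run2": [("s1", "PEPB", "G2"), ("s2", "PEPB", "G2")],
}
table = compare_analyses(analyses)
print(table)
```

    Comparing on protein groups rather than individual accessions is what keeps the counts consistent between analyses when the same peptides map to several proteins.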

  19. Protannotator: a semiautomated pipeline for chromosome-wise functional annotation of the "missing" human proteome.

    PubMed

    Islam, Mohammad T; Garg, Gagan; Hancock, William S; Risk, Brian A; Baker, Mark S; Ranganathan, Shoba

    2014-01-03

    The chromosome-centric human proteome project (C-HPP) aims to define the complete set of proteins encoded in each human chromosome. The neXtProt database (September 2013) lists 20,128 proteins for the human proteome, of which 3831 human proteins (∼19%) are considered "missing" according to the standard metrics table (released September 27, 2013). In support of the C-HPP initiative, we have extended the annotation strategy developed for human chromosome 7 "missing" proteins into a semiautomated pipeline to functionally annotate the "missing" human proteome. This pipeline integrates a suite of bioinformatics analysis and annotation software tools to identify homologues and map putative functional signatures, gene ontology, and biochemical pathways. From sequential BLAST searches, we have primarily identified homologues from reviewed nonhuman mammalian proteins with protein evidence for 1271 (33.2%) "missing" proteins, followed by 703 (18.4%) homologues from reviewed nonhuman mammalian proteins and subsequently 564 (14.7%) homologues from reviewed human proteins. Functional annotations for 1945 (50.8%) "missing" proteins were also determined. To accelerate the identification of "missing" proteins from proteomics studies, we generated proteotypic peptides in silico. Matching these proteotypic peptides to ENCODE proteogenomic data resulted in proteomic evidence for 107 (2.8%) of the 3831 "missing" proteins, while evidence from a recent membrane proteomic study supported the existence of another 15 "missing" proteins. The chromosome-wise functional annotation of all "missing" proteins is freely available to the scientific community through our web server (http://biolinfo.org/protannotator).

  20. TrSDB: a proteome database of transcription factors

    PubMed Central

    Hermoso, Antoni; Aguilar, Daniel; Aviles, Francesc X.; Querol, Enrique

    2004-01-01

    TrSDB—TranScout Database—(http://ibb.uab.es/trsdb) is a proteome database of eukaryotic transcription factors based upon predicted motifs by TranScout and data sources such as InterPro and Gene Ontology Annotation. Nine eukaryotic proteomes are included in the current version. Extensive and diverse information for each database entry, different analyses considering TranScout classification and similarity relationships are offered for research on transcription factors or gene expression. PMID:14681387

  1. A statistical method for assessing peptide identification confidence in accurate mass and time tag proteomics

    PubMed Central

    Stanley, Jeffrey R.; Adkins, Joshua N.; Slysz, Gordon W.; Monroe, Matthew E.; Purvine, Samuel O.; Karpievitch, Yuliya V.; Anderson, Gordon A.; Smith, Richard D.; Dabney, Alan R.

    2011-01-01

    Current algorithms for quantifying peptide identification confidence in the accurate mass and time (AMT) tag approach assume that the AMT tags themselves have been correctly identified. However, there is uncertainty in the identification of AMT tags, as this is based on matching LC-MS/MS fragmentation spectra to peptide sequences. In this paper, we incorporate confidence measures for the AMT tag identifications into the calculation of probabilities for correct matches to an AMT tag database, resulting in a more accurate overall measure of identification confidence for the AMT tag approach. The method is referred to as Statistical Tools for AMT tag Confidence (STAC). STAC additionally provides a Uniqueness Probability (UP) to help distinguish between multiple matches to an AMT tag and a method to calculate an overall false discovery rate (FDR). STAC is freely available for download as both a command line and a Windows graphical application. PMID:21692516

  2. Clinical proteomics in kidney disease as an exponential technology: heading towards the disruptive phase.

    PubMed

    Sanchez-Niño, Maria Dolores; Sanz, Ana B; Ramos, Adrian M; Fernandez-Fernandez, Beatriz; Ortiz, Alberto

    2017-04-01

    Exponential technologies double in power or processing speed every year, whereas their cost halves. Deception and disruption are two key stages in the development of exponential technologies. Deception occurs when, after initial introduction, technologies are dismissed as irrelevant, while they continue to progress, perhaps not as fast or with so many immediate practical applications as initially thought. Twenty years after the first publications, clinical proteomics is still not available in most hospitals and some clinicians have felt deception at unfulfilled promises. However, there are indications that clinical proteomics may be entering the disruptive phase, where, once refined, technologies disrupt established industries or procedures. In this regard, recent manuscripts in CKJ illustrate how proteomics is entering the clinical realm, with applications ranging from the identification of amyloid proteins in the pathology lab, to a new generation of urinary biomarkers for chronic kidney disease (CKD) assessment and outcome prediction. Indeed, one such panel of urinary peptidomics biomarkers, CKD273, recently received a Food and Drug Administration letter of support, the first ever in the CKD field. In addition, a must-read resource providing information on kidney disease-related proteomics and systems biology databases and how to access and use them in clinical decision-making was also recently published in CKJ.

  3. Unraveling endometriosis-associated ovarian carcinomas using integrative proteomics

    PubMed Central

    Leung, Felix; Bernardini, Marcus Q.; Liang, Kun; Batruch, Ihor; Rouzbahman, Marjan; Diamandis, Eleftherios P.; Kulasingam, Vathany

    2018-01-01

    Background: To elucidate potential markers of endometriosis and endometriosis-associated endometrioid and clear cell ovarian carcinomas using mass spectrometry-based proteomics. Methods: A total of 21 fresh, frozen tissues from patients diagnosed with clear cell carcinoma, endometrioid carcinoma, endometriosis and benign endometrium were subjected to an in-depth liquid chromatography-tandem mass spectrometry analysis on the Q-Exactive Plus. Protein identification and quantification were performed using MaxQuant, while downstream analyses were performed using Perseus and various bioinformatics databases. Results: Approximately 9000 proteins were identified in total, representing the first in-depth proteomic investigation of endometriosis and its associated cancers. This proteomic data was shown to be biologically sound, with minimal variation within patient cohorts and recapitulation of known markers. While moderate concordance with genomic data was observed, it was shown that such data are limited in their abilities to represent tumours on the protein level and to distinguish tumours from their benign precursors. Conclusions: The proteomic data suggests that distinct markers may differentiate endometrioid and clear cell carcinoma from endometriosis. These markers may be indicators of pathobiology but will need to be further investigated. Ultimately, this dataset may serve as a basis to unravel the underlying biology of the endometrioid and clear cell cancers with respect to their endometriotic origins. PMID:29721309

  4. PeptideDepot: Flexible Relational Database for Visual Analysis of Quantitative Proteomic Data and Integration of Existing Protein Information

    PubMed Central

    Yu, Kebing; Salomon, Arthur R.

    2010-01-01

    Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through tandem mass spectrometry (MS/MS). Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to a variety of experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our High Throughput Autonomous Proteomic Pipeline (HTAPP) used in the automated acquisition and post-acquisition analysis of proteomic data. PMID:19834895

  5. The secrets of Oriental panacea: Panax ginseng.

    PubMed

    Colzani, Mara; Altomare, Alessandra; Caliendo, Matteo; Aldini, Giancarlo; Righetti, Pier Giorgio; Fasoli, Elisa

    2016-01-01

    The Panax ginseng root proteome has been investigated via capture with combinatorial peptide ligand libraries (CPLL) at three different pH values. Proteomic characterization by SDS-PAGE and nLC–MS/MS analysis, via LTQ-Orbitrap XL, led to the identification of a total of 207 expressed proteins. This quite large number of identifications was achieved by consulting two different plant databases: P. ginseng and Arabidopsis thaliana. The major groups of identified proteins were associated to structural species (19.2%), oxidoreductase (19.5%), dehydrogenases (7.6%) and synthases (9.0%). For the first time, an exploration of protein–protein interactions was performed by merging all recognized proteins and building an interactomic map, characterized by 196 nodes and 1554 interactions. Finally a peptidomic analysis was developed combining different in-silico enzymatic digestions to simulate the human gastrointestinal process: from 661 generated peptides, 95 were identified as possible bioactives and in particular 6 of them were characterized by antimicrobial activity. The present report offers new insight for future investigations focused on elucidation of biological properties of P. ginseng proteome and peptidome. Ginseng is a traditional oriental herbal remedy whose use is very diffused in all the world for its numerous pharmacological effects. However, the exact mechanism of action of ginseng components, both ginsenosides and proteins, is still unidentified. So the common use of ginseng requires strict investigations to assess both its efficiency and its safety. Although many reports have been published regarding the pharmacological effects of ginseng, little is known about the biochemical pathways of root. Proteomics analysis could be useful to elucidate the physiological pathways. 
    In this manuscript, an integrated approach to proteomics and peptidomics ushers in the exploration of Panax ginseng proteins and proteolytic peptides, obtained by in-silico gastrointestinal digestion, characterized by antimicrobial action. The present research could pave the way for better knowledge of the metabolic functions connected with the ginseng proteome and provide new information needed to better understand the antimicrobial activity of P. ginseng.
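    The in-silico enzymatic digestion mentioned in this record can be illustrated with a minimal sketch. The abstract does not say which enzymes or rules were combined, so the example below assumes only the commonly used trypsin rule (cleave after K or R unless the next residue is P); the function name `trypsin_digest` and the test sequence are hypothetical.

```python
from typing import List


def trypsin_digest(sequence: str) -> List[str]:
    """Split a protein sequence into peptides with a simplified trypsin rule.

    Assumption: cleave C-terminal to K or R, except when the next residue
    is P. Missed cleavages and other proteases are not modelled.
    """
    peptides: List[str] = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep whatever remains after the last cleavage site.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence, used only to show the cleavage behaviour.
    print(trypsin_digest("AAKRPGGRCC"))
```

    A gastrointestinal simulation would presumably chain several such rules (for example pepsin, trypsin and chymotrypsin); only the single-enzyme step is shown here.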

  6. False discovery rates in spectral identification.

    PubMed

    Jeong, Kyowon; Kim, Sangtae; Bandeira, Nuno

    2012-01-01

    Automated database search engines are one of the fundamental engines of high-throughput proteomics enabling daily identifications of hundreds of thousands of peptides and proteins from tandem mass (MS/MS) spectrometry data. Nevertheless, this automation also makes it humanly impossible to manually validate the vast lists of resulting identifications from such high-throughput searches. This challenge is usually addressed by using a Target-Decoy Approach (TDA) to impose an empirical False Discovery Rate (FDR) at a pre-determined threshold x% with the expectation that at most x% of the returned identifications would be false positives. But despite the fundamental importance of FDR estimates in ensuring the utility of large lists of identifications, there is surprisingly little consensus on exactly how TDA should be applied to minimize the chances of biased FDR estimates. In fact, since less rigorous TDA/FDR estimates tend to result in more identifications (at higher 'true' FDR), there is often little incentive to enforce strict TDA/FDR procedures in studies where the major metric of success is the size of the list of identifications and there are no follow up studies imposing hard cost constraints on the number of reported false positives. Here we address the problem of the accuracy of TDA estimates of empirical FDR. Using MS/MS spectra from samples where we were able to define a factual FDR estimator of 'true' FDR we evaluate several popular variants of the TDA procedure in a variety of database search contexts. We show that the fraction of false identifications can sometimes be over 10× higher than reported and may be unavoidably high for certain types of searches. In addition, we further report that the two-pass search strategy seems the most promising database search strategy. 
    While unavoidably constrained by the particulars of any specific evaluation dataset, our observations support a series of recommendations towards maximizing the number of resulting identifications while controlling database searches with robust and reproducible TDA estimation of empirical FDR.
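    The Target-Decoy Approach described in this record can be sketched as follows. The abstract stresses that several TDA variants exist and does not give formulas, so this example assumes one simple variant: a concatenated target/decoy search in which the FDR at a score threshold is estimated as decoy hits divided by target hits. The function names and the score values are hypothetical.

```python
from typing import List, Optional, Tuple

# Each peptide-spectrum match (PSM) is a (score, is_decoy) pair.
PSM = Tuple[float, bool]


def target_decoy_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate FDR among PSMs scoring >= threshold as decoys / targets.

    Assumption: decoy hits above the threshold approximate the number of
    false target hits above the same threshold.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else 1.0
    return min(1.0, decoys / targets)


def score_cutoff_for_fdr(psms: List[PSM], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    best: Optional[float] = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if target_decoy_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best


if __name__ == "__main__":
    # Hypothetical scores: higher is better, True marks a decoy hit.
    example = [(9.1, False), (8.7, False), (8.2, False), (7.9, True),
               (7.5, False), (6.8, False), (6.1, True), (5.9, False)]
    print(target_decoy_fdr(example, 7.0))
    print(score_cutoff_for_fdr(example, 0.01))
```

    Because the estimate depends on how decoys are generated and counted, other variants (separate searches, two-pass strategies) would need a different counting rule than the one assumed here.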

  7. Consolidation of proteomics data in the Cancer Proteomics database.

    PubMed

    Arntzen, Magnus Ø; Boddie, Paul; Frick, Rahel; Koehler, Christian J; Thiede, Bernd

    2015-11-01

    Cancer is a class of diseases characterized by abnormal cell growth and one of the major causes of human death. Proteins are involved in the molecular mechanisms leading to cancer; furthermore, they are affected by anti-cancer drugs, and protein biomarkers can be used to diagnose certain cancer types. Therefore, it is important to explore the proteomics background of cancer. In this report, we developed the Cancer Proteomics database to re-interrogate published proteome studies investigating cancer. The database is divided into three sections related to cancer processes, cancer types, and anti-cancer drugs. Currently, the Cancer Proteomics database contains 9778 entries of 4118 proteins extracted from 143 scientific articles covering all three sections: cell death (cancer process), prostate cancer (cancer type) and platinum-based anti-cancer drugs including carboplatin, cisplatin, and oxaliplatin (anti-cancer drugs). The detailed information extracted from the literature includes basic information about the articles (e.g., PubMed ID, authors, journal name, publication year), information about the samples (type, study/reference, prognosis factor), and the proteomics workflow (subcellular fractionation, protein and peptide separation, mass spectrometry, quantification). Useful annotations such as hyperlinks to UniProt and PubMed were included. In addition, many filtering options were established as well as export functions. The database is freely available at http://cancerproteomics.uio.no. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    NASA Astrophysics Data System (ADS)

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-06-01

    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.
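    The core proteogenomic step described in this record, using nucleotide sequences to generate candidate protein sequences for database searching, can be sketched as a six-frame translation. This is an illustrative assumption rather than a method taken from the review; it uses the standard genetic code, and the function names and the input sequence are hypothetical.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frame_translation(dna: str) -> Dict[str, str]:
    """Translate all three forward and three reverse reading frames."""
    reverse = reverse_complement(dna)
    frames = {f"+{k + 1}": translate(dna[k:]) for k in range(3)}
    frames.update({f"-{k + 1}": translate(reverse[k:]) for k in range(3)})
    return frames


def candidate_peptides(dna: str, min_length: int = 2) -> List[str]:
    """Collect stop-free stretches from all six frames as search candidates."""
    found = set()
    for protein in six_frame_translation(dna).values():
        found.update(p for p in protein.split("*") if len(p) >= min_length)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical nine-base sequence, used only to show the frames.
    print(six_frame_translation("ATGGCCTAA"))
    print(candidate_peptides("ATGGCCTAA"))
```

    In practice the nucleotide input would more likely be sample-specific transcripts or variant calls than raw genomic sequence, which keeps the candidate database small enough to search with acceptable false discovery rates.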

  9. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    PubMed Central

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-01-01

    Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631

  10. High-resolution proteomic profiling of spider venom: expanding the toxin diversity of Phoneutria nigriventer venom.

    PubMed

    Liberato, Tarcísio; Troncone, Lanfranco Ranieri Paolo; Yamashiro, Edson T; Serrano, Solange M T; Zelanis, André

    2016-03-01

    Here we present a proteomic characterization of Phoneutria nigriventer venom. A shotgun proteomic approach allowed the identification, for the first time, of O-glycosyl hydrolases (chitinases) in P. nigriventer venom. The electrophoretic profiles under nonreducing and reducing conditions, and protein identification by mass spectrometry, indicated the presence of oligomeric toxin structures in the venom. Complementary proteomic approaches allowed for a qualitative and semi-quantitative profiling of P. nigriventer venom complexity, expanding its known venom proteome diversity.

  11. MALDI-TOF mass spectrometry as a potential tool for Trichomonas vaginalis identification.

    PubMed

    Calderaro, Adriana; Piergianni, Maddalena; Montecchini, Sara; Buttrini, Mirko; Piccolo, Giovanna; Rossi, Sabina; Arcangeletti, Maria Cristina; Medici, Maria Cristina; Chezzi, Carlo; De Conto, Flora

    2016-06-10

    Trichomonas vaginalis is a flagellated protozoan causing trichomoniasis, a sexually transmitted human infection, with around 276.4 million new cases estimated by the World Health Organization. Culture is the gold standard method for the diagnosis of T. vaginalis infection. Recently, immunochromatographic assays as well as PCR assays for the detection of T. vaginalis antigen or DNA, respectively, have also become available. Although the well-known genome sequence of T. vaginalis has made possible the application of proteomic studies, few data are available about the overall proteomic expression profiling of T. vaginalis. The aim of this study was to investigate the potential application of MALDI-TOF MS as a new tool for the identification of T. vaginalis. Twenty-one isolates were analysed by MALDI-TOF MS after the creation of a Main Spectrum Profile (MSP) from a T. vaginalis reference strain (G3) and its subsequent addition to the Bruker Daltonics database, which did not include any profile of protozoa. This was achieved after the development of a new identification method created by modifying the range setting (6-10 kDa) for the MALDI-TOF MS analysis in order to exclude the overlapping of peaks derived from the culture media used in this study. Two MSP reference spectra were created in 2 different ranges: 3-15 kDa (standard range setting) and 6-10 kDa (new range setting). Both MSP spectra were deposited in the MALDI BioTyper database for further identification of additional T. vaginalis strains. All 21 strains analysed in this study were correctly identified by using the new identification method. In this study it was demonstrated that changes in the MALDI-TOF MS standard parameters usually used to identify bacteria and fungi allowed the identification of the protozoan T. vaginalis. This study shows the usefulness of MALDI-TOF MS in the reliable identification of microorganisms grown on complex liquid media, such as the protozoan T. vaginalis, on the basis of the protein profile and not on the basis of single markers, by using a "new range setting" different from that developed for bacteria and fungi.

  12. Proteome Analysis of the Plasma Membrane of Mycobacterium Tuberculosis

    PubMed Central

    Arora, Shalini; Kosalai, K.; Namane, Abdelkader; Pym, Alex S.; Cole, Stewart T.

    2002-01-01

    The plasma membrane of Mycobacterium tuberculosis is likely to contain proteins that could serve as novel drug targets, diagnostic probes or even components of a vaccine against tuberculosis. With this in mind, we have undertaken proteome analysis of the membrane of M. tuberculosis H37Rv. Isolated membrane vesicles were extracted with either a detergent (Triton X114) or an alkaline buffer (carbonate) following two of the protocols recommended for membrane protein enrichment. Proteins were resolved by 2D-GE using immobilized pH gradient (IPG) strips, and identified by peptide mass mapping utilizing the M. tuberculosis genome database. The two extraction procedures yielded patterns with minimal overlap. Only two proteins, both HSPs, showed a common presence. MALDI–MS analysis of 61 spots led to the identification of 32 proteins, 17 of which were new to the M. tuberculosis proteome database. We classified 19 of the identified proteins as ‘membrane-associated’; 14 of these were further classified as ‘membrane-bound’, three of which were lipoproteins. The remaining proteins included four heat-shock proteins and several enzymes involved in energy or lipid metabolism. Extraction with Triton X114 was found to be more effective than carbonate for detecting ‘putative’ M. tuberculosis membrane proteins. The protocol was also found to be suitable for comparing BCG and M. tuberculosis membranes, identifying ESAT-6 as being expressed selectively in M. tuberculosis. While this study demonstrates for the first time some of the membrane proteins of M. tuberculosis, it also underscores the problems associated with proteomic analysis of a complex membrane such as that of a mycobacterium. PMID:18629250

  13. Quantitative proteome analysis of barley seeds using ruthenium(II)-tris-(bathophenanthroline-disulphonate) staining.

    PubMed

    Witzel, Katja; Surabhi, Giridara-Kumar; Jyothsnakumari, Gottimukkala; Sudhakar, Chinta; Matros, Andrea; Mock, Hans-Peter

    2007-04-01

    This paper describes the application of the recently introduced fluorescence stain Ruthenium(II)-tris-(bathophenanthroline-disulphonate) (RuBP) to a comparative proteome analysis of two phenotypically different barley lines. We carried out an analysis of protein patterns from 2-D gels of the parental lines of the Oregon Wolfe Barley mapping population, DOM and REC, stained with either the conventional colloidal Coomassie Brilliant Blue (cCBB) or the novel RuBP solution. We wished to experimentally verify the usefulness of such a stain in evaluating the complex pattern of a seed proteome, in comparison to the previously used cCBB staining technique. To validate the efficiency of visualization by both stains, we first compared the overall number of detected protein spots. On average, 790 spots were visible by cCBB staining and 1200 spots by RuBP staining. Then, the intensity of a set of spots was assessed, and changes in relative abundance were determined using image analysis software. As expected, staining with RuBP performed better in quantitation in terms of sensitivity and dynamic range. Furthermore, spots from a cultivar-specific region in the protein map were chosen for identification to assess the gain of biological information due to the staining procedure. From this particular region, eight spots were visualized exclusively by RuBP and identification was successful for all spots, proving the ability to identify even very low-abundance proteins. Performance in MS analysis was comparable for both protein stains. Proteins were identified by MALDI-TOF MS peptide mass fingerprinting. This approach was not successful for all spots, due to the limited number of barley entries in the database. Therefore, we subsequently used LC-ESI-Q-TOF MS/MS and de novo sequencing for identification. Because only an insufficient number of barley proteins are annotated, an EST-based identification strategy was chosen for our experiment. We wished to test whether under these limitations the application of a more sensitive stain would lead to a more advanced proteome approach. In summary, we demonstrate here that the application of RuBP as an economical but reliable and sensitive fluorescence stain is highly suitable for quantitative proteome analysis of plant seeds.

  14. COMPASS: a suite of pre- and post-search proteomics software tools for OMSSA

    PubMed Central

    Wenger, Craig D.; Phanstiel, Douglas H.; Lee, M. Violet; Bailey, Derek J.; Coon, Joshua J.

    2011-01-01

    Here we present the Coon OMSSA Proteomic Analysis Software Suite (COMPASS): a free and open-source software pipeline for high-throughput analysis of proteomics data, designed around the Open Mass Spectrometry Search Algorithm. We detail a synergistic set of tools for protein database generation, spectral reduction, peptide false discovery rate analysis, peptide quantitation via isobaric labeling, protein parsimony and protein false discovery rate analysis, and protein quantitation. We strive for maximum ease of use, utilizing graphical user interfaces and working with data files in the original instrument vendor format. Results are stored in plain text comma-separated values files, which are easy to view and manipulate with a text editor or spreadsheet program. We illustrate the operation and efficacy of COMPASS through the use of two LC–MS/MS datasets. The first is a dataset of a highly annotated mixture of standard proteins and manually validated contaminants that exhibits the identification workflow. The second is a dataset of yeast peptides, labeled with isobaric stable isotope tags and mixed in known ratios, to demonstrate the quantitative workflow. For these two datasets, COMPASS performs equivalently or better than the current de facto standard, the Trans-Proteomic Pipeline. PMID:21298793

  15. Comparative proteome analysis of Milnesium tardigradum in early embryonic state versus adults in active and anhydrobiotic state.

    PubMed

    Schokraie, Elham; Warnken, Uwe; Hotz-Wagenblatt, Agnes; Grohme, Markus A; Hengherr, Steffen; Förster, Frank; Schill, Ralph O; Frohme, Marcus; Dandekar, Thomas; Schnölzer, Martina

    2012-01-01

    Tardigrades have fascinated researchers for more than 300 years because of their extraordinary capability to undergo cryptobiosis and survive extreme environmental conditions. However, the survival mechanisms of tardigrades are still poorly understood mainly due to the absence of detailed knowledge about the proteome and genome of these organisms. Our study was intended to provide a basis for the functional characterization of expressed proteins in different states of tardigrades. High-throughput, high-accuracy proteomics in combination with a newly developed tardigrade specific protein database resulted in the identification of more than 3000 proteins in three different states: early embryonic state and adult animals in active and anhydrobiotic state. This comprehensive proteome resource includes protein families such as chaperones, antioxidants, ribosomal proteins, cytoskeletal proteins, transporters, protein channels, nutrient reservoirs, and developmental proteins. A comparative analysis of protein families in the different states was performed by calculating the exponentially modified protein abundance index which classifies proteins in major and minor components. This is the first step to analyzing the proteins involved in early embryonic development, and furthermore proteins which might play an important role in the transition into the anhydrobiotic state.
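    The exponentially modified protein abundance index (emPAI) named in this record is commonly defined as 10 raised to the ratio of observed to observable peptides, minus 1, and is often normalized to a molar percentage across all identified proteins. The sketch below assumes that standard definition; the abstract does not state how the authors separated major from minor components, so no cutoff is implemented. The function names and peptide counts are hypothetical.

```python
from typing import Dict


def empai(observed_peptides: int, observable_peptides: int) -> float:
    """Compute emPAI = 10 ** (N_observed / N_observable) - 1."""
    if observable_peptides <= 0:
        raise ValueError("observable_peptides must be positive")
    return 10 ** (observed_peptides / observable_peptides) - 1


def molar_percent(empai_values: Dict[str, float]) -> Dict[str, float]:
    """Normalize emPAI values so that they sum to 100 (approximate mol %)."""
    total = sum(empai_values.values())
    if total == 0:
        return {name: 0.0 for name in empai_values}
    return {name: value / total * 100 for name, value in empai_values.items()}


if __name__ == "__main__":
    # Hypothetical peptide counts: (observed, theoretically observable).
    counts = {"protein_A": (5, 10), "protein_B": (2, 20)}
    values = {name: empai(obs, total) for name, (obs, total) in counts.items()}
    print(values)
    print(molar_percent(values))
```

    The exponential form is used because the fraction of observed peptides tends to scale with the logarithm of protein abundance, so emPAI is roughly proportional to abundance itself.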

  16. Comparative proteome analysis of Milnesium tardigradum in early embryonic state versus adults in active and anhydrobiotic state

    PubMed Central

    Schokraie, Elham; Warnken, Uwe; Hotz-Wagenblatt, Agnes; Grohme, Markus A.; Hengherr, Steffen; Förster, Frank; Schill, Ralph O.; Frohme, Marcus; Dandekar, Thomas; Schnölzer, Martina

    2012-01-01

    Tardigrades have fascinated researchers for more than 300 years because of their extraordinary capability to undergo cryptobiosis and survive extreme environmental conditions. However, the survival mechanisms of tardigrades are still poorly understood mainly due to the absence of detailed knowledge about the proteome and genome of these organisms. Our study was intended to provide a basis for the functional characterization of expressed proteins in different states of tardigrades. High-throughput, high-accuracy proteomics in combination with a newly developed tardigrade specific protein database resulted in the identification of more than 3000 proteins in three different states: early embryonic state and adult animals in active and anhydrobiotic state. This comprehensive proteome resource includes protein families such as chaperones, antioxidants, ribosomal proteins, cytoskeletal proteins, transporters, protein channels, nutrient reservoirs, and developmental proteins. A comparative analysis of protein families in the different states was performed by calculating the exponentially modified protein abundance index which classifies proteins in major and minor components. This is the first step to analyzing the proteins involved in early embryonic development, and furthermore proteins which might play an important role in the transition into the anhydrobiotic state. PMID:23029181

  17. Informed-Proteomics: open-source software package for top-down proteomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Park, Jungkap; Piehowski, Paul D.; Wilkins, Christopher

    Top-down proteomics involves the analysis of intact proteins. This approach is very attractive as it allows for analyzing proteins in their endogenous form without proteolysis, preserving valuable information about post-translational modifications, isoforms, proteolytic processing or their combinations, collectively called proteoforms. Moreover, the quality of the top-down LC-MS/MS datasets is rapidly increasing due to advances in the liquid chromatography and mass spectrometry instrumentation and sample processing protocols. However, the top-down mass spectra are substantially more complex compared to the more conventional bottom-up data. To take full advantage of the increasing quality of the top-down LC-MS/MS datasets there is an urgent need to develop algorithms and software tools for confident proteoform identification and quantification. In this study we present a new open source software suite for top-down proteomics analysis consisting of an LC-MS feature finding algorithm, a database search algorithm, and an interactive results viewer. The presented tool along with several other popular tools were evaluated using human-in-mouse xenograft luminal and basal breast tumor samples that are known to have significant differences in protein abundance based on bottom-up analysis.

  18. Shotgun proteomic analysis of Emiliania huxleyi, a marine phytoplankton species of major biogeochemical importance.

    PubMed

    Jones, Bethan M; Edwards, Richard J; Skipp, Paul J; O'Connor, C David; Iglesias-Rodriguez, M Debora

    2011-06-01

    Emiliania huxleyi is a unicellular marine phytoplankton species known to play a significant role in global biogeochemistry. Through the dual roles of photosynthesis and production of calcium carbonate (calcification), carbon is transferred from the atmosphere to ocean sediments. Almost nothing is known about the molecular mechanisms that control calcification, a process that is tightly regulated within the cell. To initiate proteomic studies on this important and phylogenetically remote organism, we have devised efficient protein extraction protocols and developed a bioinformatics pipeline that allows the statistically robust assignment of proteins from MS/MS data using preexisting EST sequences. The bioinformatics tool, termed BUDAPEST (Bioinformatics Utility for Data Analysis of Proteomics using ESTs), is fully automated and was used to search against data generated from three strains. BUDAPEST increased the number of identifications over standard protein database searches from 37 to 99 proteins when data were amalgamated. Proteins involved in diverse cellular processes were uncovered. For example, experimental evidence was obtained for a novel type I polyketide synthase and for various photosystem components. The proteomic and bioinformatic approaches developed in this study are of wider applicability, particularly to the oceanographic community where genomic sequence data for species of interest are currently scarce.

  19. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    PubMed Central

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-01-01

    Background: Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). Results: 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. Conclusion: This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method. PMID:19193216

  20. Quantitative proteomic analysis of bacterial enzymes released in cheese during ripening.

    PubMed

    Jardin, Julien; Mollé, Daniel; Piot, Michel; Lortal, Sylvie; Gagnaire, Valérie

    2012-04-02

    Due to increasingly available bacterial genomes in databases, proteomic tools have recently been used to screen proteins expressed by micro-organisms in food in order to better understand their metabolism in situ. While the main objective is the systematic identification of proteins, the next step will be to bridge the gap between identification and quantification of these proteins. For that purpose, a new mass spectrometry-based approach was applied, using isobaric tagging reagents for quantitative proteomic analysis (iTRAQ), which are amine specific and yield labelled peptides identical in mass. Experimental Swiss-type cheeses were manufactured from microfiltered milk using Streptococcus thermophilus ITG ST20 and Lactobacillus helveticus ITG LH1 as lactic acid starters. At three ripening times (7, 20 and 69 days), cheese aqueous phases were extracted and enriched in bacterial proteins by fractionation. Each sample, standardised in protein amount prior to proteomic analyses, was: i) analysed by 2D-electrophoresis for qualitative analysis and ii) submitted to trypsinolysis and labelled with a specific iTRAQ tag, one per ripening time. The three labelled samples were mixed together and analysed by nano-LC coupled on-line with an ESI-QTOF mass spectrometer. Thirty proteins, of both bacterial and bovine origin, were identified and efficiently quantified. The free bacterial proteins detected were enzymes from the central carbon metabolism as well as stress proteins. Depending on the protein considered, the concentration of these proteins in the cheese aqueous extract increased 2.5- to 20-fold from day 7 to day 69 of ripening. Copyright © 2012 Elsevier B.V. All rights reserved.

  1. Inconsistencies in the red blood cell membrane proteome analysis: generation of a database for research and diagnostic applications

    PubMed Central

    Hegedűs, Tamás; Chaubey, Pururawa Mayank; Várady, György; Szabó, Edit; Sarankó, Hajnalka; Hofstetter, Lia; Roschitzki, Bernd; Sarkadi, Balázs

    2015-01-01

    Based on recent results, the determination of the easily accessible red blood cell (RBC) membrane proteins may provide new diagnostic possibilities for assessing mutations, polymorphisms or regulatory alterations in diseases. However, the analysis of the current mass spectrometry-based proteomics datasets and other major databases indicates inconsistencies: the results show large scattering and only a limited overlap for the identified RBC membrane proteins. Here, we applied membrane-specific proteomics studies in human RBC, compared these results with the data in the literature, and generated a comprehensive and expandable database using all available data sources. The integrated web database now refers to proteomic, genetic and medical databases as well, and contains an unexpectedly large number of validated membrane proteins previously thought to be specific for other tissues and/or related to major human diseases. Since the determination of protein expression in RBC provides a method to indicate pathological alterations, our database should facilitate the development of RBC membrane biomarker platforms and provide a unique resource to aid related further research and diagnostics. Database URL: http://rbcc.hegelab.org PMID:26078478

  2. Data in support of the identification of neuronal and astrocyte proteins interacting with extracellularly applied oligomeric and fibrillar α-synuclein assemblies by mass spectrometry

    PubMed Central

    Shrivastava, Amulya Nidhi; Redeker, Virginie; Fritz, Nicolas; Pieri, Laura; Almeida, Leandro G.; Spolidoro, Maria; Liebmann, Thomas; Bousset, Luc; Renner, Marianne; Léna, Clément; Aperia, Anita; Melki, Ronald; Triller, Antoine

    2016-01-01

    α-Synuclein (α-syn) is the principal component of Lewy bodies, the pathophysiological hallmark of individuals affected by Parkinson disease (PD). This neuropathologic form of α-syn contributes to PD progression and propagation of α-syn assemblies between neurons. The data we present here support the proteomic analysis used to identify neuronal proteins that specifically interact with extracellularly applied oligomeric or fibrillar α-syn assemblies (conditions 1 and 2, respectively) (doi: 10.15252/embj.201591397). α-syn assemblies and their cellular partner proteins were pulled down from neuronal cells lysed shortly after exposure to exogenous α-syn assemblies and the associated proteins were identified by mass spectrometry using a shotgun proteomic-based approach. We also performed experiments on pure cultures of astrocytes to identify astrocyte-specific proteins interacting with oligomeric or fibrillar α-syn (conditions 3 and 4, respectively). For each condition, proteins interacting selectively with α-syn assemblies were identified by comparison to proteins pulled down from untreated cells used as controls. The mass spectrometry data, the database search and the peak lists have been deposited to the ProteomeXchange Consortium database via the PRIDE partner repository with the dataset identifiers PRIDE: PXD002256 to PRIDE: PXD002263 and doi: 10.6019/PXD002256 to 10.6019/PXD002263. PMID:26958642

  3. Data in support of the identification of neuronal and astrocyte proteins interacting with extracellularly applied oligomeric and fibrillar α-synuclein assemblies by mass spectrometry.

    PubMed

    Shrivastava, Amulya Nidhi; Redeker, Virginie; Fritz, Nicolas; Pieri, Laura; Almeida, Leandro G; Spolidoro, Maria; Liebmann, Thomas; Bousset, Luc; Renner, Marianne; Léna, Clément; Aperia, Anita; Melki, Ronald; Triller, Antoine

    2016-06-01

    α-Synuclein (α-syn) is the principal component of Lewy bodies, the pathophysiological hallmark of individuals affected by Parkinson disease (PD). This neuropathologic form of α-syn contributes to PD progression and propagation of α-syn assemblies between neurons. The data we present here support the proteomic analysis used to identify neuronal proteins that specifically interact with extracellularly applied oligomeric or fibrillar α-syn assemblies (conditions 1 and 2, respectively) (doi: 10.15252/embj.201591397). α-syn assemblies and their cellular partner proteins were pulled down from neuronal cells lysed shortly after exposure to exogenous α-syn assemblies and the associated proteins were identified by mass spectrometry using a shotgun proteomic-based approach. We also performed experiments on pure cultures of astrocytes to identify astrocyte-specific proteins interacting with oligomeric or fibrillar α-syn (conditions 3 and 4, respectively). For each condition, proteins interacting selectively with α-syn assemblies were identified by comparison to proteins pulled down from untreated cells used as controls. The mass spectrometry data, the database search and the peak lists have been deposited to the ProteomeXchange Consortium database via the PRIDE partner repository with the dataset identifiers PRIDE: PXD002256 to PRIDE: PXD002263 and doi: 10.6019/PXD002256 to 10.6019/PXD002263.

  4. Achieving high confidence protein annotations in a sea of unknowns

    NASA Astrophysics Data System (ADS)

    Timmins-Schiffman, E.; May, D. H.; Noble, W. S.; Nunn, B. L.; Mikan, M.; Harvey, H. R.

    2016-02-01

    Increased sensitivity of mass spectrometry (MS) technology allows deep and broad insight into community functional analyses. Metaproteomics holds the promise to reveal functional responses of natural microbial communities, whereas metagenomics alone can only hint at potential functions. The complex datasets resulting from ocean MS have the potential to inform diverse realms of the biological, chemical, and physical ocean sciences, yet the extent of bacterial functional diversity and redundancy has not been fully explored. To take advantage of these impressive datasets, we need a clear bioinformatics pipeline for metaproteomics peptide identification and annotation with a database that can provide confident identifications. Researchers must consider whether it is sufficient to leverage the vast quantities of available ocean sequence data or if they must invest in site-specific metagenomic sequencing. We have sequenced, to our knowledge, the first western arctic metagenomes from the Bering Strait and the Chukchi Sea. We have addressed the long standing question: Is a metagenome required to accurately complete metaproteomics and assess the biological distribution of metabolic functions controlling nutrient acquisition in the ocean? Two different protein databases were constructed from 1) a site-specific metagenome and 2) subarctic/arctic groups available in NCBI's non-redundant database. Multiple proteomic search strategies were employed, against each individual database and against both databases combined, to determine the algorithm and approach that yielded the balance of high sensitivity and confident identification. Results yielded over 8200 confidently identified proteins. Our comparison of these results allows us to quantify the utility of investing resources in a metagenome versus using the constantly expanding and immediately available public databases for metaproteomic studies.

  5. Multiplexed Post-Experimental Monoisotopic Mass Refinement (mPE-MMR) to Increase Sensitivity and Accuracy in Peptide Identifications from Tandem Mass Spectra of Cofragmentation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madar, Inamul Hasan; Ko, Seung-Ik; Kim, Hokeun

    Mass spectrometry (MS)-based proteomics, which uses high-resolution hybrid mass spectrometers such as the quadrupole-orbitrap mass spectrometer, can yield tens of thousands of tandem mass (MS/MS) spectra of high resolution during a routine bottom-up experiment. Despite being a fundamental and key step in MS-based proteomics, the accurate determination and assignment of precursor monoisotopic masses to the MS/MS spectra remains difficult. The difficulties stem from imperfect isotopic envelopes of precursor ions, inaccurate charge states for precursor ions, and cofragmentation. We describe a composite method of utilizing MS data to assign accurate monoisotopic masses to MS/MS spectra, including those subject to cofragmentation. The method, “multiplexed post-experiment monoisotopic mass refinement” (mPE-MMR), consists of the following: multiplexing of precursor masses to assign multiple monoisotopic masses of cofragmented peptides to the corresponding multiplexed MS/MS spectra, multiplexing of charge states to assign correct charges to the precursor ions of MS/MS spectra with no charge information, and mass correction for inaccurate monoisotopic peak picking. When combined with MS-GF+, a database search algorithm based on fragment mass difference, mPE-MMR effectively increases both sensitivity and accuracy in peptide identification from complex high-throughput proteomics data compared to conventional methods.
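
    The charge-state multiplexing and monoisotopic peak correction described in this abstract can be illustrated with a minimal sketch. This is not the mPE-MMR implementation: the function names, the default charge range, and the single isotope offset are assumptions made for illustration. It relies only on the standard relation between an observed m/z, a charge z, and the neutral mass, M = (m/z - m_proton) * z.

```python
# Illustrative sketch only; names and defaults are hypothetical, not from mPE-MMR.
PROTON_MASS = 1.007276466   # mass of a proton in Da
C13_DELTA = 1.0033548       # mass difference between 13C and 12C in Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass implied by an observed m/z at a given positive charge."""
    return (mz - PROTON_MASS) * charge


def candidate_masses(mz: float, charges=(2, 3, 4), isotope_offsets=(0, -1)):
    """Enumerate candidate monoisotopic masses for one precursor m/z.

    charges: charge states to try when the spectrum carries no charge information.
    isotope_offsets: 0 keeps the picked peak as monoisotopic; -1 assumes the
    picked peak was the first isotope peak, so the mass is shifted down.
    Returns a list of (charge, offset, mass) tuples.
    """
    candidates = []
    for z in charges:
        for k in isotope_offsets:
            candidates.append((z, k, neutral_mass(mz, z) + k * C13_DELTA))
    return candidates
```

    Each candidate mass could then be passed to a database search as a separate precursor hypothesis; how the hypotheses are scored and reconciled is outside this sketch.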

  6. The porcine translational research database: A manually curated, genomics and proteomics-based research resource

    USDA-ARS's Scientific Manuscript database

    The use of swine in biomedical research has increased dramatically in the last decade. Diverse genomic- and proteomic databases have been developed to facilitate research using human and rodent models. Current porcine gene databases, however, lack the robust annotation to study pig models that are...

  7. Proteome of Caulobacter crescentus cell cycle publicly accessible on SWICZ server.

    PubMed

    Vohradsky, Jiri; Janda, Ivan; Grünenfelder, Björn; Berndt, Peter; Röder, Daniel; Langen, Hanno; Weiser, Jaroslav; Jenal, Urs

    2003-10-01

    Here we present the Swiss-Czech Proteomics Server (SWICZ), which hosts the proteomic database summarizing information about the cell cycle of the aquatic bacterium Caulobacter crescentus. The database provides a searchable tool for easy access of global protein synthesis and protein stability data as examined during the C. crescentus cell cycle. Protein synthesis data collected from five different cell cycle stages were determined for each protein spot as a relative value of the total amount of [(35)S]methionine incorporation. Protein stability of pulse-labeled extracts were measured during a chase period equivalent to one cell cycle unit. Quantitative information for individual proteins together with descriptive data such as protein identities, apparent molecular masses and isoelectric points, were combined with information on protein function, genomic context, and the cell cycle stage, and were then assembled in a relational database with a world wide web interface (http://proteom.biomed.cas.cz), which allows the database records to be searched and displays the recovered information. A total of 1250 protein spots were reproducibly detected on two-dimensional gel electropherograms, 295 of which were identified by mass spectroscopy. The database is accessible either through clickable two-dimensional gel electrophoretic maps or by means of a set of dedicated search engines. Basic characterization of the experimental procedures, data processing, and a comprehensive description of the web site are presented. In its current state, the SWICZ proteome database provides a platform for the incorporation of new data emerging from extended functional studies on the C. crescentus proteome.

  8. Rice proteome analysis: a step toward functional analysis of the rice genome.

    PubMed

    Komatsu, Setsuko; Tanaka, Naoki

    2005-03-01

    The technique of proteome analysis using 2-DE has the power to monitor global changes that occur in the protein complement of tissues and subcellular compartments. In this review, we describe construction of the rice proteome database, the cataloging of rice proteins, and the functional characterization of some of the proteins identified. Initially, proteins extracted from various tissues and organelles were separated by 2-DE and an image analyzer was used to construct a display or reference map of the proteins. The rice proteome database currently contains 23 reference maps based on 2-DE of proteins from different rice tissues and subcellular compartments. These reference maps comprise 13 129 rice proteins, and the amino acid sequences of 5092 of these proteins are entered in the database. Major proteins involved in growth or stress responses have been identified by using a proteomics approach and some of these proteins have unique functions. Furthermore, initial work has also begun on analyzing the phosphoproteome and protein-protein interactions in rice. The information obtained from the rice proteome database will aid in the molecular cloning of rice genes and in predicting the function of unknown proteins.

  9. ScanRanker: Quality Assessment of Tandem Mass Spectra via Sequence Tagging

    PubMed Central

    Ma, Ze-Qiang; Chambers, Matthew C.; Ham, Amy-Joan L.; Cheek, Kristin L.; Whitwell, Corbin W.; Aerni, Hans-Rudolf; Schilling, Birgit; Miller, Aaron W.; Caprioli, Richard M.; Tabb, David L.

    2011-01-01

    In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search, but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu. PMID:21520941

  10. The Human Skeletal Muscle Proteome Project: a reappraisal of the current literature

    PubMed Central

    Gonzalez‐Freire, Marta; Semba, Richard D.; Ubaida‐Mohien, Ceereena; Fabbri, Elisa; Scalzo, Paul; Højlund, Kurt; Dufresne, Craig; Lyashkov, Alexey

    2016-01-01

    Skeletal muscle is a large organ that accounts for up to half the total mass of the human body. A progressive decline in muscle mass and strength occurs with ageing and in some individuals configures the syndrome of ‘sarcopenia’, a condition that impairs mobility, challenges autonomy, and is a risk factor for mortality. The mechanisms leading to sarcopenia as well as myopathies are still little understood. The Human Skeletal Muscle Proteome Project was initiated with the aim to characterize muscle proteins and how they change with ageing and disease. We conducted an extensive review of the literature and analysed publicly available protein databases. A systematic search of peer‐reviewed studies was performed using PubMed. Search terms included ‘human’, ‘skeletal muscle’, ‘proteome’, ‘proteomic(s)’, and ‘mass spectrometry’, ‘liquid chromatography‐mass spectrometry (LC‐MS/MS)’. A catalogue of 5431 non‐redundant muscle proteins identified by mass spectrometry‐based proteomics from 38 peer‐reviewed scientific publications from 2002 to November 2015 was created. We also developed a nosology system for the classification of muscle proteins based on localization and function. Such an inventory of proteins should serve as a useful background reference for future research on changes in muscle proteome assessed by quantitative mass spectrometry‐based proteomic approaches that occur with ageing and diseases. This classification and compilation of the human skeletal muscle proteome can be used for the identification and quantification of proteins in skeletal muscle to discover new mechanisms for sarcopenia and specific muscle diseases that can be targeted for the prevention and treatment. PMID:27897395

  11. The Escherichia coli Peripheral Inner Membrane Proteome

    PubMed Central

    Papanastasiou, Malvina; Orfanoudaki, Georgia; Koukaki, Marina; Kountourakis, Nikos; Sardis, Marios Frantzeskos; Aivaliotis, Michalis; Karamanou, Spyridoula; Economou, Anastassios

    2013-01-01

    Biological membranes are essential for cell viability. Their functional characteristics strongly depend on their protein content, which consists of transmembrane (integral) and peripherally associated membrane proteins. Both integral and peripheral inner membrane proteins mediate a plethora of biological processes. Whereas transmembrane proteins have characteristic hydrophobic stretches and can be predicted using bioinformatics approaches, peripheral inner membrane proteins are hydrophilic, exist in equilibria with soluble pools, and carry no discernible membrane targeting signals. We experimentally determined the cytoplasmic peripheral inner membrane proteome of the model organism Escherichia coli using a multidisciplinary approach. Initially, we extensively re-annotated the theoretical proteome regarding subcellular localization using literature searches, manual curation, and multi-combinatorial bioinformatics searches of the available databases. Next we used sequential biochemical fractionations coupled to direct identification of individual proteins and protein complexes using high resolution mass spectrometry. We determined that the proposed cytoplasmic peripheral inner membrane proteome occupies a previously unsuspected ∼19% of the basic E. coli BL21(DE3) proteome, and the detected peripheral inner membrane proteome occupies ∼25% of the estimated expressed proteome of this cell grown in LB medium to mid-log phase. This value might increase when fleeting interactions, not studied here, are taken into account. Several proteins previously regarded as exclusively cytoplasmic bind membranes avidly. Many of these proteins are organized in functional and/or structural oligomeric complexes that bind to the membrane with multiple interactions. Identified proteins cover the full spectrum of biological activities, and more than half of them are essential. Our data suggest that the cytoplasmic proteome displays remarkably dynamic and extensive communication with biological membrane surfaces that we are only beginning to decipher. PMID:23230279

  12. Proteome Analysis of Cytoplasmatic and Plastidic β-Carotene Lipid Droplets in Dunaliella bardawil

    PubMed Central

    Davidi, Lital; Levin, Yishai; Ben-Dor, Shifra; Pick, Uri

    2015-01-01

    The halotolerant green alga Dunaliella bardawil is unique in that it accumulates under stress two types of lipid droplets: cytoplasmatic lipid droplets (CLD) and β-carotene-rich (βC) plastoglobuli. Recently, we isolated and analyzed the lipid and pigment compositions of these lipid droplets. Here, we describe their proteome analysis. A contamination filter and an enrichment filter were utilized to define core proteins. A proteome database of Dunaliella salina/D. bardawil was constructed to aid the identification of lipid droplet proteins. A total of 124 and 42 core proteins were identified in βC-plastoglobuli and CLD, respectively, with only eight common proteins. Dunaliella spp. CLD resemble cytoplasmic droplets from Chlamydomonas reinhardtii and contain major lipid droplet-associated protein and enzymes involved in lipid and sterol metabolism. The βC-plastoglobuli proteome resembles the C. reinhardtii eyespot and Arabidopsis (Arabidopsis thaliana) plastoglobule proteomes and contains carotene-globule-associated protein, plastid-lipid-associated protein-fibrillins, SOUL heme-binding proteins, phytyl ester synthases, β-carotene biosynthesis enzymes, and proteins involved in membrane remodeling/lipid droplet biogenesis: VESICLE-INDUCING PLASTID PROTEIN1, synaptotagmin, and the eyespot assembly proteins EYE3 and SOUL3. Based on these and previous results, we propose models for the biogenesis of βC-plastoglobuli and the biosynthesis of β-carotene within βC-plastoglobuli and hypothesize that βC-plastoglobuli evolved from eyespot lipid droplets. PMID:25404729

  13. Incorporating sequence information into the scoring function: a hidden Markov model for improved peptide identification.

    PubMed

    Khatun, Jainab; Hamlett, Eric; Giddings, Morgan C

    2008-03-01

    The identification of peptides by tandem mass spectrometry (MS/MS) is a central method of proteomics research, but due to the complexity of MS/MS data and the large databases searched, the accuracy of peptide identification algorithms remains limited. To improve the accuracy of identification we applied a machine-learning approach using a hidden Markov model (HMM) to capture the complex and often subtle links between a peptide sequence and its MS/MS spectrum. Our model, HMM_Score, represents ion types as HMM states and calculates the maximum joint probability for a peptide/spectrum pair using emission probabilities from three factors: the amino acids adjacent to each fragmentation site, the mass dependence of ion types and the intensity dependence of ion types. The Viterbi algorithm is used to calculate the most probable assignment between ion types in a spectrum and a peptide sequence, then a correction factor is added to account for the propensity of the model to favor longer peptides. An expectation value is calculated based on the model score to assess the significance of each peptide/spectrum match. We trained and tested HMM_Score on three data sets generated by two different mass spectrometer types. For a reference data set recently reported in the literature and validated using seven identification algorithms, HMM_Score produced 43% more positive identification results at a 1% false positive rate than the best of two other commonly used algorithms, Mascot and X!Tandem. HMM_Score is a highly accurate platform for peptide identification that works well for a variety of mass spectrometer and biological sample types. The program is freely available on ProteomeCommons via an OpenSource license. See http://bioinfo.unc.edu/downloads/ for the download link.
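
    The Viterbi step mentioned in this abstract can be sketched generically. The code below is a standard log-space Viterbi decoder for a discrete hidden Markov model, not the HMM_Score program; the state and observation labels used in any example (for instance "ion" versus "noise" states, or "high" versus "low" intensity observations) are hypothetical, and the three emission factors described above are collapsed into a single emission table for brevity.

```python
# Generic Viterbi decoding in log space; a sketch, not the HMM_Score code.
def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for a sequence of observations.

    log_start[s]      log-probability of starting in state s
    log_trans[r][s]   log-probability of moving from state r to state s
    log_emit[s][o]    log-probability that state s emits observation o
    """
    # best[s] holds the log-probability of the best path ending in state s.
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    back = []  # one dict of back-pointers per step after the first
    for obs in observations[1:]:
        prev = best
        best = {}
        pointers = {}
        for s in states:
            score, arg = max((prev[r] + log_trans[r][s], r) for r in states)
            best[s] = score + log_emit[s][obs]
            pointers[s] = arg
        back.append(pointers)
    # Trace the best final state back to the start.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path
```

    Working in log space avoids numerical underflow on long sequences; the returned path is the single most probable state assignment, which is what the abstract describes as the most probable assignment between ion types and the peptide sequence.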

  14. The path to enlightenment: making sense of genomic and proteomic information.

    PubMed

    Maurer, Martin H

    2004-05-01

    Whereas genomics describes the study of the genome, mainly represented by its gene expression on the DNA or RNA level, the term proteomics denotes the study of the proteome, which is the protein complement encoded by the genome. In recent years, the number of proteomic experiments increased tremendously. While all fields of proteomics have made major technological advances, the biggest step was seen in bioinformatics. Biological information management relies on sequence and structure databases and powerful software tools to translate experimental results into meaningful biological hypotheses and answers. In this resource article, I provide a collection of databases and software available on the Internet that are useful to interpret genomic and proteomic data. The article is a toolbox for researchers who have genomic or proteomic datasets and need to put their findings into a biological context.

  15. Proteome reference maps of Medicago truncatula embryogenic cell cultures generated from single protoplasts.

    PubMed

    Imin, Nijat; De Jong, Femke; Mathesius, Ulrike; van Noorden, Giel; Saeed, Nasir A; Wang, Xin-Ding; Rose, Ray J; Rolfe, Barry G

    2004-07-01

    Using a combination of two-dimensional gel electrophoresis (2-DE) protein mapping and mass spectrometry (MS) analysis, we have established proteome reference maps of Medicago truncatula embryogenic tissue culture cells. The cultures were generated from single protoplasts, which provided a relatively homogeneous cell population. We used these to analyze protein expression at the globular stages of somatic embryogenesis, which is the earliest morphogenetic embryonic stage. Over 3000 proteins could reproducibly be resolved over a pI range of 4-11. Three hundred and twelve protein spots were extracted from colloidal Coomassie Blue-stained 2-DE gels and analyzed by matrix-assisted laser desorption/ionization-time of flight MS analysis and tandem MS sequencing. This enabled the identification of 169 protein spots representing 128 unique gene products using a publicly available expressed sequence tag database and the MASCOT search engine. These reference maps will be valuable for the investigation of the molecular events which occur during somatic embryogenesis in M. truncatula. The proteome reference maps and supplementary materials will be available and updated for public access at http://semele.anu.edu.au/.

  16. In-depth proteomic analysis of a mollusc shell: acid-soluble and acid-insoluble matrix of the limpet Lottia gigantea

    PubMed Central

    2012-01-01

    Background: Invertebrate biominerals are characterized by their extraordinary functionality and physical properties, such as strength, stiffness and toughness that by far exceed those of the pure mineral component of such composites. This is attributed to the organic matrix, secreted by specialized cells, which pervades and envelops the mineral crystals. Despite the obvious importance of the protein fraction of the organic matrix, only few in-depth proteomic studies have been performed due to the lack of comprehensive protein sequence databases. The recent public release of the gastropod Lottia gigantea genome sequence and the associated protein sequence database provides for the first time the opportunity to do a state-of-the-art proteomic in-depth analysis of the organic matrix of a mollusc shell. Results: Using three different sodium hypochlorite washing protocols before shell demineralization, a total of 569 proteins were identified in Lottia gigantea shell matrix. Of these, 311 were assembled in a consensus proteome comprising identifications contained in all proteomes irrespective of shell cleaning procedure. Some of these proteins were similar in amino acid sequence, amino acid composition, or domain structure to proteins identified previously in different bivalve or gastropod shells, such as BMSP, dermatopontin, nacrein, perlustrin, perlucin, or Pif. In addition there were dozens of previously uncharacterized proteins, many containing repeated short linear motifs or homorepeats. Such proteins may play a role in shell matrix construction or control of mineralization processes. Conclusions: The organic matrix of Lottia gigantea shells is a complex mixture of proteins comprising possible homologs of some previously characterized mollusc shell proteins, but also many novel proteins with a possible function in biomineralization as framework building blocks or as regulatory components. We hope that this data set, the most comprehensive available at present, will provide a platform for the further exploration of biomineralization processes in molluscs. PMID:22540284

  17. Proteome analysis of plastids from developing seeds of Jatropha curcas L.

    PubMed

    Pinheiro, Camila B; Shah, Mohibullah; Soares, Emanoella L; Nogueira, Fábio C S; Carvalho, Paulo C; Junqueira, Magno; Araújo, Gabriel D T; Soares, Arlete A; Domont, Gilberto B; Campos, Francisco A P

    2013-11-01

    In this study, we performed a proteomic analysis of plastids isolated from the endosperm of developing Jatropha curcas seeds that were in the initial stage of deposition of protein and lipid reserves. Proteins extracted from the plastids were digested with trypsin, and the peptides were applied to an EASY-nano LC system coupled inline to an ESI-LTQ-Orbitrap Velos mass spectrometer, and this led to the identification of 1103 proteins representing 804 protein groups, of which 923 proteins were considered as true identifications, and this considerably expands the repertoire of J. curcas proteins identified so far. Of the identified proteins, only five are encoded in the plastid genome, and none of them are involved in photosynthesis, indicating the nonphotosynthetic nature of the isolated plastids. Homologues for 824 out of 923 identified proteins were present in PPDB, SUBA, or PlProt databases while homologues for 13 proteins were not found in any of the three plastid protein databases but were marked as plastidial by at least one of the three prediction programs used. Functional classification showed that proteins belonging to amino acid metabolism comprise the main functional class, followed by carbohydrate, energy, and lipid metabolisms. The small and large subunits of Rubisco were identified, and their presence in the plastids is considered to be an adaptive feature counterbalancing the loss of one-third of the carbon as CO2 as a result of the conversion of carbohydrate to oil through glycolysis. While several enzymes involved in the biosynthesis of several precursors of diterpenoids were identified, we were unable to identify any terpene synthase/cyclase, which suggests that the plastids isolated from the endosperm of developing seeds do not synthesize phorbol esters. In conclusion, our study provides insights into the major biosynthetic pathways and certain unique features of the plastids from the endosperm of developing seeds at the whole proteome level.

  18. Resources for Functional Genomics Studies in Drosophila melanogaster

    PubMed Central

    Mohr, Stephanie E.; Hu, Yanhui; Kim, Kevin; Housden, Benjamin E.; Perrimon, Norbert

    2014-01-01

    Drosophila melanogaster has become a system of choice for functional genomic studies. Many resources, including online databases and software tools, are now available to support design or identification of relevant fly stocks and reagents or analysis and mining of existing functional genomic, transcriptomic, proteomic, etc. datasets. These include large community collections of fly stocks and plasmid clones, “meta” information sites like FlyBase and FlyMine, and an increasing number of more specialized reagents, databases, and online tools. Here, we introduce key resources useful to plan large-scale functional genomics studies in Drosophila and to analyze, integrate, and mine the results of those studies in ways that facilitate identification of highest-confidence results and generation of new hypotheses. We also discuss ways in which existing resources can be used and might be improved and suggest a few areas of future development that would further support large- and small-scale studies in Drosophila and facilitate use of Drosophila information by the research community more generally. PMID:24653003

  19. Comet: an open-source MS/MS sequence database search tool.

    PubMed

    Eng, Jimmy K; Jahan, Tahmina A; Hoopmann, Michael R

    2013-01-01

    Proteomics research routinely involves identifying peptides and proteins via MS/MS sequence database search. Thus the database search engine is an integral tool in many proteomics research groups. Here, we introduce the Comet search engine to the existing landscape of commercial and open-source database search tools. Comet is open source, freely available, and based on one of the original sequence database search tools that has been widely used for many years. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. MixGF: spectral probabilities for mixture spectra from more than one peptide.

    PubMed

    Wang, Jian; Bourne, Philip E; Bandeira, Nuno

    2014-12-01

    In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by ≈30-390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.
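
    The idea of computing the statistical significance of a match from a score distribution over random peptides can be sketched for the simpler single-peptide case. The code below is not MixGF and does not handle mixture spectra; it is a toy dynamic program over integer residue masses, and the residue alphabet, the `node_score` table (score earned when a prefix ends at a given mass), and all function names are assumptions made for illustration.

```python
# Toy single-peptide score distribution by dynamic programming; not MixGF itself.
from collections import defaultdict


def score_distribution(parent_mass, aa_masses, aa_probs, node_score):
    """Return {score: probability} over random peptides of mass parent_mass.

    aa_masses[a]   integer mass of residue a (toy values)
    aa_probs[a]    probability of drawing residue a
    node_score[m]  score earned when a peptide prefix has mass m (default 0)
    """
    # table[m] maps a score to the probability of random prefixes of mass m
    # that reach that score.
    table = [defaultdict(float) for _ in range(parent_mass + 1)]
    table[0][0] = 1.0
    for m in range(1, parent_mass + 1):
        gain = node_score.get(m, 0)
        for aa, mass in aa_masses.items():
            if mass > m:
                continue
            for score, prob in table[m - mass].items():
                table[m][score + gain] += prob * aa_probs[aa]
    return dict(table[parent_mass])


def spectral_probability(dist, threshold):
    """Total probability of random peptides scoring at least threshold."""
    return sum(prob for score, prob in dist.items() if score >= threshold)
```

    For a peptide match with score t, `spectral_probability(dist, t)` plays the role of a significance estimate: the smaller it is, the less likely a random peptide of the same mass would score as well. Extending this to two peptides matched jointly to one spectrum is the problem the abstract addresses and is beyond this sketch.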

  1. MixGF: Spectral Probabilities for Mixture Spectra from more than One Peptide

    PubMed Central

    Wang, Jian; Bourne, Philip E.; Bandeira, Nuno

    2014-01-01

    In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications, but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by ≈30–390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra. PMID:25225354

  2. Yeast proteome map (last update).

    PubMed

    Perrot, Michel; Moes, Suzette; Massoni, Aurélie; Jenoe, Paul; Boucherie, Hélian

    2009-10-01

    The identification of proteins separated on 2-D gels is essential to exploit the full potential of 2-D gel electrophoresis for proteomic investigations. For this purpose we have undertaken the systematic identification of Saccharomyces cerevisiae proteins separated on 2-D gels. We report here the identification by mass spectrometry of 100 novel yeast protein spots that have so far not been tackled due to their scarcity on our standard 2-D gels. These identifications extend the number of protein spots identified on our yeast 2-D proteome map to 716. They correspond to 485 unique proteins. Among these, 154 were resolved into several isoforms. The present data set can now be expanded to report for the first time a map of 363 protein isoforms that significantly deepens our knowledge of the yeast proteome. The reference map and a list of all identified proteins can be accessed on the Yeast Protein Map server (www.ibgc.u-bordeaux2.fr/YPM).

  3. Genomic identification of potential targets unique to Candida albicans for the discovery of antifungal agents.

    PubMed

    Tripathi, Himanshu; Luqman, Suaib; Meena, Abha; Khan, Feroz

    2014-01-01

    Despite modern antifungal therapy, the mortality rates of invasive infection with the human fungal pathogen Candida albicans are up to 40%. Studies suggest that drug resistance in the three most common species of human fungal pathogens, viz. C. albicans, Aspergillus fumigatus (causing a mortality rate of up to 90%) and Cryptococcus neoformans (causing a mortality rate of up to 70%), is due to mutations in the target enzymes or high expression of drug transporter genes. Drug resistance in human fungal pathogens has led to an imperative need for the identification of new targets unique to fungal pathogens. In the present study, we have used a comparative genomics approach to find potential target proteins unique to C. albicans, an opportunistic fungus responsible for severe infection in immune-compromised humans. Interestingly, many target proteins of existing antifungal agents showed orthologs in human cells. To identify unique proteins, we compared the proteome of C. albicans [SC5314], i.e., 14,633 total proteins retrieved from the RefSeq database of NCBI, USA, with the proteomes of human and the non-pathogenic yeast Saccharomyces cerevisiae. Results showed that 4,568 proteins were identified as unique to C. albicans compared with those of human; when these unique proteins were then compared with the S. cerevisiae proteome, 2,161 proteins were identified as unique, and after removing repeats a total of 1,618 unique proteins (42 functionally known, 1,566 hypothetical and 10 unknown) were selected as potential antifungal drug targets unique to C. albicans.

  4. Chernobyl seed project. Advances in the identification of differentially abundant proteins in a radio-contaminated environment.

    PubMed

    Rashydov, Namik M; Hajduch, Martin

    2015-01-01

    Plants have the ability to grow and successfully reproduce in radio-contaminated environments, which has been highlighted by nuclear accidents at Chernobyl (1986) and Fukushima (2011). The main aim of this article is to summarize the advances of the Chernobyl seed project which has the purpose to provide proteomic characterization of plants grown in the Chernobyl area. We present a summary of comparative proteomic studies on soybean and flax seeds harvested from radio-contaminated Chernobyl areas during two successive generations. Using experimental design developed for radio-contaminated areas, altered abundances of glycine betaine, seed storage proteins, and proteins associated with carbon assimilation into fatty acids were detected. Similar studies in Fukushima radio-contaminated areas might complement these data. The results from these Chernobyl experiments can be viewed in a user-friendly format at a dedicated web-based database freely available at http://www.chernobylproteomics.sav.sk.

  5. Chernobyl seed project. Advances in the identification of differentially abundant proteins in a radio-contaminated environment

    PubMed Central

    Rashydov, Namik M.; Hajduch, Martin

    2015-01-01

    Plants have the ability to grow and successfully reproduce in radio-contaminated environments, which has been highlighted by nuclear accidents at Chernobyl (1986) and Fukushima (2011). The main aim of this article is to summarize the advances of the Chernobyl seed project which has the purpose to provide proteomic characterization of plants grown in the Chernobyl area. We present a summary of comparative proteomic studies on soybean and flax seeds harvested from radio-contaminated Chernobyl areas during two successive generations. Using experimental design developed for radio-contaminated areas, altered abundances of glycine betaine, seed storage proteins, and proteins associated with carbon assimilation into fatty acids were detected. Similar studies in Fukushima radio-contaminated areas might complement these data. The results from these Chernobyl experiments can be viewed in a user-friendly format at a dedicated web-based database freely available at http://www.chernobylproteomics.sav.sk. PMID:26217350

  6. Global Membrane Protein Interactome Analysis using In vivo Crosslinking and Mass Spectrometry-based Protein Correlation Profiling*

    PubMed Central

    Larance, Mark; Kirkwood, Kathryn J.; Tinti, Michele; Brenes Murillo, Alejandro; Ferguson, Michael A. J.; Lamond, Angus I.

    2016-01-01

    We present a methodology using in vivo crosslinking combined with HPLC-MS for the global analysis of endogenous protein complexes by protein correlation profiling. Formaldehyde crosslinked protein complexes were extracted with high yield using denaturing buffers that maintained complex solubility during chromatographic separation. We show this efficiently detects both integral membrane and membrane-associated protein complexes, in addition to soluble complexes, allowing identification and analysis of complexes not accessible in native extracts. We compare the protein complexes detected by HPLC-MS protein correlation profiling in both native and formaldehyde crosslinked U2OS cell extracts. These proteome-wide data sets of both in vivo crosslinked and native protein complexes from U2OS cells are freely available via a searchable online database (www.peptracker.com/epd). Raw data are also available via ProteomeXchange (identifier PXD003754). PMID:27114452

  7. A compatible exon-exon junction database for the identification of exon skipping events using tandem mass spectrum data.

    PubMed

    Mo, Fan; Hong, Xu; Gao, Feng; Du, Lin; Wang, Jun; Omenn, Gilbert S; Lin, Biaoyang

    2008-12-16

    Alternative splicing is an important gene regulation mechanism. It is estimated that about 74% of multi-exon human genes have alternative splicing. High throughput tandem (MS/MS) mass spectrometry provides valuable information for rapidly identifying potentially novel alternatively-spliced protein products from experimental datasets. However, the ability to identify alternative splicing events through tandem mass spectrometry depends on the database against which the spectra are searched. We wrote scripts in perl, Bioperl, mysql and Ensembl API and built a theoretical exon-exon junction protein database to account for all possible combinations of exons for a gene while keeping the frame of translation (i.e., keeping only in-phase exon-exon combinations) from the Ensembl Core Database. Using our liver cancer MS/MS dataset, we identified a total of 488 non-redundant peptides that represent putative exon skipping events. Our exon-exon junction database provides the scientific community with an efficient means to identify novel alternatively spliced (exon skipping) protein isoforms using mass spectrometry data. This database will be useful in annotating genome structures using rapidly accumulating proteomics data.
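
    The in-phase filtering step mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' Perl/Ensembl pipeline. It assumes each exon carries a start phase and an end phase (0, 1 or 2, with -1 marking a non-coding boundary), and it treats a junction as frame-preserving when the upstream exon's end phase equals the downstream exon's start phase. The Exon type, the in_phase_junctions function and the example phases are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Exon(NamedTuple):
    """Hypothetical exon record; phases follow the assumed 0/1/2 (-1 = non-coding) convention."""
    name: str
    phase: int       # codon position at which the exon starts
    end_phase: int   # codon position at which the exon ends


def in_phase_junctions(exons: List[Exon]) -> List[Tuple[str, str]]:
    """Enumerate all ordered exon pairs (upstream, downstream) that keep the reading frame.

    Exons are assumed to be listed in transcript order. A pair is kept when the
    upstream end phase matches the downstream start phase and both are coding.
    """
    pairs = []
    for i, upstream in enumerate(exons):
        for downstream in exons[i + 1:]:
            coding = upstream.end_phase >= 0 and downstream.phase >= 0
            if coding and upstream.end_phase == downstream.phase:
                pairs.append((upstream.name, downstream.name))
    return pairs


# Illustrative gene with four exons (phases are made up for the example).
example = [Exon("E1", 0, 1), Exon("E2", 1, 0), Exon("E3", 0, 1), Exon("E4", 1, 0)]
junctions = in_phase_junctions(example)
```

    In this example the pair (E1, E4) skips two exons yet stays in frame, which is the kind of candidate junction such a database would hold. Pairs such as (E1, E3) are discarded because they would shift the reading frame.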

  8. LFQuant: a label-free fast quantitative analysis tool for high-resolution LC-MS/MS proteomics data.

    PubMed

    Zhang, Wei; Zhang, Jiyang; Xu, Changming; Li, Ning; Liu, Hui; Ma, Jie; Zhu, Yunping; Xie, Hongwei

    2012-12-01

    Database searching based methods for label-free quantification aim to reconstruct the peptide extracted ion chromatogram based on the identification information, which can limit the search space and thus make the data processing much faster. The random effect of the MS/MS sampling can be remedied by cross-assignment among different runs. Here, we present a new label-free fast quantitative analysis tool, LFQuant, for high-resolution LC-MS/MS proteomics data based on database searching. It is designed to accept raw data in two common formats (mzXML and Thermo RAW), and database search results from mainstream tools (MASCOT, SEQUEST, and X!Tandem), as input data. LFQuant can handle large-scale label-free data with fractionation such as SDS-PAGE and 2D LC. It is easy to use and provides handy user interfaces for data loading, parameter setting, quantitative analysis, and quantitative data visualization. LFQuant was compared with two common quantification software packages, MaxQuant and IDEAL-Q, on the replication data set and the UPS1 standard data set. The results show that LFQuant performs better than them in terms of both precision and accuracy, and consumes significantly less processing time. LFQuant is freely available under the GNU General Public License v3.0 at http://sourceforge.net/projects/lfquant/. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data.

    PubMed

    Cole, Charles; Krampis, Konstantinos; Karagiannis, Konstantinos; Almeida, Jonas S; Faison, William J; Motwani, Mona; Wan, Quan; Golikov, Anton; Pan, Yang; Simonyan, Vahan; Mazumder, Raja

    2014-01-27

    Next-generation sequencing (NGS) technologies have resulted in petabytes of scattered data, decentralized in archives, databases and sometimes in isolated hard-disks which are inaccessible for browsing and analysis. It is expected that curated secondary databases will help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it. To address the above challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine which is used upstream of an integrated High-performance Integrated Virtual Environment (HIVE) that encapsulates Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof-of-concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program - The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR available clinical information on patients, mapping of the reads to the reference followed by identification of non-synonymous Single Nucleotide Variations (nsSNVs) and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP and therefore provides a more comprehensive understanding of the human variome. Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr). Availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual level SNVs and their effect on the human proteome beyond what the dbSNP database provides.

  10. Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data

    PubMed Central

    2014-01-01

    Background: Next-generation sequencing (NGS) technologies have resulted in petabytes of scattered data, decentralized in archives, databases and sometimes in isolated hard-disks which are inaccessible for browsing and analysis. It is expected that curated secondary databases will help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it. Results: To address the above challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine which is used upstream of an integrated High-performance Integrated Virtual Environment (HIVE) that encapsulates Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof-of-concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program - The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR available clinical information on patients, mapping of the reads to the reference followed by identification of non-synonymous Single Nucleotide Variations (nsSNVs) and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP and therefore provides a more comprehensive understanding of the human variome. Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr). Conclusions: Availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual level SNVs and their effect on the human proteome beyond what the dbSNP database provides. PMID:24467687

  11. SAFE Software and FED Database to Uncover Protein-Protein Interactions using Gene Fusion Analysis.

    PubMed

    Tsagrasoulis, Dimosthenis; Danos, Vasilis; Kissa, Maria; Trimpalis, Philip; Koumandou, V Lila; Karagouni, Amalia D; Tsakalidis, Athanasios; Kossida, Sophia

    2012-01-01

    Domain Fusion Analysis takes advantage of the fact that certain proteins in a given proteome A, are found to have statistically significant similarity with two separate proteins in another proteome B. In other words, the result of a fusion event between two separate proteins in proteome B is a specific full-length protein in proteome A. In such a case, it can be safely concluded that the protein pair has a common biological function or even interacts physically. In this paper, we present the Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data both in prokaryotic and eukaryotic organisms and the Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events (both available at: http://www.bioacademy.gr/bioinformatics/projects/ProteinFusion/index.htm). Finally, we analyze the proteomes of three microorganisms using these tools in order to demonstrate their functionality.

  12. SAFE Software and FED Database to Uncover Protein-Protein Interactions using Gene Fusion Analysis

    PubMed Central

    Tsagrasoulis, Dimosthenis; Danos, Vasilis; Kissa, Maria; Trimpalis, Philip; Koumandou, V. Lila; Karagouni, Amalia D.; Tsakalidis, Athanasios; Kossida, Sophia

    2012-01-01

    Domain Fusion Analysis takes advantage of the fact that certain proteins in a given proteome A, are found to have statistically significant similarity with two separate proteins in another proteome B. In other words, the result of a fusion event between two separate proteins in proteome B is a specific full-length protein in proteome A. In such a case, it can be safely concluded that the protein pair has a common biological function or even interacts physically. In this paper, we present the Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data both in prokaryotic and eukaryotic organisms and the Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events (both available at: http://www.bioacademy.gr/bioinformatics/projects/ProteinFusion/index.htm). Finally, we analyze the proteomes of three microorganisms using these tools in order to demonstrate their functionality. PMID:22267904

  13. Proteomics of exhaled breath: methodological nuances and pitfalls.

    PubMed

    Kurova, Viktoria S; Anaev, Eldar C; Kononikhin, Alexey S; Fedorchenko, Kristina Yu; Popov, Igor A; Kalupov, Timothey L; Bratanov, Dmitriy O; Nikolaev, Eugenie N; Varfolomeev, Sergey D

    2009-01-01

    The analysis of exhaled breath condensate (EBC) can be an alternative to traditional endoscopic sampling of lower respiratory tract secretions. This is a simple non-invasive method of diagnosing respiratory diseases, in particular, respiratory inflammatory processes. Samples were collected with a special device-condenser (ECoScreen, VIASYS Healthcare, Germany), then treated with trypsin according to the proteomics protocol for standard protein mixtures and analyzed by nanoflow high-performance liquid chromatography tandem mass spectrometry (HPLC-MS/MS) with a 7-Tesla Finnigan LTQ-FT mass spectrometer (Thermo Electron, Germany). Mascot software (Matrixscience) was used for screening the database NCBInr for proteins corresponding to the peptide maps that were obtained. EBCs from 17 young healthy non-smoking donors were collected. Different methods for concentrating protein were compared in order to optimize EBC preparations for proteomic analysis. The procedure that was chosen allowed identification of proteins exhaled by healthy people. The major proteins in the condensates were cytoskeletal keratins. Another 12 proteins were identified in EBC from healthy non-smokers. Some keratins were found in the ambient air and may be considered exogenous components of exhaled air. Knowledge of the normal proteome of exhaled breath allows one to look for biomarkers of different disease states in EBC. Proteins in ambient air can be identified in the respiratory tract and should be excluded from the analysis of the proteome of EBC. The results obtained allowed us to choose the most effective procedure of sample preparation when working with samples containing very low protein concentrations.

  14. Proteomic identification of fat-browning markers in cultured white adipocytes treated with curcumin.

    PubMed

    Kim, Sang Woo; Choi, Jae Heon; Mukherjee, Rajib; Hwang, Ki-Chul; Yun, Jong Won

    2016-04-01

    We previously reported that curcumin induces browning of primary white adipocytes via enhanced expression of brown adipocyte-specific genes. In this study, we attempted to identify target proteins responsible for this fat-browning effect by analyzing proteomic changes in cultured white adipocytes in response to curcumin treatment. To elucidate the role of curcumin in fat-browning, we conducted comparative proteomic analysis of primary adipocytes between control and curcumin-treated cells using two-dimensional electrophoresis combined with MALDI-TOF-MS. We also investigated fatty acid metabolic targets, mitochondrial biogenesis, and fat-browning-associated proteins using combined proteomic and network analyses. Proteomic analysis revealed that 58 protein spots from a total of 325 matched spots showed differential expression between control and curcumin-treated adipocytes. Using network analysis, most of the identified proteins were proven to be involved in various metabolic and cellular processes based on the PANTHER classification system. One of the most striking findings is that hormone-sensitive lipase (HSL) was highly correlated with main browning markers based on the STRING database. HSL and two browning markers (UCP1, PGC-1α) were co-immunoprecipitated with these markers, suggesting that HSL possibly plays a role in fat-browning of white adipocytes. Our results suggest that curcumin increased HSL levels and other browning-specific markers, suggesting its possible role in augmentation of lipolysis and suppression of lipogenesis by trans-differentiation from white adipocytes into brown adipocytes (beige).

  15. Optimization of Search Engines and Postprocessing Approaches to Maximize Peptide and Protein Identification for High-Resolution Mass Data.

    PubMed

    Tu, Chengjian; Sheng, Quanhu; Li, Jun; Ma, Danjun; Shen, Xiaomeng; Wang, Xue; Shyr, Yu; Yi, Zhengping; Qu, Jun

    2015-11-06

    The two key steps for analyzing proteomic data generated by high-resolution MS are database searching and postprocessing. While the two steps are interrelated, studies on their combinatory effects and the optimization of these procedures have not been adequately conducted. Here, we investigated the performance of three popular search engines (SEQUEST, Mascot, and MS Amanda) in conjunction with five filtering approaches, including respective score-based filtering, a group-based approach, local false discovery rate (LFDR), PeptideProphet, and Percolator. A total of eight data sets from various proteomes (e.g., E. coli, yeast, and human) produced by various instruments with high-accuracy survey scan (MS1) and high- or low-accuracy fragment ion scan (MS2) (LTQ-Orbitrap, Orbitrap-Velos, Orbitrap-Elite, Q-Exactive, Orbitrap-Fusion, and Q-TOF) were analyzed. It was found that combinations involving Percolator achieved markedly more peptide and protein identifications at the same FDR level than the other 12 combinations for all data sets. Among these, combinations of SEQUEST-Percolator and MS Amanda-Percolator provided slightly better performances for data sets with low-accuracy MS2 (ion trap or IT) and high-accuracy MS2 (Orbitrap or TOF), respectively, than did other methods. For approaches without Percolator, SEQUEST-group performs the best for data sets with MS2 produced by collision-induced dissociation (CID) and IT analysis; Mascot-LFDR gives more identifications for data sets generated by higher-energy collisional dissociation (HCD) and analyzed in Orbitrap (HCD-OT) and in Orbitrap Fusion (HCD-IT); MS Amanda-Group excels for the Q-TOF data set and the Orbitrap Velos HCD-OT data set. Therefore, if Percolator is not used, a specific combination should be applied for each type of data set. Moreover, a higher percentage of multiple-peptide proteins and lower variation of protein spectral counts were observed when analyzing technical replicates using Percolator-associated combinations; therefore, Percolator enhanced the reliability of both identification and quantification. The analyses were performed using the specific programs embedded in Proteome Discoverer, Scaffold, and an in-house algorithm (BuildSummary). These results provide valuable guidelines for the optimal interpretation of proteomic results and the development of fit-for-purpose protocols under different situations.

  16. Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines

    PubMed Central

    Jones, Andrew R.; Siepen, Jennifer A.; Hubbard, Simon J.; Paton, Norman W.

    2010-01-01

    Tandem mass spectrometry, run in combination with liquid chromatography (LC-MS/MS), can generate large numbers of peptide and protein identifications, for which a variety of database search engines are available. Distinguishing correct identifications from false positives is far from trivial because all data sets are noisy and tend to be too large for manual inspection; therefore, probabilistic methods must be employed to balance the trade-off between sensitivity and specificity. Decoy databases are becoming widely used to place statistical confidence in result sets, allowing the false discovery rate (FDR) to be estimated. It has previously been demonstrated that different MS search engines produce different peptide identification sets, and as such, employing more than one search engine could result in an increased number of peptides being identified. However, such efforts are hindered by the lack of a single scoring framework employed by all search engines. We have developed a search engine independent scoring framework based on FDR which allows peptide identifications from different search engines to be combined, called the FDRScore. We observe that peptide identifications made by three search engines are infrequently false positives, and identifications made by only a single search engine, even with a strong score from the source search engine, are significantly more likely to be false positives. We have developed a second score based on the FDR within peptide identifications grouped according to the set of search engines that have made the identification, called the combined FDRScore. We demonstrate by searching large publicly available data sets that the combined FDRScore can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine. PMID:19253293
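
    The decoy-based FDR estimation that the abstract above builds on can be sketched in a few lines of Python. This is not the FDRScore or combined FDRScore of the paper. It is the plain estimate FDR ≈ decoys / targets among matches at or above a score cutoff, assuming a single search engine, higher-is-better scores and no tied scores. The function name and the example scores are hypothetical.

```python
from typing import List, Optional, Tuple


def decoy_fdr_threshold(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms holds (score, is_decoy) pairs for peptide-spectrum matches.
    The FDR at a cutoff is estimated as decoys / targets among matches
    scoring at or above that cutoff. Returns None if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep extending the accepted list while the estimate is acceptable.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Made-up scores: four target matches and one decoy match.
example_psms = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
cutoff_strict = decoy_fdr_threshold(example_psms, max_fdr=0.2)
cutoff_loose = decoy_fdr_threshold(example_psms, max_fdr=0.25)
```

    With the made-up scores, the stricter limit stops above the decoy hit at score 8.0, while the looser limit admits all matches down to 6.0. A framework such as the one described in the abstract would go further and convert each engine's scores to a common FDR-based scale before combining them.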

  17. YahO protein as a calibrant for top-down proteomic identification of Shiga toxin using MALDI-TOF-TOF-MS/MS and post-source decay

    USDA-ARS's Scientific Manuscript database

    Matrix-assisted laser desorption/ionization tandem time-of-flight (MALDI-TOF-TOF) mass spectrometry is increasingly utilized for rapid top-down proteomic identification of proteins. This identification may involve analysis of either a pure protein or a protein mixture. For analysis of a pure protein...

  18. Optimization of parameters for coverage of low molecular weight proteins

    PubMed Central

    Müller, Stephan A.; Kohajda, Tibor; Findeiß, Sven; Stadler, Peter F.; Washietl, Stefan; Kellis, Manolis; von Bergen, Martin

    2010-01-01

    Proteins with molecular weights of <25 kDa are involved in major biological processes such as ribosome formation, stress adaption (e.g., temperature reduction) and cell cycle control. Despite their importance, the coverage of smaller proteins in standard proteome studies is rather sparse. Here we investigated biochemical and mass spectrometric parameters that influence coverage and validity of identification. The underrepresentation of low molecular weight (LMW) proteins may be attributed to the low numbers of proteolytic peptides formed by tryptic digestion as well as their tendency to be lost in protein separation and concentration/desalting procedures. In a systematic investigation of the LMW proteome of Escherichia coli, a total of 455 LMW proteins (27% of the 1672 listed in the SwissProt protein database) were identified, corresponding to a coverage of 62% of the known cytosolic LMW proteins. Of these proteins, 93 had not yet been functionally classified, and five had not previously been confirmed at the protein level. In this study, the influences of protein extraction (either urea or TFA), proteolytic digestion (solely, and the combined usage of trypsin and AspN as endoproteases) and protein separation (gel- or non-gel-based) were investigated. Compared to the standard procedure based solely on the use of urea lysis buffer, in-gel separation and tryptic digestion, the complementary use of TFA for extraction or endoprotease AspN for proteolysis permits the identification of an extra 72 (32%) and 51 proteins (23%), respectively. Regarding mass spectrometry analysis with an LTQ Orbitrap mass spectrometer, collision-induced fragmentation (CID and HCD) and electron transfer dissociation using the linear ion trap (IT) or the Orbitrap as the analyzer were compared. IT-CID was found to yield the best identification rate, whereas IT-ETD provided almost comparable results in terms of LMW proteome coverage. The high overlap between the proteins identified with IT-CID and IT-ETD allowed the validation of 75% of the identified proteins using this orthogonal fragmentation technique. Furthermore, a new approach to evaluating and improving the completeness of protein databases that utilizes the program RNAcode was introduced and examined. Electronic supplementary material: The online version of this article (doi:10.1007/s00216-010-4093-x) contains supplementary material, which is available to authorized users. PMID:20803007

  19. Unlocking the proteomic information encoded in MALDI-TOF-MS data used for microbial identification and characterization.

    PubMed

    Fagerquist, Clifton K

    2017-01-01

    Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) is increasingly utilized as a rapid technique to identify microorganisms including pathogenic bacteria. However, little attention has been paid to the significant proteomic information encoded in the MS peaks that collectively constitute the MS 'fingerprint'. This review/perspective is intended to explore this topic in greater detail in the hopes that it may spur interest and further research in this area. Areas covered: This paper examines the recent literature on utilizing MALDI-TOF for bacterial identification. Critical works highlighting protein biomarker identification of bacteria, arguments for and against protein biomarker identification, proteomic approaches to biomarker identification, emergence of MALDI-TOF-TOF platforms and their use for top-down proteomic identification of bacterial proteins, protein denaturation and its effect on protein ion fragmentation, collision cross-sections and energy deposition during desorption/ionization are also explored. Expert commentary: MALDI-TOF and TOF-TOF mass spectrometry platforms will continue to provide chemical analyses that are rapid, cost-effective and high throughput. These instruments have proven their utility in the taxonomic identification of pathogenic bacteria at the genus and species level and are poised to more fully characterize these microorganisms to the benefit of clinical microbiology, food safety and other fields.

  20. pGlyco 2.0 enables precision N-glycoproteomics with comprehensive quality control and one-step mass spectrometry for intact glycopeptide identification.

    PubMed

    Liu, Ming-Qi; Zeng, Wen-Feng; Fang, Pan; Cao, Wei-Qian; Liu, Chao; Yan, Guo-Quan; Zhang, Yang; Peng, Chao; Wu, Jian-Qiang; Zhang, Xiao-Jin; Tu, Hui-Jun; Chi, Hao; Sun, Rui-Xiang; Cao, Yong; Dong, Meng-Qiu; Jiang, Bi-Yun; Huang, Jiang-Ming; Shen, Hua-Li; Wong, Catherine C L; He, Si-Min; Yang, Peng-Yuan

    2017-09-05

    The precise and large-scale identification of intact glycopeptides is a critical step in glycoproteomics. Owing to the complexity of glycosylation, the current overall throughput, data quality and accessibility of intact glycopeptide identification lag behind those in routine proteomic analyses. Here, we propose a workflow for the precise high-throughput identification of intact N-glycopeptides at the proteome scale using stepped-energy fragmentation and a dedicated search engine. pGlyco 2.0 conducts comprehensive quality control including false discovery rate evaluation at all three levels of matches to glycans, peptides and glycopeptides, improving the current level of accuracy of intact glycopeptide identification. The N-glycoproteome of samples metabolically labeled with 15N/13C was analyzed quantitatively and utilized to validate the glycopeptide identification, which could be used as a novel benchmark pipeline to compare different search engines. Finally, we report a large-scale glycoproteome dataset consisting of 10,009 distinct site-specific N-glycans on 1988 glycosylation sites from 955 glycoproteins in five mouse tissues. Protein glycosylation is a heterogeneous post-translational modification that generates greater proteomic diversity that is difficult to analyze. Here the authors describe pGlyco 2.0, a workflow for the precise one-step identification of intact N-glycopeptides at the proteome scale.

  1. Proteomics data exchange and storage: the need for common standards and public repositories.

    PubMed

    Jiménez, Rafael C; Vizcaíno, Juan Antonio

    2013-01-01

    Both the existence of data standards and public databases or repositories have been key factors behind the development of the existing "omics" approaches. In this book chapter we first review the main existing mass spectrometry (MS)-based proteomics resources: PRIDE, PeptideAtlas, GPMDB, and Tranche. Second, we report on the current status of the different proteomics data standards developed by the Proteomics Standards Initiative (PSI): the formats mzML, mzIdentML, mzQuantML, TraML, and PSI-MI XML are then reviewed. Finally, we present an easy way to query and access MS proteomics data in the PRIDE database, as a representative of the existing repositories, using the workflow management system (WMS) tool Taverna. Two different publicly available workflows are explained and described.

  2. Proteomic Identification of Monoclonal Antibodies from Serum

    PubMed Central

    2015-01-01

    Characterizing the in vivo dynamics of the polyclonal antibody repertoire in serum, such as that which might arise in response to stimulation with an antigen, is difficult due to the presence of many highly similar immunoglobulin proteins, each specified by distinct B lymphocytes. These challenges have precluded the use of conventional mass spectrometry for antibody identification based on peptide mass spectral matches to a genomic reference database. Recently, progress has been made using bottom-up analysis of serum antibodies by nanoflow liquid chromatography/high-resolution tandem mass spectrometry combined with a sample-specific antibody sequence database generated by high-throughput sequencing of individual B cell immunoglobulin variable domains (V genes). Here, we describe how intrinsic features of antibody primary structure, most notably the interspersed segments of variable and conserved amino acid sequences, generate recurring patterns in the corresponding peptide mass spectra of V gene peptides, greatly complicating the assignment of correct sequences to mass spectral data. We show that the standard method of decoy-based error modeling fails to account for the error introduced by these highly similar sequences, leading to a significant underestimation of the false discovery rate. Because of these effects, antibody-derived peptide mass spectra require increased stringency in their interpretation. The use of filters based on the mean precursor ion mass accuracy of peptide-spectrum matches is shown to be particularly effective in distinguishing between “true” and “false” identifications. These findings highlight important caveats associated with the use of standard database search and error-modeling methods with nonstandard data sets and custom sequence databases. PMID:24684310
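
    The two ideas named in this abstract, decoy-based error modeling and a filter on mean precursor mass accuracy, can be illustrated with a minimal sketch. It is not the authors' implementation: the PSM record layout, the function names and the ppm threshold are assumptions made for illustration, and the FDR estimate uses the common decoys/targets ratio.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record (fields are assumptions)."""
    peptide: str
    score: float           # search-engine score, higher is better
    is_decoy: bool         # True if matched to a decoy sequence
    observed_mz: float
    theoretical_mz: float


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def decoy_fdr(psms: list[PSM], score_cutoff: float) -> float:
    """Estimate FDR as (#decoy hits) / (#target hits) at or above the cutoff."""
    accepted = [p for p in psms if p.score >= score_cutoff]
    decoys = sum(p.is_decoy for p in accepted)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


def filter_by_mean_ppm(psms: list[PSM], max_abs_mean_ppm: float) -> set[str]:
    """Return target peptides whose mean precursor error is within tolerance."""
    errors = defaultdict(list)
    for p in psms:
        if not p.is_decoy:
            errors[p.peptide].append(ppm_error(p.observed_mz, p.theoretical_mz))
    return {
        pep for pep, errs in errors.items()
        if abs(sum(errs) / len(errs)) <= max_abs_mean_ppm
    }
```

    As the abstract cautions, a decoy-based estimate of this kind can understate the true error rate when the database contains many highly similar sequences, which is why a second, independent filter such as mass accuracy is useful.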

  3. Proteomics meets blood banking: identification of protein targets for the improvement of platelet quality.

    PubMed

    Schubert, Peter; Devine, Dana V

    2010-01-03

    Proteomics has brought new perspectives to the fields of hematology and transfusion medicine in the last decade. The steady improvement of proteomic technology is propelling novel discoveries of molecular mechanisms by studying protein expression, post-translational modifications and protein interactions. This review article focuses on the application of proteomics to the identification of molecular mechanisms leading to the deterioration of blood platelets during storage - a critical aspect in the provision of platelet transfusion products. Several proteomic approaches have been employed to analyse changes in the platelet protein profile during storage and the obtained data now need to be translated into platelet biochemistry in order to connect the results to platelet function. Targeted biochemical applications then allow the identification of points for intervention in signal transduction pathways. Once validated and placed in a transfusion context, these data will provide further understanding of the underlying molecular mechanisms leading to platelet storage lesion. Future aspects of proteomics in blood banking will aim to make use of protein markers identified for platelet storage lesion development to monitor proteome changes when alterations such as the use of additive solutions or pathogen reduction strategies are put in place in order to improve platelet quality for patients.

  4. Proteomic identification of rhythmic proteins in rice seedlings.

    PubMed

    Hwang, Heeyoun; Cho, Man-Ho; Hahn, Bum-Soo; Lim, Hyemin; Kwon, Yong-Kook; Hahn, Tae-Ryong; Bhoo, Seong Hee

    2011-04-01

    Many aspects of plant metabolism that are involved in plant growth and development are influenced by light-regulated diurnal rhythms as well as endogenous clock-regulated circadian rhythms. To identify the rhythmic proteins in rice, periodically grown (12h light/12h dark cycle) seedlings were harvested for three days at six-hour intervals. Continuous dark-adapted plants were also harvested for two days. Among approximately 3000 reproducible protein spots on each gel, proteomic analysis ascertained 354 spots (~12%) as light-regulated rhythmic proteins, in which 53 spots showed prolonged rhythm under continuous dark conditions. Of these 354 ascertained rhythmic protein spots, 74 diurnal spots and 10 prolonged rhythmic spots under continuous dark were identified by MALDI-TOF MS analysis. The rhythmic proteins were functionally classified into photosynthesis, central metabolism, protein synthesis, nitrogen metabolism, stress resistance, signal transduction and unknown. Comparative analysis of our proteomic data with the public microarray database (the Plant DIURNAL Project) and RT-PCR analysis of rhythmic proteins showed differences in rhythmic expression phases between mRNA and protein, suggesting that the clock-regulated proteins in rice are modulated by not only transcriptional but also post-transcriptional, translational, and/or post-translational processes.

  5. PrionScan: an online database of predicted prion domains in complete proteomes.

    PubMed

    Espinosa Angarica, Vladimir; Angulo, Alfonso; Giner, Arturo; Losilla, Guillermo; Ventura, Salvador; Sancho, Javier

    2014-02-05

    Prions are a particular type of amyloids related to a large variety of important processes in cells, but also responsible for serious diseases in mammals and humans. The number of experimentally characterized prions is still low and corresponds to a handful of examples in microorganisms and mammals. Prion aggregation is mediated by specific protein domains with a remarkable compositional bias towards glutamine/asparagine and against charged residues and prolines. These compositional features have been used to predict new prion proteins in the genomes of different organisms. Despite these efforts, there are only a few available data sources containing prion predictions at a genomic scale. Here we present PrionScan, a new database of predicted prion-like domains in complete proteomes. We have previously developed a predictive methodology to identify and score prionogenic stretches in protein sequences. In the present work, we exploit this approach to scan all the protein sequences in public databases and compile a repository containing relevant information of proteins bearing prion-like domains. The database is updated regularly alongside UniprotKB and in its present version contains approximately 28000 predictions in proteins from different functional categories in more than 3200 organisms from all the taxonomic subdivisions. PrionScan can be used in two different ways: database query and analysis of protein sequences submitted by the users. In the first mode, simple queries retrieve a detailed description of the properties of a defined protein. Queries can also be combined to generate more complex and specific searching patterns. In the second mode, users can submit and analyze their own sequences. It is expected that this database would provide relevant insights on prion functions and regulation from a genome-wide perspective, allowing researchers to perform cross-species prion biology studies. Our database might also be useful for guiding experimentalists in the identification of new candidates for further experimental characterization.

  6. Post-genomics of microsporidia, with emphasis on a model of minimal eukaryotic proteome: a review.

    PubMed

    Texier, Catherine; Brosson, Damien; El Alaoui, Hicham; Méténier, Guy; Vivarès, Christian P

    2005-05-01

    The genome sequence of the microsporidian parasite Encephalitozoon cuniculi Levaditi, Nicolau et Schoen, 1923 contains about 2,000 genes that are representative of a non-redundant potential proteome composed of 1,909 protein chains. The purpose of this review is to relate some advances in the characterisation of this proteome through bioinformatics and experimental approaches. The reduced diversity of the set of E. cuniculi proteins is perceptible in all the compilations of predicted domains, orthologs, families and superfamilies, available in several public databases. The phyletic patterns of orthologs for seven eukaryotic organisms support an extensive gene loss in the fungal clade, with additional deletions in E. cuniculi. Most microsporidial orthologs are the smallest ones among eukaryotes, justifying an interest in the use of these compacted proteins to better discriminate between essential and non-essential regions. The three components of the E. cuniculi mRNA capping apparatus have been especially well characterized and the three-dimensional structure of the cap methyltransferase has been elucidated following the crystallisation of the microsporidial enzyme Ecm1. So far, our mass spectrometry-based analyses of the E. cuniculi spore proteome have led to the identification of about 170 proteins, one-quarter of these having no clearly predicted function. Immunocytochemical studies are in progress to determine the subcellular localisation of microsporidia-specific proteins. Post-translational modifications such as phosphorylation and glycosylation are expected to be explored soon.

  7. Proteome analysis of Aspergillus ochraceus.

    PubMed

    Rizwan, Muhammad; Miller, Ingrid; Tasneem, Fareeha; Böhm, Josef; Gemeiner, Manfred; Razzazi-Fazeli, Ebrahim

    2010-08-01

    Genome sequencing for many important fungi has begun during recent years; however, there is still some deficiency in proteome profiling of aspergilli. To obtain a comprehensive overview of proteins and their expression, a proteomic approach based on 2D gel electrophoresis and MALDI-TOF/TOF mass spectrometry was used to investigate A. ochraceus. The cell walls of fungi are exceptionally resistant to destruction, therefore two lysis protocols were tested: (1) lysis via manual grinding using liquid nitrogen, and (2) mechanical lysis via rapid agitation with glass beads using MagNalyser. Mechanical grinding with mortar and pestle using liquid nitrogen was found to be a more efficient extraction method for our purpose, resulting in extracts with higher protein content and a clear band pattern in SDS-PAGE. Two-dimensional electrophoresis gave a complex spot pattern comprising proteins of a broad range of isoelectric points and molecular masses. The most abundant spots were subjected to mass spectrometric analysis. We could identify 31 spots representing 26 proteins, most of them involved in metabolic processes and response to stress. Seventeen spots were identified by de novo sequencing due to a lack of DNA and protein database sequences of A. ochraceus. The proteins identified in our study have been reported for the first time in A. ochraceus and this represents the first proteomic approach with identification of major proteins, when the fungus was grown under submerged culture.

  8. A DATABASE FOR TRACKING TOXICOGENOMIC SAMPLES AND PROCEDURES WITH GENOMIC, PROTEOMIC AND METABONOMIC COMPONENTS

    EPA Science Inventory

    A Database for Tracking Toxicogenomic Samples and Procedures with Genomic, Proteomic and Metabonomic Components
    Wenjun Bao1, Jennifer Fostel2, Michael D. Waters2, B. Alex Merrick2, Drew Ekman3, Mitchell Kostich4, Judith Schmid1, David Dix1
    Office of Research and Developmen...

  9. Proteomic analysis of pollination-induced corolla senescence in petunia.

    PubMed

    Bai, Shuangyi; Willard, Belinda; Chapin, Laura J; Kinter, Michael T; Francis, David M; Stead, Anthony D; Jones, Michelle L

    2010-02-01

    Senescence represents the last phase of petal development during which macromolecules and organelles are degraded and nutrients are recycled to developing tissues. To understand better the post-transcriptional changes regulating petal senescence, a proteomic approach was used to profile protein changes during the senescence of Petunia × hybrida 'Mitchell Diploid' corollas. Total soluble proteins were extracted from unpollinated petunia corollas at 0, 24, 48, and 72 h after flower opening and at 24, 48, and 72 h after pollination. Two-dimensional gel electrophoresis (2-DE) was used to identify proteins that were differentially expressed in non-senescing (unpollinated) and senescing (pollinated) corollas, and image analysis was used to determine which proteins were up- or down-regulated by the experimentally determined cut-off of 2.1-fold for P < 0.05. One hundred and thirty-three differentially expressed protein spots were selected for sequencing. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) was used to determine the identity of these proteins. Searching translated EST databases and the NCBI non-redundant protein database, it was possible to assign a putative identification to greater than 90% of these proteins. Many of the senescence up-regulated proteins were putatively involved in defence and stress responses or macromolecule catabolism. Some proteins, not previously characterized during flower senescence, were identified, including an orthologue of the tomato abscisic acid stress ripening protein 4 (ASR4). Gene expression patterns did not always correlate with protein expression, confirming that both proteomic and genomic approaches will be required to obtain a detailed understanding of the regulation of petal senescence.

  10. PARPs database: A LIMS systems for protein-protein interaction data mining or laboratory information management system

    PubMed Central

    Droit, Arnaud; Hunter, Joanna M; Rouleau, Michèle; Ethier, Chantal; Picard-Cloutier, Aude; Bourgais, David; Poirier, Guy G

    2007-01-01

    Background: In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteins and the rapid advancement of this technique, in combination with other proteomics methods, results in an increasing amount of proteome data. This data must be archived and analysed using specialized bioinformatics tools. Description: We herein describe "PARPs database," a data analysis and management pipeline for liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics. PARPs database is a web-based tool whose features include experiment annotation, protein database searching, protein sequence management, as well as data-mining of the peptides and proteins identified. Conclusion: Using this pipeline, we have successfully identified several interactions of biological significance between PARP-1 and other proteins, namely RFC-1, 2, 3, 4 and 5. PMID:18093328

  11. Genic insights from integrated human proteomics in GeneCards.

    PubMed

    Fishilevich, Simon; Zimmerman, Shahar; Kohn, Asher; Iny Stein, Tsippi; Olender, Tsviya; Kolker, Eugene; Safran, Marilyn; Lancet, Doron

    2016-01-01

    GeneCards is a one-stop shop for searchable human gene annotations (http://www.genecards.org/). Data are automatically mined from ∼120 sources and presented in an integrated web card for every human gene. We report the application of recent advances in proteomics to enhance gene annotation and classification in GeneCards. First, we constructed the Human Integrated Protein Expression Database (HIPED), a unified database of protein abundance in human tissues, based on the publicly available mass spectrometry (MS)-based proteomics sources ProteomicsDB, Multi-Omics Profiling Expression Database, Protein Abundance Across Organisms and The MaxQuant DataBase. The integrated database, residing within GeneCards, compares favourably with its individual sources, covering nearly 90% of human protein-coding genes. For gene annotation and comparisons, we first defined a protein expression vector for each gene, based on normalized abundances in 69 normal human tissues. This vector is portrayed in the GeneCards expression section as a bar graph, allowing visual inspection and comparison. These data are juxtaposed with transcriptome bar graphs. Using the protein expression vectors, we further defined a pairwise metric that helps assess expression-based pairwise proximity. This new metric for finding functional partners complements eight others, including sharing of pathways, gene ontology (GO) terms and domains, implemented in the GeneCards Suite. In parallel, we calculated proteome-based differential expression, highlighting a subset of tissues that overexpress a gene and subserving gene classification. This textual annotation allows users of VarElect, the suite's next-generation phenotyper, to more effectively discover causative disease variants. Finally, we define the protein-RNA expression ratio and correlation as yet another attribute of every gene in each tissue, adding further annotative information. The results constitute a significant enhancement of several GeneCards sections and help promote and organize the genome-wide structural and functional knowledge of the human proteome. Database URL: http://www.genecards.org/.
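
    A per-gene expression vector, a pairwise proximity score and a protein-RNA ratio, as mentioned in this abstract, might be sketched as follows. The abstract does not say which pairwise metric GeneCards uses; Pearson correlation is an assumption chosen here purely for illustration, and all function names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length expression vectors.

    Returns 0.0 when either vector is constant (correlation undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def protein_rna_ratio(protein: list[float], rna: list[float]) -> list:
    """Per-tissue protein/RNA abundance ratio; None where RNA is not positive."""
    return [p / r if r > 0 else None for p, r in zip(protein, rna)]
```

    In this sketch each list position stands for one tissue, so two genes with similar tissue profiles score close to 1 and genes with opposite profiles score close to -1.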

  12. Proteomic sensitivity to dietary manipulations in rainbow trout.

    PubMed

    Martin, S A M; Vilhelmsson, O; Médale, F; Watt, P; Kaushik, S; Houlihan, D F

    2003-09-23

    Changes in dietary protein sources due to substitution of fish meal by other protein sources can have metabolic consequences in farmed fish. A proteomics approach was used to study the protein profiles of livers of rainbow trout that have been fed two diets containing different proportions of plant ingredients. Both diets, control (C) and soy (S), contained fish meal, plant ingredients and synthetic amino acids, but diet S had a greater proportion of soybean meal. A feeding trial was performed for 12 weeks, at the end of which growth and protein metabolism parameters were measured. Protein growth rates were not different in fish fed different diets; however, protein consumption and protein synthesis rates were higher in the fish fed diet S. Fish fed diet S had lower efficiency of retention of synthesised protein. Ammonia excretion was increased as well as the activities of hepatic glutamate dehydrogenase and aspartate amino transferase (ASAT). No differences were found in free amino acid pools in either liver or muscle between diets. Protein extraction followed by high-resolution two-dimensional electrophoresis, coupled with gel image analysis, allowed identification and expression analysis of hundreds of proteins. Individual proteins of interest were then subjected to further analysis leading to protein identification by trypsin digest fingerprinting. During this study, approximately 800 liver proteins were analysed for expression pattern, of which 33 were found to be differentially expressed between diets C and S. Seventeen proteins were positively identified after database searching. Proteins were identified from diverse metabolic pathways, demonstrating the complex nature of gene expression responses to dietary manipulation revealed by proteomic characterisation.

  13. Microbial Protein-Antigenome Determination (MAD) Technology: A Proteomics-Based Strategy for Rapid Identification of Microbial Targets of Host Humoral Immune Responses

    USDA-ARS's Scientific Manuscript database

    Immunogenic, pathogen-specific proteins have excellent potential for development of novel management modalities. Here, we describe an innovative application of proteomics called Microbial protein-Antigenome Determination (MAD) Technology for rapid identification of native microbial proteins that el...

  15. Proteomic Workflows for Biomarker Identification Using Mass Spectrometry — Technical and Statistical Considerations during Initial Discovery

    PubMed Central

    Orton, Dennis J.; Doucette, Alan A.

    2013-01-01

    Identification of biomarkers capable of differentiating between pathophysiological states of an individual is a laudable goal in the field of proteomics. Protein biomarker discovery generally employs high throughput sample characterization by mass spectrometry (MS), being capable of identifying and quantifying thousands of proteins per sample. While MS-based technologies have rapidly matured, the identification of truly informative biomarkers remains elusive, with only a handful of clinically applicable tests stemming from proteomic workflows. This underlying lack of progress is attributed in large part to erroneous experimental design, biased sample handling, as well as improper statistical analysis of the resulting data. This review will discuss in detail the importance of experimental design and provide some insight into the overall workflow required for biomarker identification experiments. Proper balance between the degree of biological vs. technical replication is required for confident biomarker identification. PMID:28250400

  16. LipidHome: a database of theoretical lipids optimized for high throughput mass spectrometry lipidomics.

    PubMed

    Foster, Joseph M; Moreno, Pablo; Fabregat, Antonio; Hermjakob, Henning; Steinbeck, Christoph; Apweiler, Rolf; Wakelam, Michael J O; Vizcaíno, Juan Antonio

    2013-01-01

    Protein sequence databases are the pillar upon which modern proteomics is supported, representing a stable reference space of predicted and validated proteins. One example of such resources is UniProt, enriched with both expertly curated and automatic annotations. Taken largely for granted, similar mature resources such as UniProt are not available yet in some other "omics" fields, lipidomics being one of them. While having a seasoned community of wet lab scientists, lipidomics lies significantly behind proteomics in the adoption of data standards and other core bioinformatics concepts. This work aims to reduce the gap by developing an equivalent resource to UniProt called 'LipidHome', providing theoretically generated lipid molecules and useful metadata. Using the 'FASTLipid' Java library, a database was populated with theoretical lipids, generated from a set of community agreed upon chemical bounds. In parallel, a web application was developed to present the information and provide computational access via a web service. Designed specifically to accommodate high throughput mass spectrometry based approaches, lipids are organised into a hierarchy that reflects the variety in the structural resolution of lipid identifications. Additionally, cross-references to other lipid related resources and papers that cite specific lipids were used to annotate lipid records. The web application encompasses a browser for viewing lipid records and a 'tools' section where an MS1 search engine is currently implemented. LipidHome can be accessed at http://www.ebi.ac.uk/apweiler-srv/lipidhome.
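
    The MS1 search mentioned in this abstract amounts to matching an observed m/z against a table of theoretical masses within a tolerance. The sketch below illustrates that idea only; it is not LipidHome's implementation or web-service API, and the function names, the ppm default and the candidate table are all hypothetical.

```python
def ppm_window(mz: float, tol_ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z bounds for a tolerance given in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def ms1_search(observed_mz: float, candidates: dict[str, float],
               tol_ppm: float = 10.0) -> list[tuple[str, float]]:
    """Return candidates inside the ppm window, closest theoretical m/z first."""
    low, high = ppm_window(observed_mz, tol_ppm)
    hits = [(name, mz) for name, mz in candidates.items() if low <= mz <= high]
    return sorted(hits, key=lambda hit: abs(hit[1] - observed_mz))


# Placeholder theoretical m/z values, not taken from any real database.
THEORETICAL = {"lipid_A": 760.5851, "lipid_B": 760.5900, "lipid_C": 788.6164}
```

    Calling ms1_search(760.5850, THEORETICAL, tol_ppm=5.0) keeps only lipid_A, while widening the tolerance to 10 ppm also admits lipid_B, which shows how mass accuracy limits the structural resolution of an identification.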

  17. A practical guide for the identification of membrane and plasma membrane proteins in human embryonic stem cells and human embryonal carcinoma cells.

    PubMed

    Dormeyer, Wilma; van Hoof, Dennis; Mummery, Christine L; Krijgsveld, Jeroen; Heck, Albert J R

    2008-10-01

    The identification of (plasma) membrane proteins in cells can provide valuable insights into the regulation of their biological processes. Pluripotent cells such as human embryonic stem cells and embryonal carcinoma cells are capable of unlimited self-renewal and share many of the biological mechanisms that regulate proliferation and differentiation. The comparison of their membrane proteomes will help unravel the biological principles of pluripotency, and the identification of biomarker proteins in their plasma membranes is considered a crucial step to fully exploit pluripotent cells for therapeutic purposes. For these tasks, membrane proteomics is the method of choice, but as indicated by the scarce identification of membrane and plasma membrane proteins in global proteomic surveys it is not an easy task. In this minireview, we first describe the general challenges of membrane proteomics. We then review current sample preparation steps and discuss protocols that we found particularly beneficial for the identification of large numbers of (plasma) membrane proteins in human tumour- and embryo-derived stem cells. Our optimized assembled protocol led to the identification of a large number of membrane proteins. However, as the composition of cells and membranes is highly variable we still recommend adapting the sample preparation protocol for each individual system.

  18. The Changing Face of Scientific Discourse: Analysis of Genomic and Proteomic Database Usage and Acceptance.

    ERIC Educational Resources Information Center

    Brown, Cecelia

    2003-01-01

    Discusses the growth in use and acceptance of Web-based genomic and proteomic databases (GPD) in scholarly communication. Confirms the role of GPD in the scientific literature cycle, suggests GPD are a storage and retrieval mechanism for molecular biology information, and recommends that existing models of scientific communication be updated to…

  19. An Optimized Informatics Pipeline for Mass Spectrometry-Based Peptidomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Chaochao; Monroe, Matthew E.; Xu, Zhe

    2015-12-26

    Comprehensive MS analysis of the peptidome, the intracellular and intercellular products of protein degradation, has the potential to provide novel insights on endogenous proteolytic processing and their utility in disease diagnosis and prognosis. Along with the advances in MS instrumentation, a plethora of proteomics data analysis tools have been applied for direct use in peptidomics; however an evaluation of the currently available informatics pipelines for peptidomics data analysis has yet to be reported. In this study, we set off by evaluating the results of several popular MS/MS database search engines including MS-GF+, SEQUEST and MS-Align+ for peptidomics data analysis, followed by identification and label-free quantification using the well-established accurate mass and time (AMT) tag and newly developed informed quantification (IQ) approaches, both based on direct LC-MS analysis. Our results demonstrated that MS-GF+ outperformed both SEQUEST and MS-Align+ in identifying peptidome peptides. Using a database established from the MS-GF+ peptide identifications, both the AMT tag and IQ approaches provided significantly deeper peptidome coverage and fewer missing values for each individual data set than the MS/MS methods, while achieving robust label-free quantification. Besides having an excellent correlation with the AMT tag quantification results, IQ also provided slightly higher peptidome coverage than AMT. Taken together, we propose an optimal informatics pipeline combining MS-GF+ for initial database searching with IQ (or AMT) for identification and label-free quantification for high-throughput, comprehensive and quantitative peptidomics analysis.

  20. MitProNet: A Knowledgebase and Analysis Platform of Proteome, Interactome and Diseases for Mammalian Mitochondria

    PubMed Central

    Mao, Song; Chai, Xiaoqiang; Hu, Yuling; Hou, Xugang; Tang, Yiheng; Bi, Cheng; Li, Xiao

    2014-01-01

    The mitochondrion plays a central role in diverse biological processes in most eukaryotes, and its dysfunctions are critically involved in a large number of diseases and the aging process. A systematic identification of mitochondrial proteomes and characterization of functional linkages among mitochondrial proteins are fundamental in understanding the mechanisms underlying biological functions and human diseases associated with mitochondria. Here we present a database MitProNet which provides a comprehensive knowledgebase for mitochondrial proteome, interactome and human diseases. First an inventory of mammalian mitochondrial proteins was compiled by widely collecting proteomic datasets, and the proteins were classified by machine learning to achieve a high-confidence list of mitochondrial proteins. The current version of MitProNet covers 1124 high-confidence proteins, and the remainder were further classified as middle- or low-confidence. An organelle-specific network of functional linkages among mitochondrial proteins was then generated by integrating genomic features encoded by a wide range of datasets including genomic context, gene expression profiles, protein-protein interactions, functional similarity and metabolic pathways. The functional-linkage network should be a valuable resource for the study of biological functions of mitochondrial proteins and human mitochondrial diseases. Furthermore, we utilized the network to predict candidate genes for mitochondrial diseases using prioritization algorithms. All proteins, functional linkages and disease candidate genes in MitProNet were annotated according to the information collected from their original sources including GO, GEO, OMIM, KEGG, MIPS, HPRD and so on. MitProNet features a user-friendly graphic visualization interface to present functional analysis of linkage networks. As an up-to-date database and analysis platform, MitProNet should be particularly helpful in comprehensive studies of complicated biological mechanisms underlying mitochondrial functions and human mitochondrial diseases. MitProNet is freely accessible at http://bio.scu.edu.cn:8085/MitProNet. PMID:25347823

  1. A combined database related and de novo MS-identification of yeast mannose-1-phosphate guanyltransferase PSA1 interaction partners at different phases of batch cultivation

    NASA Astrophysics Data System (ADS)

    Parviainen, Ville; Joenväärä, Sakari; Peltoniemi, Hannu; Mattila, Pirkko; Renkonen, Risto

    2009-04-01

    Mass spectrometry-based proteomic research has become one of the main methods in protein-protein interaction research. Several high throughput studies have established an interaction landscape of exponentially growing Baker's yeast culture. However, many of the protein-protein interactions are likely to change in different environmental conditions. In order to examine the dynamic nature of the protein interactions we isolated the protein complexes of mannose-1-phosphate guanyltransferase PSA1 from Saccharomyces cerevisiae at four different time points during batch cultivation. We used the tandem affinity purification (TAP)-method to purify the complexes and subjected the tryptic peptides to LC-MS/MS. The resulting peak lists were analyzed with two different methods: the database related protein identification program X!Tandem and the de novo sequencing program Lutefisk. We observed significant changes in the interactome of PSA1 during the batch cultivation and identified altogether 74 proteins interacting with PSA1 of which only six were found to interact during all time points. All the other proteins showed a more dynamic nature of binding activity. In this study we also demonstrate the benefit of using both database related and de novo methods in the protein interaction research to enhance both the quality and the quantity of observations.

  2. Enhanced photosynthesis and redox energy production contribute to salinity tolerance in Dunaliella as revealed by homology-based proteomics.

    PubMed

    Liska, Adam J; Shevchenko, Andrej; Pick, Uri; Katz, Adriana

    2004-09-01

    Salinity is a major limiting factor for the proliferation of plants and inhibits central metabolic activities such as photosynthesis. The halotolerant green alga Dunaliella can adapt to hypersaline environments and is considered a model photosynthetic organism for salinity tolerance. To clarify the molecular basis for salinity tolerance, a proteomic approach has been applied for identification of salt-induced proteins in Dunaliella. Seventy-six salt-induced proteins were selected from two-dimensional gel separations of different subcellular fractions and analyzed by mass spectrometry (MS). Application of nanoelectrospray mass spectrometry, combined with sequence-similarity database-searching algorithms, MS BLAST and MultiTag, enabled identification of 80% of the salt-induced proteins. Salinity stress up-regulated key enzymes in the Calvin cycle, starch mobilization, and redox energy production; regulatory factors in protein biosynthesis and degradation; and a homolog of a bacterial Na(+)-redox transporters. The results indicate that Dunaliella responds to high salinity by enhancement of photosynthetic CO(2) assimilation and by diversion of carbon and energy resources for synthesis of glycerol, the osmotic element in Dunaliella. The ability of Dunaliella to enhance photosynthetic activity at high salinity is remarkable because, in most plants and cyanobacteria, salt stress inhibits photosynthesis. The results demonstrated the power of MS BLAST searches for the identification of proteins in organisms whose genomes are not known and paved the way for dissecting molecular mechanisms of salinity tolerance in algae and higher plants.

  3. Identification of novel plant peroxisomal targeting signals by a combination of machine learning methods and in vivo subcellular targeting analyses.

    PubMed

    Lingner, Thomas; Kataya, Amr R; Antonicelli, Gerardo E; Benichou, Aline; Nilssen, Kjersti; Chen, Xiong-Yan; Siemsen, Tanja; Morgenstern, Burkhard; Meinicke, Peter; Reumann, Sigrun

    2011-04-01

    In the postgenomic era, accurate prediction tools are essential for identification of the proteomes of cell organelles. Prediction methods have been developed for peroxisome-targeted proteins in animals and fungi but are missing specifically for plants. For development of a predictor for plant proteins carrying peroxisome targeting signals type 1 (PTS1), we assembled more than 2500 homologous plant sequences, mainly from EST databases. We applied a discriminative machine learning approach to derive two different prediction methods, both of which showed high prediction accuracy and recognized specific targeting-enhancing patterns in the regions upstream of the PTS1 tripeptides. Upon application of these methods to the Arabidopsis thaliana genome, 392 gene models were predicted to be peroxisome targeted. These predictions were extensively tested in vivo, resulting in a high experimental verification rate of Arabidopsis proteins previously not known to be peroxisomal. The prediction methods were able to correctly infer novel PTS1 tripeptides, which even included novel residues. Twenty-three newly predicted PTS1 tripeptides were experimentally confirmed, and a high variability of the plant PTS1 motif was discovered. These prediction methods will be instrumental in identifying low-abundance and stress-inducible peroxisomal proteins and defining the entire peroxisomal proteome of Arabidopsis and agronomically important crop plants.

  4. Label-Free Quantitative Proteomic Analysis of Harmless and Pathogenic Strains of Infectious Microalgae, Prototheca spp.

    PubMed Central

    Murugaiyan, Jayaseelan; Eravci, Murat; Weise, Christoph; Roesler, Uwe

    2016-01-01

    Microalgae of the genus Prototheca (P.) spp are associated with rare algal infections of invertebrates termed protothecosis. Among the seven generally accepted species, P. zopfii genotype 2 (GT2) is associated with a severe form of bovine mastitis while P. blaschkeae causes the mild and sub-clinical form of mastitis. The reason behind the infectious nature of P. zopfii GT2, while genotype 1 (GT1) remains non-infectious, is not known. Therefore, in the present study we investigated the protein expression level difference between the genotypes of P. zopfii and P. blaschkeae. Cells were cultured to the mid-exponential phase, harvested, and processed for LC-MS analysis. Peptide data was acquired on an LTQ Orbitrap Velos, raw spectra were quantitatively analyzed with MaxQuant software and matching with the reference database of Chlorella variabilis and Auxenochlorella protothecoides resulted in the identification of 226 proteins. Comparison of an environmental strain with infectious strains resulted in the identification of 51 differentially expressed proteins related to carbohydrate metabolism, energy production and protein translation. The expression level of Hsp70 proteins and their role in the infectious process is worth further investigation. All mass spectrometry data are available via ProteomeXchange with identifier PXD005305. PMID:28036087

  5. Overview of the HUPO Plasma Proteome Project: Results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Omenn, Gilbert; States, David J.; Adamski, Marcin

    2005-08-13

    HUPO initiated the Plasma Proteome Project (PPP) in 2002. Its pilot phase has (1) evaluated advantages and limitations of many depletion, fractionation, and MS technology platforms; (2) compared PPP reference specimens of human serum and EDTA, heparin, and citrate-anticoagulated plasma; and (3) created a publicly-available knowledge base (www.bioinformatics.med.umich.edu/hupo/ppp; www.ebi.ac.uk/pride). Thirty-five participating laboratories in 13 countries submitted datasets. Working groups addressed (a) specimen stability and protein concentrations; (b) protein identifications from 18 MS/MS datasets; (c) independent analyses from raw MS-MS spectra; (d) search engine performance, subproteome analyses, and biological insights; (e) antibody arrays; and (f) direct MS/SELDI analyses. MS-MS datasets had 15 710 different International Protein Index (IPI) protein IDs; our integration algorithm applied to multiple matches of peptide sequences yielded 9504 IPI proteins identified with one or more peptides and 3020 proteins identified with two or more peptides (the Core Dataset). These proteins have been characterized with Gene Ontology, InterPro, Novartis Atlas, OMIM, and immunoassay based concentration determinations. The database permits examination of many other subsets, such as 1274 proteins identified with three or more peptides. Reverse protein to DNA matching identified proteins for 118 previously unidentified ORFs. We recommend use of plasma instead of serum, with EDTA (or citrate) for anticoagulation. To improve resolution, sensitivity and reproducibility of peptide identifications and protein matches, we recommend combinations of depletion, fractionation, and MS/MS technologies, with explicit criteria for evaluation of spectra, use of search algorithms, and integration of homologous protein matches. 
This Special Issue of PROTEOMICS presents papers integral to the collaborative analysis plus many reports of supplementary work on various aspects of the PPP workplan. These PPP results on complexity, dynamic range, incomplete sampling, false-positive matches, and integration of diverse datasets for plasma and serum proteins lay a foundation for development and validation of circulating protein biomarkers in health and disease.
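
    The peptide-count thresholds mentioned in this record (one or more, two or more, three or more peptides per protein) can be illustrated with a minimal sketch. It is not the PPP integration algorithm, which also resolves peptides matching several proteins; the function names, protein IDs and peptide strings below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def peptides_per_protein(matches):
    """Group distinct peptide sequences by protein ID.

    `matches` is an iterable of (protein_id, peptide_sequence) pairs.
    """
    by_protein = defaultdict(set)
    for protein_id, peptide in matches:
        by_protein[protein_id].add(peptide)  # a set ignores repeated peptides
    return by_protein

def proteins_with_min_peptides(matches, min_peptides):
    """Return sorted protein IDs supported by at least `min_peptides` distinct peptides."""
    by_protein = peptides_per_protein(matches)
    return sorted(p for p, peps in by_protein.items() if len(peps) >= min_peptides)

# Hypothetical identifications: (protein ID, peptide sequence)
example_matches = [
    ("PROT_A", "LVNELTEFAK"),
    ("PROT_A", "YLYEIAR"),
    ("PROT_A", "YLYEIAR"),      # repeated observation, counted once
    ("PROT_A", "AEFVEVTK"),
    ("PROT_B", "DLGEEHFK"),
    ("PROT_B", "QTALVELLK"),
    ("PROT_C", "SLHTLFGDK"),
]

for threshold in (1, 2, 3):
    print(threshold, proteins_with_min_peptides(example_matches, threshold))
```

    Raising the threshold trades coverage for confidence, which is the same trade-off behind reporting a two-peptide core dataset alongside the larger one-peptide list.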

  6. Deglycosylation systematically improves N-glycoprotein identification in liquid chromatography-tandem mass spectrometry proteomics for analysis of cell wall stress responses in Saccharomyces cerevisiae lacking Alg3p.

    PubMed

    Bailey, Ulla-Maja; Schulz, Benjamin L

    2013-04-01

    Post-translational modification of proteins with glycosylation is of key importance in many biological systems in eukaryotes, influencing fundamental biological processes and regulating protein function. Changes in glycosylation are therefore of interest in understanding these processes and are also useful as clinical biomarkers of disease. The presence of glycosylation can also inhibit protease digestion and lower the quality and confidence of protein identification by mass spectrometry. While deglycosylation can improve the efficiency of subsequent protease digest and increase protein coverage, this step is often excluded from proteomic workflows. Here, we performed a systematic analysis that showed that deglycosylation with peptide-N-glycosidase F (PNGase F) prior to protease digestion with AspN or trypsin improved the quality of identification of the yeast cell wall proteome. The improvement in the confidence of identification of glycoproteins following PNGase F deglycosylation correlated with a higher density of glycosylation sites. Optimal identification across the proteome was achieved with PNGase F deglycosylation and complementary proteolysis with either AspN or trypsin. We used this combination of deglycosylation and complementary protease digest to identify changes in the yeast cell wall proteome caused by lack of the Alg3p protein, a key component of the biosynthetic pathway of protein N-glycosylation. The cell wall of yeast lacking Alg3p showed specifically increased levels of Cis3p, a protein important for cell wall integrity. Our results showed that deglycosylation prior to protease digestion improved the quality of proteomic analyses even if protein glycosylation is not of direct relevance to the study at hand. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. The Mendeleev-Meyer force project.

    PubMed

    Santos, Sergio; Lai, Chia-Yun; Amadei, Carlo A; Gadelrab, Karim R; Tang, Tzu-Chieh; Verdaguer, Albert; Barcons, Victor; Font, Josep; Colchero, Jaime; Chiesa, Matteo

    2016-10-14

    Here we present the Mendeleev-Meyer Force Project which aims at tabulating all materials and substances in a fashion similar to the periodic table. The goal is to group and tabulate substances using nanoscale force footprints rather than atomic number or electronic configuration as in the periodic table. The process is divided into: (1) acquiring nanoscale force data from materials, (2) parameterizing the raw data into standardized input features to generate a library, (3) feeding the standardized library into an algorithm to generate, enhance or exploit a model to identify a material or property. We propose producing databases mimicking the Materials Genome Initiative, the Medical Literature Analysis and Retrieval System Online (MEDLARS) or the PRoteomics IDEntifications database (PRIDE) and making these searchable online via search engines mimicking Pubmed or the PRIDE web interface. A prototype exploiting deep learning algorithms, i.e. multilayer neural networks, is presented.

  8. Exploring the human seminal plasma proteome: an unexplored gold mine of biomarker for male infertility and male reproduction disorder.

    PubMed

    Gilany, Kambiz; Minai-Tehrani, Arash; Savadi-Shiraz, Elham; Rezadoost, Hassan; Lakpour, Niknam

    2015-01-01

    The human seminal fluid is a complex body fluid. It is not known how many proteins are expressed in the seminal plasma; however, by analogy with blood, it is possible that up to 10,000 proteins are expressed. The human seminal fluid is a rich source of potential biomarkers for male infertility and reproductive disorders. In this review, the growing list of proteins identified in human seminal fluid was collected. To date, 4188 redundant proteins of the seminal fluid have been identified using different proteomics technologies, including 2-DE, SDS-PAGE-LC-MS/MS and MudPIT. However, this was reduced to a database of 2168 non-redundant proteins using the UniProtKB/Swiss-Prot reviewed database. Core properties of the proteome were analyzed, including the pI, MW, amino acid, chromosome and PTM distributions in the human seminal plasma proteome. Additionally, the biological process, molecular function and KEGG pathway were investigated using DAVID software. Finally, the biomarkers identified so far in different male reproductive system disorders using proteomics platforms were examined. In this study, an attempt was made to update the human seminal plasma proteome database. Our findings showed that human seminal plasma studies to date seem to have converged on a set of proteins that are repeatedly identified in many studies and that represent only a small fraction of the entire human seminal plasma proteome.

  9. Proteomics for understanding miRNA biology

    PubMed Central

    Huang, Tai-Chung; Pinto, Sneha M.; Pandey, Akhilesh

    2013-01-01

    MicroRNAs (miRNAs) are small noncoding RNAs that play important roles in posttranscriptional regulation of gene expression. Mature miRNAs associate with the RNA interference silencing complex to repress mRNA translation and/or degrade mRNA transcripts. Mass spectrometry-based proteomics has enabled identification of several core components of the canonical miRNA processing pathway and their posttranslational modifications which are pivotal in miRNA regulatory mechanisms. The use of quantitative proteomic strategies has also emerged as a key technique for experimental identification of miRNA targets by allowing direct determination of proteins whose levels are altered because of translational suppression. This review focuses on the role of proteomics and labeling strategies to understand miRNA biology. PMID:23125164

  10. MASH Suite Pro: A Comprehensive Software Tool for Top-Down Proteomics*

    PubMed Central

    Cai, Wenxuan; Guner, Huseyin; Gregorich, Zachery R.; Chen, Albert J.; Ayaz-Guner, Serife; Peng, Ying; Valeja, Santosh G.; Liu, Xiaowen; Ge, Ying

    2016-01-01

    Top-down mass spectrometry (MS)-based proteomics is arguably a disruptive technology for the comprehensive analysis of all proteoforms arising from genetic variation, alternative splicing, and posttranslational modifications (PTMs). However, the complexity of top-down high-resolution mass spectra presents a significant challenge for data analysis. In contrast to the well-developed software packages available for data analysis in bottom-up proteomics, the data analysis tools in top-down proteomics remain underdeveloped. Moreover, despite recent efforts to develop algorithms and tools for the deconvolution of top-down high-resolution mass spectra and the identification of proteins from complex mixtures, a multifunctional software platform, which allows for the identification, quantitation, and characterization of proteoforms with visual validation, is still lacking. Herein, we have developed MASH Suite Pro, a comprehensive software tool for top-down proteomics with multifaceted functionality. MASH Suite Pro is capable of processing high-resolution MS and tandem MS (MS/MS) data using two deconvolution algorithms to optimize protein identification results. In addition, MASH Suite Pro allows for the characterization of PTMs and sequence variations, as well as the relative quantitation of multiple proteoforms in different experimental conditions. The program also provides visualization components for validation and correction of the computational outputs. Furthermore, MASH Suite Pro facilitates data reporting and presentation via direct output of the graphics. Thus, MASH Suite Pro significantly simplifies and speeds up the interpretation of high-resolution top-down proteomics data by integrating tools for protein identification, quantitation, characterization, and visual validation into a customizable and user-friendly interface. We envision that MASH Suite Pro will play an integral role in advancing the burgeoning field of top-down proteomics. PMID:26598644

  11. A gel-free proteomic-based method for the characterization of Bordetella pertussis clinical isolates

    PubMed Central

    Williamson, Yulanda M.; Moura, Hercules; Simmons, Kaneatra; Whitmon, Jennifer; Melnick, Nikkol; Rees, Jon; Woolfitt, Adrian; Schieltz, David M.; Tondella, Maria L.; Ades, Edwin; Sampson, Jacquelyn; Carlone, George; Barr, John R.

    2017-01-01

    Bordetella pertussis (Bp) is the etiologic agent of pertussis or whooping cough, a highly contagious respiratory disease occurring primarily in infants and young children. Although vaccine preventable, pertussis cases have increased over the years leading researchers to re-evaluate vaccine control strategies. Since bacterial outer membrane proteins, comprising the surfaceome, often play roles in pathogenesis and antibody-mediated immunity, three recent Bp circulating isolates were examined using proteomics to identify any potential changes in surface protein expression. Fractions enriched for outer membrane proteins were digested with trypsin and the peptides analyzed by nano liquid chromatography-electrospray ionization-mass spectrometry (nLC-ESI-MS), followed by database analysis to elucidate the surfaceomes of our three Bp isolates. Furthermore, a less labor intensive non-gel based antibody affinity capture technology in conjunction with MS was employed to assess each Bp strains' immunogenic outer membrane proteins. This novel technique is generally applicable allowing for the identification of immunogenic surface expressed proteins on pertussis and other pathogenic bacteria. PMID:22537821

  12. Charting novel allergens from date palm pollen (Phoenix sylvestris) using homology driven proteomics.

    PubMed

    Saha, Bodhisattwa; Bhattacharya, Swati Gupta

    2017-08-08

    Pollen grains from Phoenix sylvestris (date palm), a commonly cultivated tree in India, have been found to cause severe allergic diseases in an increasing percentage of hypersensitive individuals. To unearth its allergenic components, pollen proteins were profiled by two-dimensional gel electrophoresis followed by immunoblotting with sera from date palm pollen-sensitive patients. Allergens were identified by MALDI-TOF/TOF employing a layered proteomic approach combining conventional database-dependent search and manual de novo sequencing followed by homology-based search, as Phoenix sylvestris is unsequenced. Derivatization of tryptic peptides by acetylation has been demonstrated to differentiate the 'b' from the 'y' ions, facilitating efficient de novo sequencing. Ten allergenic proteins were identified, out of which six showed homology with known allergens while the others were reported for the first time. Amongst these, isoflavone reductase, beta-conglycinin, S-adenosyl methionine synthase, 1, 4 glucan synthase and beta-galactosidase were commonly reported as allergens from coconut pollen and are presumably responsible for cross-reactivity. One of the allergens had an IgE binding epitope recognized by its glycan moiety. The allergenic potency of date palm pollen has been demonstrated using in vitro tests. The identified allergens can be used to develop vaccines for immunotherapy against date palm pollen allergy. Identification of allergenic proteins from sources harboring them is essential in developing therapeutic interventions. This is the first comprehensive study on the identification of allergens from Phoenix sylvestris (date palm) pollen, one of the major aeroallergens in India, using a proteomic approach. Proteomic methods are being increasingly used to identify allergens. However, since many of these proteins arise from species which are unsequenced, it becomes difficult to interpret them using conventional proteomics. 
Date palm being an unsequenced species, the IgE-reactive proteins have been identified using a stratified proteomic workflow incorporating manual de novo sequencing and homology-based proteomics. This study also gives an insight into the presence of glycan nature of the IgE binding epitopes. Five proteins have been found to be common with coconut pollen allergens and presumably responsible for cross-reactivity. These can be used in diagnostics to differentiate patient cohorts allergic to both coconut and date palm pollen from true date palm pollen allergic subjects. This would also determine better specific immunotherapy regimes between the two cohorts. The allergens identified herein have potential towards vaccine development in date palm pollen allergy as well as in enriching the existing catalogue of allergenic proteins. Copyright © 2017. Published by Elsevier B.V.

  13. Shotgun proteomics of the barley seed proteome

    USDA-ARS's Scientific Manuscript database

    Barley seed proteins are of prime importance to the brewing industry, human and animal nutrition and in plant breeding for cultivar identification. To obtain comprehensive proteomic data from barley seeds, acetone precipitated proteins were in-solution digested and the resulting peptides were analyz...

  14. Proteome reference map and regulation network of neonatal rat cardiomyocyte

    PubMed Central

    Li, Zi-jian; Liu, Ning; Han, Qi-de; Zhang, You-yi

    2011-01-01

    Aim: To study and establish a proteome reference map and regulation network of neonatal rat cardiomyocyte. Methods: Cultured cardiomyocytes of neonatal rats were used. All proteins expressed in the cardiomyocytes were separated and identified by two-dimensional polyacrylamide gel electrophoresis (2-DE) and matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS). Biological networks and pathways of the neonatal rat cardiomyocytes were analyzed using the Ingenuity Pathway Analysis (IPA) program (www.ingenuity.com). A 2-DE database was made accessible on-line by Make2ddb package on a web server. Results: More than 1000 proteins were separated on 2D gels, and 148 proteins were identified. The identified proteins were used for the construction of an extensible markup language-based database. Biological networks and pathways were constructed to analyze the functions associate with cardiomyocyte proteins in the database. The 2-DE database of rat cardiomyocyte proteins can be accessed at http://2d.bjmu.edu.cn. Conclusion: A proteome reference map and regulation network of the neonatal rat cardiomyocytes have been established, which may serve as an international platform for storage, analysis and visualization of cardiomyocyte proteomic data. PMID:21841810

  15. Global Cell Proteome Profiling, Phospho-signaling and Quantitative Proteomics for Identification of New Biomarkers in Acute Myeloid Leukemia Patients

    PubMed Central

    Aasebø, Elise; Forthun, Rakel B.; Berven, Frode; Selheim, Frode; Hernandez-Valladares, Maria

    2016-01-01

    The identification of protein biomarkers for acute myeloid leukemia (AML) that could find applications in AML diagnosis and prognosis, treatment and the selection for bone marrow transplant requires substantial comparative analyses of the proteomes from AML patients. In the past years, several studies have suggested some biomarkers for AML diagnosis or AML classification using methods for sample preparation with low proteome coverage and low resolution mass spectrometers. However, most of the studies did not follow up, confirm or validate their candidates with more patient samples. Current proteomics methods, new high resolution and fast mass spectrometers allow the identification and quantification of several thousands of proteins obtained from few tens of μg of AML cell lysate. Enrichment methods for posttranslational modifications (PTM), such as phosphorylation, can isolate several thousands of site-specific phosphorylated peptides from AML patient samples, which subsequently can be quantified with high confidence in new mass spectrometers. While recent reports aiming to propose proteomic or phosphoproteomic biomarkers on the studied AML patient samples have taken advantage of the technological progress, the access to large cohorts of AML patients to sample from and the availability of appropriate control samples still remain challenging. PMID:26306748

  16. Quantitative body fluid proteomics in medicine - A focus on minimal invasiveness.

    PubMed

    Csősz, Éva; Kalló, Gergő; Márkus, Bernadett; Deák, Eszter; Csutak, Adrienne; Tőzsér, József

    2017-02-05

    Identification of new biomarkers specific for various pathological conditions is an important field in medical sciences. Body fluids have emerging potential in biomarker studies especially those which are continuously available and can be collected by non-invasive means. Changes in the protein composition of body fluids such as tears, saliva, sweat, etc. may provide information on both local and systemic conditions of medical relevance. In this review, our aim is to discuss the quantitative proteomics techniques used in biomarker studies, and to present advances in quantitative body fluid proteomics of non-invasively collectable body fluids with relevance to biomarker identification. The advantages and limitations of the widely used quantitative proteomics techniques are also presented. Based on the reviewed literature, we suggest an ideal pipeline for body fluid analyses aiming at biomarkers discoveries: starting from identification of biomarker candidates by shotgun quantitative proteomics or protein arrays, through verification of potential biomarkers by targeted mass spectrometry, to the antibody-based validation of biomarkers. The importance of body fluids as a rich source of biomarkers is discussed. Quantitative proteomics is a challenging part of proteomics applications. The body fluids collected by non-invasive means have high relevance in medicine; they are good sources for biomarkers used in establishing the diagnosis, follow up of disease progression and predicting high risk groups. The review presents the most widely used quantitative proteomics techniques in body fluid analysis and lists the potential biomarkers identified in tears, saliva, sweat, nasal mucus and urine for local and systemic diseases. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. Technical aspects of gel-based proteomics designed for elucidating an aryl hydrocarbon receptor complex.

    PubMed

    Wada, Yoshinao; Nakano, Norihiko

    2004-01-01

    The identification of proteins by mass spectrometry has revolutionized the basic method of identifying proteins constituting an intracellular unit or network for certain biological functions. The gel-based strategy following immunoprecipitation was applied to elucidating proteins associated with the aryl hydrocarbon receptor (AhR). Two hundred femtomoles of AhR was recovered from approximately 2 x 10(7) HepG2 cells by immunoprecipitation and was sufficient for identification by peptide mass fingerprinting. Possible candidates for the AhR-associated proteins were also identified. Improvements of the current strategy to increase the overall sensitivity tenfold are required to clarify the AhR complex in full detail. For example, a combination of trypsin and Achromobacter protease I for in-gel digestion allows the number of missed cleavage sites to be set at zero for database searching, thereby reducing random matches and facilitating identification. There is also room for improvement in each step of sample preparation prior to mass spectrometry.
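
    The remark about setting missed cleavages to zero can be illustrated with a small in silico digest. The sketch below is not the authors' workflow: it assumes the simplified textbook trypsin rule (cleave after K or R unless the next residue is P), and the function names and the example sequence are hypothetical. It only shows how the number of candidate peptides a search must consider grows with the number of allowed missed cleavages.

```python
import re

def cleavage_sites(sequence):
    """Positions after which the simplified trypsin rule cleaves.

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    """
    return [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]

def digest(sequence, missed_cleavages=0):
    """Return all peptides containing at most `missed_cleavages` uncleaved sites."""
    # Fragment boundaries: start, every internal cleavage site, end.
    internal = [s for s in cleavage_sites(sequence) if s < len(sequence)]
    bounds = [0] + internal + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Joining k+1 adjacent fragments corresponds to k missed cleavages.
        last = min(i + 1 + missed_cleavages, len(bounds) - 1)
        for j in range(i + 1, last + 1):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides

# Hypothetical sequence; the R at position 6 precedes P and is not cleaved.
example_sequence = "MKTAYRPLKGGR"

for mc in (0, 1, 2):
    candidates = digest(example_sequence, missed_cleavages=mc)
    print(mc, len(candidates), candidates)
```

    With zero missed cleavages each protein contributes only its limit peptides, so fewer theoretical masses compete for each observed peak, which is the stated reason random matches are reduced.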

  18. Network-Based Approaches in Drug Discovery and Early Development

    PubMed Central

    Harrold, JM; Ramanathan, M; Mager, DE

    2015-01-01

    Identification of novel targets is a critical first step in the drug discovery and development process. Most diseases such as cancer, metabolic disorders, and neurological disorders are complex, and their pathogenesis involves multiple genetic and environmental factors. Finding a viable drug target–drug combination with high potential for yielding clinical success within the efficacy–toxicity spectrum is extremely challenging. Many examples are now available in which network-based approaches show potential for the identification of novel targets and for the repositioning of established targets. The objective of this article is to highlight network approaches for identifying novel targets with greater chances of gaining approved drugs with maximal efficacy and minimal side effects. Further enhancement of these approaches may emerge from effectively integrating computational systems biology with pharmacodynamic systems analysis. Coupling genomics, proteomics, and metabolomics databases with systems pharmacology modeling may aid in the development of disease-specific networks that can be further used to build confidence in target identification. PMID:24025802

  19. Assembling proteomics data as a prerequisite for the analysis of large scale experiments

    PubMed Central

    Schmidt, Frank; Schmid, Monika; Thiede, Bernd; Pleißner, Klaus-Peter; Böhme, Martina; Jungblut, Peter R

    2009-01-01

    Background: Despite the complete determination of the genome sequence of a huge number of bacteria, their proteomes remain relatively poorly defined. Besides new methods to increase the number of identified proteins, new database applications are necessary to store and present results of large-scale proteomics experiments. Results: In the present study, a database concept has been developed to address these issues and to offer complete information via a web interface. In our concept, the Oracle based data repository system SQL-LIMS plays the central role in the proteomics workflow and was applied to the proteomes of Mycobacterium tuberculosis, Helicobacter pylori, Salmonella typhimurium and protein complexes such as 20S proteasome. Technical operations of our proteomics labs were used as the standard for SQL-LIMS template creation. By means of a Java based data parser, post-processed data of different approaches, such as LC/ESI-MS, MALDI-MS and 2-D gel electrophoresis (2-DE), were stored in SQL-LIMS. A minimum set of the proteomics data was transferred to our public 2D-PAGE database using a Java based interface (Data Transfer Tool) with the requirements of the PEDRo standardization. Furthermore, the stored proteomics data were extractable out of SQL-LIMS via XML. Conclusion: The Oracle based data repository system SQL-LIMS played the central role in the proteomics workflow concept. Technical operations of our proteomics labs were used as standards for SQL-LIMS templates. Using a Java based parser, post-processed data of different approaches such as LC/ESI-MS, MALDI-MS and 1-DE and 2-DE were stored in SQL-LIMS. Thus, unique data formats of different instruments were unified and stored in SQL-LIMS tables. Moreover, a unique submission identifier allowed fast access to all experimental data. This was the main advantage compared to multi software solutions, especially if personnel fluctuations are high. 
Moreover, large scale and high-throughput experiments must be managed in a comprehensive repository system such as SQL-LIMS, to query results in a systematic manner. On the other hand, these database systems are expensive and require at least one full time administrator and specialized lab manager. Moreover, the high technical dynamics in proteomics may cause problems to adjust new data formats. To summarize, SQL-LIMS met the requirements of proteomics data handling especially in skilled processes such as gel-electrophoresis or mass spectrometry and fulfilled the PSI standardization criteria. The data transfer into a public domain via DTT facilitated validation of proteomics data. Additionally, evaluation of mass spectra by post-processing using MS-Screener improved the reliability of mass analysis and prevented storage of data junk. PMID:19166578

  20. AT_CHLORO, a comprehensive chloroplast proteome database with subplastidial localization and curated information on envelope proteins.

    PubMed

    Ferro, Myriam; Brugière, Sabine; Salvi, Daniel; Seigneurin-Berny, Daphné; Court, Magali; Moyet, Lucas; Ramus, Claire; Miras, Stéphane; Mellal, Mourad; Le Gall, Sophie; Kieffer-Jaquinod, Sylvie; Bruley, Christophe; Garin, Jérôme; Joyard, Jacques; Masselon, Christophe; Rolland, Norbert

    2010-06-01

    Recent advances in the proteomics field have allowed a series of high throughput experiments to be conducted on chloroplast samples, and the data are available in several public databases. However, the accurate localization of many chloroplast proteins often remains hypothetical. This is especially true for envelope proteins. We went a step further into the knowledge of the chloroplast proteome by focusing, in the same set of experiments, on the localization of proteins in the stroma, the thylakoids, and envelope membranes. LC-MS/MS-based analyses first allowed building the AT_CHLORO database (http://www.grenoble.prabi.fr/protehome/grenoble-plant-proteomics/), a comprehensive repertoire of the 1323 proteins, identified by 10,654 unique peptide sequences, present in highly purified chloroplasts and their subfractions prepared from Arabidopsis thaliana leaves. This database also provides extensive proteomics information (peptide sequences and molecular weight, chromatographic retention times, MS/MS spectra, and spectral count) for a unique chloroplast protein accurate mass and time tag database gathering identified peptides with their respective and precise analytical coordinates, molecular weight, and retention time. We assessed the partitioning of each protein in the three chloroplast compartments by using a semiquantitative proteomics approach (spectral count). These data together with an in-depth investigation of the literature were compiled to provide accurate subplastidial localization of previously known and newly identified proteins. A unique knowledge base containing extensive information on the proteins identified in envelope fractions was thus obtained, allowing new insights into this membrane system to be revealed. Altogether, the data we obtained provide unexpected information about plastidial or subplastidial localization of some proteins that were not suspected to be associated to this membrane system. 
The spectral counting-based strategy was further validated as the compartmentation of well known pathways (for instance, photosynthesis and amino acid, fatty acid, or glycerolipid biosynthesis) within chloroplasts could be dissected. It also allowed revisiting the compartmentation of the chloroplast metabolism and functions.
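
    The spectral-count partitioning described in this record can be illustrated with a minimal sketch. It assumes raw spectral counts per compartment are already available for a protein; the function names, the example counts, and the 0.8 assignment threshold are hypothetical and are not taken from the AT_CHLORO study, which also drew on literature curation.

```python
from typing import Dict

# The three chloroplast compartments named in the abstract.
COMPARTMENTS = ("envelope", "stroma", "thylakoid")


def partition_by_spectral_count(counts: Dict[str, int]) -> Dict[str, float]:
    """Return the fraction of a protein's spectra seen in each compartment."""
    total = sum(counts.get(c, 0) for c in COMPARTMENTS)
    if total == 0:
        raise ValueError("protein has no spectra in any compartment")
    return {c: counts.get(c, 0) / total for c in COMPARTMENTS}


def assign_compartment(counts: Dict[str, int], threshold: float = 0.8) -> str:
    """Assign a compartment when one fraction reaches the threshold.

    The threshold is an illustrative assumption, not a published cutoff.
    """
    fractions = partition_by_spectral_count(counts)
    best = max(fractions, key=fractions.get)
    return best if fractions[best] >= threshold else "ambiguous"


if __name__ == "__main__":
    example = {"envelope": 8, "stroma": 1, "thylakoid": 1}
    print(partition_by_spectral_count(example))
    print(assign_compartment(example))
```

    A real analysis would likely normalize counts for the total number of spectra acquired per fraction before comparing compartments; the sketch omits that step for brevity.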

  1. reSpect: Software for Identification of High and Low Abundance Ion Species in Chimeric Tandem Mass Spectra

    PubMed Central

    Shteynberg, David; Mendoza, Luis; Hoopmann, Michael R.; Sun, Zhi; Schmidt, Frank; Deutsch, Eric W.; Moritz, Robert L.

    2016-01-01

    Most shotgun proteomics data analysis workflows are based on the assumption that each fragment ion spectrum is explained by a single species of peptide ion isolated by the mass spectrometer; however, in reality mass spectrometers often isolate more than one peptide ion within the window of isolation that contribute to additional peptide fragment peaks in many spectra. We present a new tool called reSpect, implemented in the Trans-Proteomic Pipeline (TPP), that enables an iterative workflow whereby fragment ion peaks explained by a peptide ion identified in one round of sequence searching or spectral library search are attenuated based on the confidence of the identification, and then the altered spectrum is subjected to further rounds of searching. The reSpect tool is not implemented as a search engine, but rather as a post-search engine processing step where only fragment ion intensities are altered. This enables the application of any search engine combination in the following iterations. Thus, reSpect is compatible with all other protein sequence database search engines as well as peptide spectral library search engines that are supported by the TPP. We show that while some datasets are highly amenable to chimeric spectrum identification and lead to additional peptide identification boosts of over 30% with as many as four different peptide ions identified per spectrum, datasets with narrow precursor ion selection only benefit from such processing at the level of a few percent. We demonstrate a technique that facilitates the determination of the degree to which a dataset would benefit from chimeric spectrum analysis. The reSpect tool is free and open source, provided within the TPP and available at the TPP website. PMID:26419769
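
    The attenuation step described above can be sketched as follows. This is an illustration only and not the reSpect implementation: the abstract says peaks are attenuated "based on the confidence of the identification" without giving the rule, so scaling by (1 - probability) and the fixed m/z tolerance are assumptions, as are all names.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def attenuate_explained_peaks(
    peaks: Sequence[Peak],
    explained_mz: Sequence[float],
    probability: float,
    tol: float = 0.01,
) -> List[Peak]:
    """Scale down peaks matched by an identified peptide's fragment ions.

    Assumption: intensity is multiplied by (1 - probability), so a
    high-confidence identification removes most of the explained signal,
    leaving the remaining peaks for the next search round.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    factor = 1.0 - probability
    result = []
    for mz, intensity in peaks:
        matched = any(abs(mz - e) <= tol for e in explained_mz)
        result.append((mz, intensity * factor if matched else intensity))
    return result


if __name__ == "__main__":
    spectrum = [(100.0, 50.0), (200.0, 80.0)]
    print(attenuate_explained_peaks(spectrum, [100.005], probability=0.75))
```

    Only intensities change; the peak list keeps its length and order, which is what would let any search engine consume the altered spectrum in a later iteration.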

  2. reSpect: software for identification of high and low abundance ion species in chimeric tandem mass spectra.

    PubMed

    Shteynberg, David; Mendoza, Luis; Hoopmann, Michael R; Sun, Zhi; Schmidt, Frank; Deutsch, Eric W; Moritz, Robert L

    2015-11-01

    Most shotgun proteomics data analysis workflows are based on the assumption that each fragment ion spectrum is explained by a single species of peptide ion isolated by the mass spectrometer; however, in reality mass spectrometers often isolate more than one peptide ion within the window of isolation that contribute to additional peptide fragment peaks in many spectra. We present a new tool called reSpect, implemented in the Trans-Proteomic Pipeline (TPP), which enables an iterative workflow whereby fragment ion peaks explained by a peptide ion identified in one round of sequence searching or spectral library search are attenuated based on the confidence of the identification, and then the altered spectrum is subjected to further rounds of searching. The reSpect tool is not implemented as a search engine, but rather as a post-search engine processing step where only fragment ion intensities are altered. This enables the application of any search engine combination in the iterations that follow. Thus, reSpect is compatible with all other protein sequence database search engines as well as peptide spectral library search engines that are supported by the TPP. We show that while some datasets are highly amenable to chimeric spectrum identification and lead to additional peptide identification boosts of over 30% with as many as four different peptide ions identified per spectrum, datasets with narrow precursor ion selection only benefit from such processing at the level of a few percent. We demonstrate a technique that facilitates the determination of the degree to which a dataset would benefit from chimeric spectrum analysis. The reSpect tool is free and open source, provided within the TPP and available at the TPP website.

  3. reSpect: Software for Identification of High and Low Abundance Ion Species in Chimeric Tandem Mass Spectra

    NASA Astrophysics Data System (ADS)

    Shteynberg, David; Mendoza, Luis; Hoopmann, Michael R.; Sun, Zhi; Schmidt, Frank; Deutsch, Eric W.; Moritz, Robert L.

    2015-11-01

    Most shotgun proteomics data analysis workflows are based on the assumption that each fragment ion spectrum is explained by a single species of peptide ion isolated by the mass spectrometer; however, in reality mass spectrometers often isolate more than one peptide ion within the window of isolation that contribute to additional peptide fragment peaks in many spectra. We present a new tool called reSpect, implemented in the Trans-Proteomic Pipeline (TPP), which enables an iterative workflow whereby fragment ion peaks explained by a peptide ion identified in one round of sequence searching or spectral library search are attenuated based on the confidence of the identification, and then the altered spectrum is subjected to further rounds of searching. The reSpect tool is not implemented as a search engine, but rather as a post-search engine processing step where only fragment ion intensities are altered. This enables the application of any search engine combination in the iterations that follow. Thus, reSpect is compatible with all other protein sequence database search engines as well as peptide spectral library search engines that are supported by the TPP. We show that while some datasets are highly amenable to chimeric spectrum identification and lead to additional peptide identification boosts of over 30% with as many as four different peptide ions identified per spectrum, datasets with narrow precursor ion selection only benefit from such processing at the level of a few percent. We demonstrate a technique that facilitates the determination of the degree to which a dataset would benefit from chimeric spectrum analysis. The reSpect tool is free and open source, provided within the TPP and available at the TPP website.

  4. Characterization of the Mouse Pancreatic Islet Proteome and Comparative Analysis with Other Mouse Tissues

    PubMed Central

    Petyuk, Vladislav A.; Qian, Wei-Jun; Hinault, Charlotte; Gritsenko, Marina A.; Singhal, Mudita; Monroe, Matthew E.; Camp, David G.; Kulkarni, Rohit N.; Smith, Richard D.

    2009-01-01

    The pancreatic islets of Langerhans, and especially the insulin-producing beta cells, play a central role in the maintenance of glucose homeostasis. Alterations in the expression of multiple proteins in the islets that contribute to the maintenance of islet function are likely to underlie the pathogenesis of type 2 diabetes. To identify proteins that constitute the islet proteome, we provide the first comprehensive proteomic characterization of pancreatic islets for mouse, the most commonly used animal model in diabetes research. Using strong cation exchange fractionation coupled with reversed phase LC-MS/MS we report the confident identification of 17,350 different tryptic peptides covering 2,612 proteins having at least two unique peptides per protein. The dataset also identified ~60 post-translationally modified peptides including oxidative modifications and phosphorylation. While many of the identified phosphorylation sites corroborate those previously known, the oxidative modifications observed on cysteinyl residues reveal potentially novel information suggesting a role for oxidative stress in islet function. Comparative analysis with 15 available proteomic datasets from other mouse tissues and cells revealed a set of 133 proteins predominantly expressed in pancreatic islets. This unique set of proteins, in addition to those with known functions such as peptide hormones secreted from the islets, contains several proteins with as yet unknown functions. The mouse islet protein and peptide database accessible at http://ncrr.pnl.gov, provides an important reference resource for the research community to facilitate research in the diabetes and metabolism fields. PMID:18570455

  5. Identification of lactoferricin B intracellular targets using an Escherichia coli proteome chip.

    PubMed

    Tu, Yu-Hsuan; Ho, Yu-Hsuan; Chuang, Ying-Chih; Chen, Po-Chung; Chen, Chien-Sheng

    2011-01-01

    Lactoferricin B (LfcinB) is a well-known antimicrobial peptide. Several studies have indicated that it can inhibit bacteria by affecting intracellular activities, but the intracellular targets of this antimicrobial peptide have not been identified. Therefore, we used E. coli proteome chips to identify the intracellular target proteins of LfcinB in a high-throughput manner. We probed LfcinB with E. coli proteome chips and further conducted normalization and Gene Ontology (GO) analyses. The results of the GO analyses showed that the identified proteins were associated with metabolic processes. Moreover, we validated the interactions between LfcinB and chip assay-identified proteins with fluorescence polarization (FP) assays. Sixteen proteins were identified, and an E. coli interaction database (EcID) analysis revealed that the majority of the proteins that interact with these 16 proteins affected the tricarboxylic acid (TCA) cycle. Knockout assays were conducted to further validate the FP assay results. These results showed that phosphoenolpyruvate carboxylase was a target of LfcinB, indicating that one of its mechanisms of action may be associated with pyruvate metabolism. Thus, we used pyruvate assays to conduct an in vivo validation of the relationship between LfcinB and pyruvate level in E. coli. These results showed that E. coli exposed to LfcinB had abnormal pyruvate amounts, indicating that LfcinB caused an accumulation of pyruvate. In conclusion, this study successfully revealed the intracellular targets of LfcinB using an E. coli proteome chip approach.

  6. Identification of Lactoferricin B Intracellular Targets Using an Escherichia coli Proteome Chip

    PubMed Central

    Chen, Po-Chung; Chen, Chien-Sheng

    2011-01-01

    Lactoferricin B (LfcinB) is a well-known antimicrobial peptide. Several studies have indicated that it can inhibit bacteria by affecting intracellular activities, but the intracellular targets of this antimicrobial peptide have not been identified. Therefore, we used E. coli proteome chips to identify the intracellular target proteins of LfcinB in a high-throughput manner. We probed LfcinB with E. coli proteome chips and further conducted normalization and Gene Ontology (GO) analyses. The results of the GO analyses showed that the identified proteins were associated with metabolic processes. Moreover, we validated the interactions between LfcinB and chip assay-identified proteins with fluorescence polarization (FP) assays. Sixteen proteins were identified, and an E. coli interaction database (EcID) analysis revealed that the majority of the proteins that interact with these 16 proteins affected the tricarboxylic acid (TCA) cycle. Knockout assays were conducted to further validate the FP assay results. These results showed that phosphoenolpyruvate carboxylase was a target of LfcinB, indicating that one of its mechanisms of action may be associated with pyruvate metabolism. Thus, we used pyruvate assays to conduct an in vivo validation of the relationship between LfcinB and pyruvate level in E. coli. These results showed that E. coli exposed to LfcinB had abnormal pyruvate amounts, indicating that LfcinB caused an accumulation of pyruvate. In conclusion, this study successfully revealed the intracellular targets of LfcinB using an E. coli proteome chip approach. PMID:22164243

  7. Profiling modifications for glioblastoma proteome using ultra-tolerant database search: Are the peptide mass shifts biologically relevant or chemically induced?

    PubMed

    Tarasova, Irina A; Chumakov, Peter M; Moshkovskii, Sergei A; Gorshkov, Mikhail V

    2018-05-17

    Peptide mass shifts were profiled using an ultra-tolerant database search strategy for shotgun proteomics data sets of human glioblastoma cell lines demonstrating strong response to the type I interferon (IFNα-2b) treatment. The main objective of this profiling was to reveal the cell response to IFN treatment at the level of protein modifications. To achieve this objective, statistically significant changes in peptide mass shift profiles between IFN treated and untreated glioblastoma samples were analyzed. Detailed analysis of MS/MS spectra allowed further interpretation of the observed mass shifts and differentiation between post-translational and artifact modifications. Malignant cells typically acquire increased sensitivity to viruses due to the deregulated antiviral mechanisms. Therefore, a viral therapy is considered as one of the promising approaches to treat cancer. However, recent studies have demonstrated that malignant cells can preserve intact antiviral mechanisms, e.g. interferon signaling, and develop resistance to virus infection in response to interferon treatment. Post-translational modifications, e.g. tyrosine phosphorylation, are the interferon signaling drivers. Thus, comprehensive characterization of modifications is a crucially important, yet most challenging, problem in cancer proteomics. Here, we report on the application of the recently introduced ultra-tolerant search strategy for profiling peptide modifications in human glioblastoma cell lines demonstrating strong response to the type I interferon (IFNα-2b) treatment. The specific aim of the study was identification of statistically significant changes in peptide mass shift profiles between IFN treated and untreated glioblastoma samples, as well as determination of whether these shifts represent biologically relevant modifications.

  8. Proteomic analysis of human aqueous humor using multidimensional protein identification technology

    PubMed Central

    Richardson, Matthew R.; Price, Marianne O.; Price, Francis W.; Pardo, Jennifer C.; Grandin, Juan C.; You, Jinsam; Wang, Mu

    2009-01-01

    Aqueous humor (AH) supports avascular tissues in the anterior segment of the eye, maintains intraocular pressure, and potentially influences the pathogenesis of ocular diseases. Nevertheless, the AH proteome is still poorly defined despite several previous efforts, which were hindered by interfering high abundance proteins, inadequate animal models, and limited proteomic technologies. To facilitate future investigations into AH function, the AH proteome was extensively characterized using an advanced proteomic approach. Samples from patients undergoing cataract surgery were pooled and depleted of interfering abundant proteins and thereby divided into two fractions: albumin-bound and albumin-depleted. Multidimensional Protein Identification Technology (MudPIT) was utilized for each fraction; this incorporates strong cation exchange chromatography to reduce sample complexity before reversed-phase liquid chromatography and tandem mass spectrometric analysis. Twelve proteins had multi-peptide, high confidence identifications in the albumin-bound fraction and 50 proteins had multi-peptide, high confidence identifications in the albumin-depleted fraction. Gene ontological analyses were performed to determine which cellular components and functions were enriched. Many proteins were previously identified in the AH and for several their potential role in the AH has been investigated; however, the majority of identified proteins were novel and only speculative roles can be suggested. The AH was abundant in anti-oxidant and immunoregulatory proteins as well as anti-angiogenic proteins, which may be involved in maintaining the avascular tissues. This is the first known report to extensively characterize and describe the human AH proteome and lays the foundation for future work regarding its function in homeostatic and pathologic states. PMID:20019884

  9. New Strategies and Challenges in Lung Proteomics and Metabolomics. An Official American Thoracic Society Workshop Report.

    PubMed

    Bowler, Russell P; Wendt, Chris H; Fessler, Michael B; Foster, Matthew W; Kelly, Rachel S; Lasky-Su, Jessica; Rogers, Angela J; Stringer, Kathleen A; Winston, Brent W

    2017-12-01

    This document presents the proceedings from the workshop entitled, "New Strategies and Challenges in Lung Proteomics and Metabolomics" held February 4th-5th, 2016, in Denver, Colorado. It was sponsored by the National Heart Lung Blood Institute, the American Thoracic Society, the Colorado Biological Mass Spectrometry Society, and National Jewish Health. The goal of this workshop was to convene, for the first time, relevant experts in lung proteomics and metabolomics to discuss and overcome specific challenges in these fields that are unique to the lung. The main objectives of this workshop were to identify, review, and/or understand: (1) emerging technologies in metabolomics and proteomics as applied to the study of the lung; (2) the unique composition and challenges of lung-specific biological specimens for metabolomic and proteomic analysis; (3) the diverse informatics approaches and databases unique to metabolomics and proteomics, with special emphasis on the lung; (4) integrative platforms across genetic and genomic databases that can be applied to lung-related metabolomic and proteomic studies; and (5) the clinical applications of proteomics and metabolomics. The major findings and conclusions of this workshop are summarized at the end of the report, and outline the progress and challenges that face these rapidly advancing fields.

  10. Proteomics characterization of different bran proteins between aromatic and nonaromatic rice (Oryza sativa L. ssp. indica).

    PubMed

    Trisiriroj, Arunee; Jeyachok, Narumon; Chen, Shui-Tein

    2004-07-01

    A proteomic approach was applied to the analysis of seed brans of 14 rice varieties (Oryza sativa L. ssp. indica), which can be classified into five aromatic and nine nonaromatic varieties. The two-dimensional electrophoresis (2-DE) protein patterns for the 14 rice varieties were similar within pH ranges of 3-10 and 4-7. To characterize aromatic group-specific proteins, we compared 2-D gels of aromatic rice to nonaromatic rice using PDQUEST image analysis. Four out of six differential spots were identified as hypothetical proteins, but one (SSP 7003) was identified by matrix-assisted laser desorption/ionization-quadrupole-time of flight (MALDI-Q-TOF) as prolamin with three matching peptides based on the NCBI database. Prolamin is a class of storage proteins with three different polypeptides of 10, 13, and 16 kDa. Spot SSP 7003 was identified as a 13 kDa polypeptide of prolamin by a combination of mass spectroscopy and N-terminal sequence analyses. In contrast, one sulfur-rich 16 kDa polypeptide of prolamin was found in extremely high intensity in brans of deep-water rice compared to nondeep-water rice. Our results suggest that proteomics is a powerful step to open the way for the identification of rice varieties.

  11. Proteome-wide search for functional motifs altered in tumors: Prediction of nuclear export signals inactivated by cancer-related mutations

    PubMed Central

    Prieto, Gorka; Fullaondo, Asier; Rodríguez, Jose A.

    2016-01-01

    Large-scale sequencing projects are uncovering a growing number of missense mutations in human tumors. Understanding the phenotypic consequences of these alterations represents a formidable challenge. In silico prediction of functionally relevant amino acid motifs disrupted by cancer mutations could provide insight into the potential impact of a mutation, and guide functional tests. We have previously described Wregex, a tool for the identification of potential functional motifs, such as nuclear export signals (NESs), in proteins. Here, we present an improved version that allows motif prediction to be combined with data from large repositories, such as the Catalogue of Somatic Mutations in Cancer (COSMIC), and to be applied to a whole proteome scale. As an example, we have searched the human proteome for candidate NES motifs that could be altered by cancer-related mutations included in the COSMIC database. A subset of the candidate NESs identified was experimentally tested using an in vivo nuclear export assay. A significant proportion of the selected motifs exhibited nuclear export activity, which was abrogated by the COSMIC mutations. In addition, our search identified a cancer mutation that inactivates the NES of the human deubiquitinase USP21, and leads to the aberrant accumulation of this protein in the nucleus. PMID:27174732

  12. Ursgal, Universal Python Module Combining Common Bottom-Up Proteomics Tools for Large-Scale Analysis.

    PubMed

    Kremer, Lukas P M; Leufken, Johannes; Oyunchimeg, Purevdulam; Schulze, Stefan; Fufezan, Christian

    2016-03-04

    Proteomics data integration has become a broad field with a variety of programs offering innovative algorithms to analyze increasing amounts of data. Unfortunately, this software diversity leads to many problems as soon as the data is analyzed using more than one algorithm for the same task. Although it was shown that the combination of multiple peptide identification algorithms yields more robust results, it is only recently that unified approaches are emerging; however, workflows that, for example, aim to optimize search parameters or that employ cascaded style searches can only be made accessible if data analysis becomes not only unified but also and most importantly scriptable. Here we introduce Ursgal, a Python interface to many commonly used bottom-up proteomics tools and to additional auxiliary programs. Complex workflows can thus be composed using the Python scripting language using a few lines of code. Ursgal is easily extensible, and we have made several database search engines (X!Tandem, OMSSA, MS-GF+, Myrimatch, MS Amanda), statistical postprocessing algorithms (qvality, Percolator), and one algorithm that combines statistically postprocessed outputs from multiple search engines ("combined FDR") accessible as an interface in Python. Furthermore, we have implemented a new algorithm ("combined PEP") that combines multiple search engines employing elements of "combined FDR", PeptideShaker, and Bayes' theorem.
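
    To illustrate how Bayes' theorem can merge evidence from several search engines, the following sketch combines posterior error probabilities (PEPs) for one peptide-spectrum match under a naive independence assumption. It is not Ursgal's "combined PEP" algorithm, which per the abstract also draws on elements of "combined FDR" and PeptideShaker, and it does not use the Ursgal API; the function name and inputs are hypothetical.

```python
from math import prod
from typing import Iterable


def combine_peps_naive_bayes(peps: Iterable[float]) -> float:
    """Combine per-engine PEPs for the same peptide-spectrum match.

    Assumptions (illustrative only): engines err independently, and the
    prior odds that the match is wrong are even. Under those assumptions
    the combined error probability is
        prod(PEP_i) / (prod(PEP_i) + prod(1 - PEP_i)).
    """
    values = list(peps)
    if not values:
        raise ValueError("at least one PEP is required")
    if any(not 0.0 <= p <= 1.0 for p in values):
        raise ValueError("each PEP must lie in [0, 1]")
    wrong = prod(values)
    right = prod(1.0 - p for p in values)
    if wrong + right == 0.0:
        raise ValueError("engines give contradictory certainties")
    return wrong / (wrong + right)


if __name__ == "__main__":
    # Two engines each report a 10% error probability for the same match.
    print(combine_peps_naive_bayes([0.1, 0.1]))
```

    Agreement between engines lowers the combined error below either individual value, which matches the abstract's point that combining identification algorithms yields more robust results.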

  13. A Catalog of Proteins Expressed in the AG Secreted Fluid during the Mature Phase of the Chinese Mitten Crabs (Eriocheir sinensis)

    PubMed Central

    He, Lin; Li, Qing; Liu, Lihua; Wang, Yuanli; Xie, Jing; Yang, Hongdan; Wang, Qun

    2015-01-01

    The accessory gland (AG) is an important component of the male reproductive system of arthropods; its secretions enhance fertility, and some AG proteins bind to the spermatozoa and affect their function and properties. Here we report the first comprehensive catalog of the AG secreted fluid during the mature phase of the Chinese mitten crab (Eriocheir sinensis). AG proteins were separated by one-dimensional gel electrophoresis and analyzed by reverse phase high-performance liquid chromatography coupled with tandem mass spectrometry (HPLC-MS/MS). Altogether, the mass spectra of 1173 peptides were detected (1067 without decoy and contaminants) which allowed for the identification of 486 different proteins annotated upon the NCBI database (http://www.ncbi.nlm.nih.gov/) and our transcriptome dataset. The mass spectrometry proteomics data have been deposited at the ProteomeXchange with identifier PXD000700. An extensive description of the AG proteome will help provide the basis for a better understanding of a number of reproductive mechanisms, including potentially spermatophore breakdown, dynamic functional and morphological changes in sperm cells and sperm acrosin enzyme vitality. Thus, the comprehensive catalog of proteins presented here can serve as a valuable reference for future studies of sperm maturation and regulatory mechanisms involved in crustacean reproduction. PMID:26305468

  14. Venom Gland Transcriptomic and Proteomic Analyses of the Enigmatic Scorpion Superstitionia donensis (Scorpiones: Superstitioniidae), with Insights on the Evolution of Its Venom Components.

    PubMed

    Santibáñez-López, Carlos E; Cid-Uribe, Jimena I; Batista, Cesar V F; Ortiz, Ernesto; Possani, Lourival D

    2016-12-09

    Venom gland transcriptomic and proteomic analyses have improved our knowledge on the diversity of the heterogeneous components present in scorpion venoms. However, most of these studies have focused on species from the family Buthidae. To gain insights into the molecular diversity of the venom components of scorpions belonging to the family Superstitioniidae, one of the neglected scorpion families, we performed transcriptomic and proteomic analyses for the species Superstitionia donensis. The total mRNA extracted from the venom glands of two specimens was subjected to massive sequencing by the Illumina protocol, and a total of 219,073 transcripts were generated. We annotated 135 transcripts putatively coding for peptides with identity to known venom components available from different protein databases. Fresh venom collected by electrostimulation was analyzed by LC-MS/MS allowing the identification of 26 distinct components with sequences matching counterparts from the transcriptomic analysis. In addition, the phylogenetic affinities of the found putative calcins, scorpines, La1-like peptides and potassium channel κ toxins were analyzed. The first three components are often reported as ubiquitous in the venom of different families of scorpions. Our results suggest that at least calcins and scorpines could be used as molecular markers in phylogenetic studies of scorpion venoms.

  15. Venom Gland Transcriptomic and Proteomic Analyses of the Enigmatic Scorpion Superstitionia donensis (Scorpiones: Superstitioniidae), with Insights on the Evolution of Its Venom Components

    PubMed Central

    Santibáñez-López, Carlos E.; Cid-Uribe, Jimena I.; Batista, Cesar V. F.; Ortiz, Ernesto; Possani, Lourival D.

    2016-01-01

    Venom gland transcriptomic and proteomic analyses have improved our knowledge on the diversity of the heterogeneous components present in scorpion venoms. However, most of these studies have focused on species from the family Buthidae. To gain insights into the molecular diversity of the venom components of scorpions belonging to the family Superstitioniidae, one of the neglected scorpion families, we performed transcriptomic and proteomic analyses for the species Superstitionia donensis. The total mRNA extracted from the venom glands of two specimens was subjected to massive sequencing by the Illumina protocol, and a total of 219,073 transcripts were generated. We annotated 135 transcripts putatively coding for peptides with identity to known venom components available from different protein databases. Fresh venom collected by electrostimulation was analyzed by LC-MS/MS allowing the identification of 26 distinct components with sequences matching counterparts from the transcriptomic analysis. In addition, the phylogenetic affinities of the found putative calcins, scorpines, La1-like peptides and potassium channel κ toxins were analyzed. The first three components are often reported as ubiquitous in the venom of different families of scorpions. Our results suggest that at least calcins and scorpines could be used as molecular markers in phylogenetic studies of scorpion venoms. PMID:27941686

  16. Application of Machine Learning to Proteomics Data: Classification and Biomarker Identification in Postgenomics Biology

    PubMed Central

    Swan, Anna Louise; Mobasheri, Ali; Allaway, David; Liddell, Susan

    2013-01-01

    Mass spectrometry is an analytical technique for the characterization of biological samples and is increasingly used in omics studies because of its targeted, nontargeted, and high throughput abilities. However, due to the large datasets generated, it requires informatics approaches such as machine learning techniques to analyze and interpret relevant data. Machine learning can be applied to MS-derived proteomics data in two ways. First, directly to mass spectral peaks and second, to proteins identified by sequence database searching, although relative protein quantification is required for the latter. Machine learning has been applied to mass spectrometry data from different biological disciplines, particularly for various cancers. The aims of such investigations have been to identify biomarkers and to aid in diagnosis, prognosis, and treatment of specific diseases. This review describes how machine learning has been applied to proteomics tandem mass spectrometry data. This includes how it can be used to identify proteins suitable for use as biomarkers of disease and for classification of samples into disease or treatment groups, which may be applicable for diagnostics. It also includes the challenges faced by such investigations, such as prediction of proteins present, protein quantification, planning for the use of machine learning, and small sample sizes. PMID:24116388

  17. Characterization and comparison of proteomes of albino sea cucumber Apostichopus japonicus (Selenka) by iTRAQ analysis.

    PubMed

    Xia, Chang-Ge; Zhang, Dijun; Ma, Chengnv; Zhou, Jun; He, Shan; Su, Xiu-Rong

    2016-04-01

    Sea cucumber is a commercially important marine organism in China. Of the different colored varieties sold in China, albino sea cucumber has the greatest appeal among consumers. Identification of factors contributing to albinism in sea cucumber is therefore likely to provide a scientific basis for improving the cultivability of these strains. In this study, two-dimensional liquid chromatography-tandem mass spectrometry coupled with isobaric tags for relative and absolute quantification labeling was used for the first time to quantitatively define the proteome of sea cucumbers and reveal proteomic characteristics unique to albino sea cucumbers. A total of 549 proteins were identified and quantified in albino sea cucumber and the functional annotations of 485 proteins have been exhibited based on COG database. Compared with green sea cucumber, 12 proteins were identified as differentially expressed in the intestine and 16 proteins in the body wall of albino sea cucumber. Among them, 5 proteins were up-regulated in the intestine and 8 proteins were down-regulated in body wall. Gene ontology annotations of these differentially expressed proteins consisted mostly of 'biological process'. The large number of differentially expressed proteins identified here should be highly useful in further elucidating the mechanisms underlying albinism in sea cucumber. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. High throughput profile-profile based fold recognition for the entire human proteome.

    PubMed

    McGuffin, Liam J; Smith, Richard T; Bryson, Kevin; Sørensen, Søren-Aksel; Jones, David T

    2006-06-07

    In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.

  19. Callitriche cophocarpa (water starwort) proteome under chromate stress: evidence for induction of a quinone reductase.

    PubMed

    Kaszycki, Paweł; Dubicka-Lisowska, Aleksandra; Augustynowicz, Joanna; Piwowarczyk, Barbara; Wesołowski, Wojciech

    2018-03-01

    Chromate-induced physiological stress in a water-submerged macrophyte Callitriche cophocarpa Sendtn. (water starwort) was tested at the proteomic level. The oxidative stress status of the plant treated with 1 mM Cr(VI) for 3 days revealed stimulation of peroxidases whereas catalase and superoxide dismutase activities were similar to the control levels. Employing two-dimensional electrophoresis, comparative proteomics enabled the detection of five differentiating proteins, which were identified by mass spectrometry followed by an NCBI database search. Cr(VI) incubation led to induction of light harvesting chlorophyll a/b binding protein with a concomitant decrease of accumulation of ribulose bisphosphate carboxylase (RuBisCO). The main finding was, however, the identification of an NAD(P)H-dependent dehydrogenase FQR1, detectable only in Cr(VI)-treated plants. The FQR1 flavoenzyme is known to be responsive to oxidative stress and to act as a detoxification protein by protecting the cells against oxidative damage. It exhibits in vitro quinone reductase activity and is capable of catalyzing two-electron transfer from NAD(P)H to several substrates, presumably including Cr(VI). The enhanced accumulation of FQR1 was chromate-specific since other stressful conditions, such as salt, temperature, and oxidative stresses, all failed to induce the protein. Zymographic analysis of chromate-treated Callitriche shoots showed a novel enzymatic protein band whose activity was attributed to the newly identified enzyme. We suggest that Cr(VI) phytoremediation with C. cophocarpa can be promoted by chromate reductase activity produced by the induced quinone oxidoreductase which might take part in Cr(VI) → Cr(III) bioreduction process and thus enable the plant to cope with the chromate-generated oxidative stress.

  20. Seeds in Chernobyl: the database on proteome response on radioactive environment

    PubMed Central

    Klubicová, Katarína; Vesel, Martin; Rashydov, Namik M.; Hajduch, Martin

    2012-01-01

    Two serious nuclear accidents during the last quarter century (Chernobyl, 1986 and Fukushima, 2011) contaminated large agricultural areas with radioactivity. The database “Seeds in Chernobyl” (http://www.chernobylproteomics.sav.sk) contains the information about the abundances of hundreds of proteins from on-going investigation of mature and developing seed harvested from plants grown in radioactive Chernobyl area. This database provides a useful source of information concerning the response of the seed proteome to permanently increased level of ionizing radiation in a user-friendly format. PMID:23087698

  1. Mass Spectrometry Data Collection in Parallel at Multiple Core Facilities Operating TripleTOF 5600 and Orbitrap Elite/Velos Pro/Q Exactive Mass Spectrometers

    PubMed Central

    Jones, K.; Kim, K.; Patel, B.; Kelsen, S.; Braverman, A.; Swinton, D.; Gafken, P.; Jones, L.; Lane, W.; Neveu, J.; Leung, H.; Shaffer, S.; Leszyk, J.; Stanley, B.; Fox, T.; Stanley, A.; Yeung, Anthony

    2013-01-01

    Proteomic research can benefit from simultaneous access to multiple cutting-edge mass spectrometers. 18 core facilities responded to our investigators seeking service through the ABRF Discussion Forum. Five of the facilities selected completed four plasma proteomics experiments as routine fee-for-service. Each biological experiment entailed an iTRAQ 4-plex proteome comparison of immunodepleted plasma provided as 30 labeled-peptide fractions. Identical samples were analyzed by two AB SCIEX TripleTOF 5600 and three Thermo Orbitrap (Elite/Velos Pro/Q Exactive) instruments. 480 LC-MS/MS runs delivered >250 GB of data over two months. We compare herein routine service analyses of three peptide fractions of different peptide abundance. Data files from each instrument were studied to develop optimal analysis parameters to compare with default parameters in Mascot Distiller 2.4, ProteinPilot 4.5 beta, AB Sciex MS Data Converter 1.3 beta, and Proteome Discoverer 1.3. Peak-picking for TripleTOFs was best by ProteinPilot 4.5 beta while Mascot Distiller and Proteome Discoverer were comparable for the Orbitraps. We compared protein identification and quantitation in SwissProt 2012_07 database by Mascot Server 2.4.01 versus ProteinPilot. By all search methods, more proteins, up to two fold, were identified using the Q Exactive than others. Q Exactive excelled also at the number of unique significant peptide ion sequences. However, software-dependent impact on subsequent interpretation, due to peptide modifications, can be critical. These findings may have special implications for iTRAQ plasma proteomics. For the low abundance peptide ions, the slope of the dynamic range drop-off in the plasma proteome is uniquely sharp compared with cell lysates. Our study provides data for testable improvements in the operation of these mass spectrometers. More importantly, we have demonstrated a new affordable expedient workflow for investigators to perform proteomic experiments through the ABRF infrastructure. (We acknowledge John Cottrell for optimizing the peak-picking parameters for Mascot Distiller).

  2. Epsilon-Q: An Automated Analyzer Interface for Mass Spectral Library Search and Label-Free Protein Quantification.

    PubMed

    Cho, Jin-Young; Lee, Hyoung-Joo; Jeong, Seul-Ki; Paik, Young-Ki

    2017-12-01

    Mass spectrometry (MS) is a widely used proteome analysis tool for biomedical science. In an MS-based bottom-up proteomic approach to protein identification, sequence database (DB) searching has been routinely used because of its simplicity and convenience. However, searching a sequence DB with multiple variable modification options can increase processing time and false-positive errors in large and complicated MS data sets. Spectral library searching is an alternative solution, avoiding the limitations of sequence DB searching and allowing the detection of more peptides with high sensitivity. Unfortunately, this technique has less proteome coverage, resulting in limitations in the detection of novel and whole peptide sequences in biological samples. To solve these problems, we previously developed the "Combo-Spec Search" method, which uses manually multiple references and simulated spectral library searching to analyze whole proteomes in a biological sample. In this study, we have developed a new analytical interface tool called "Epsilon-Q" to enhance the functions of both the Combo-Spec Search method and label-free protein quantification. Epsilon-Q automatically performs multiple spectral library searches, class-specific false-discovery rate control, and result integration. It has a user-friendly graphical interface and demonstrates good performance in identifying and quantifying proteins by supporting standard MS data formats and spectrum-to-spectrum matching powered by SpectraST. Furthermore, when the Epsilon-Q interface is combined with the Combo-Spec search method, called the Epsilon-Q system, it shows a synergistic function by outperforming other sequence DB search engines for identifying and quantifying low-abundance proteins in biological samples. The Epsilon-Q system can be a versatile tool for comparative proteome analysis based on multiple spectral libraries and label-free quantification.
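
    The abstract above mentions class-specific false-discovery rate control but does not say how it is computed. As a rough illustration only, the sketch below uses a generic target-decoy estimate (decoys divided by targets above a score cutoff), applied separately to each class of match. The function name, the tuple layout, and the class labels are hypothetical and are not taken from Epsilon-Q.

```python
from collections import defaultdict


def class_specific_fdr_threshold(psms, max_fdr=0.01):
    """Return, per class, the lowest score cutoff whose estimated FDR <= max_fdr.

    psms: iterable of (score, is_decoy, cls) tuples; a higher score is a better match.
    The FDR at a cutoff is estimated as decoys / targets among matches at or above it.
    A class with no acceptable cutoff maps to None.
    """
    by_class = defaultdict(list)
    for score, is_decoy, cls in psms:
        by_class[cls].append((score, is_decoy))

    thresholds = {}
    for cls, items in by_class.items():
        # Walk from the best score downwards, tracking running counts.
        items.sort(key=lambda item: item[0], reverse=True)
        targets = decoys = 0
        best = None
        for score, is_decoy in items:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
            if targets and decoys / targets <= max_fdr:
                best = score  # lowest cutoff seen so far that still passes
        thresholds[cls] = best
    return thresholds
```

    Estimating the cutoff per class, rather than once for the pooled results, keeps a class with many poor matches from loosening the threshold for a class with few.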

  3. Surface analysis of Dicrocoelium dendriticum. The molecular characterization of exosomes reveals the presence of miRNAs.

    PubMed

    Bernal, Dolores; Trelis, Maria; Montaner, Sergio; Cantalapiedra, Fernando; Galiano, Alicia; Hackenberg, Michael; Marcilla, Antonio

    2014-06-13

    With the aim of characterizing the molecules involved in the interaction of Dicrocoelium dendriticum adults and the host, we have performed proteomic analyses of the external surface of the parasite using the currently available datasets including the transcriptome of the related species Echinostoma caproni. We have identified 182 parasite proteins on the outermost surface of D. dendriticum. The presence of exosome-like vesicles in the ESP of D. dendriticum and their components has also been characterized. Using proteomic approaches, we have characterized 84 proteins in these vesicles. Interestingly, we have detected miRNA in D. dendriticum exosomes, thus representing the first report of miRNA in helminth exosomes. In order to identify potential targets for intervention against parasitic helminths, we have analyzed the surface of the parasitic helminth Dicrocoelium dendriticum. Along with the proteomic analyses of the outermost layer of the parasite, our work describes the molecular characterization of the exosomes of D. dendriticum. Our proteomic data confirm the improvement of protein identification from "non-model organisms" like helminths, when using different search engines against a combination of available databases. In addition, this work represents the first report of miRNAs in parasitic helminth exosomes. These vesicles can pack specific proteins and RNAs providing stability and resistance to RNAse digestion in body fluids, and provide a way to regulate host-parasite interplay. The present data should provide a solid foundation for the development of novel methods to control this non-model organism and related parasites. This article is part of a Special Issue entitled: Proteomics of non-model organisms. Copyright © 2014 Elsevier B.V. All rights reserved.

  4. Proteomic profiling of the planarian Schmidtea mediterranea and its mucous reveals similarities with human secretions and those predicted for parasitic flatworms.

    PubMed

    Bocchinfuso, Donald G; Taylor, Paul; Ross, Eric; Ignatchenko, Alex; Ignatchenko, Vladimir; Kislinger, Thomas; Pearson, Bret J; Moran, Michael F

    2012-09-01

    The freshwater planarian Schmidtea mediterranea has been used in research for over 100 years, and is an emerging stem cell model because of its capability of regenerating large portions of missing body parts. Exteriorly, planarians are covered in mucous secretions of unknown composition, implicated in locomotion, predation, innate immunity, and substrate adhesion. Although the planarian genome has been sequenced, it remains mostly unannotated, challenging both genomic and proteomic analyses. The goal of the current study was to annotate the proteome of the whole planarian and its mucous fraction. The S. mediterranea proteome was analyzed via mass spectrometry by using multidimensional protein identification technology with whole-worm tryptic digests. By using a proteogenomics approach, MS data were searched against an in silico translated planarian transcript database, and by using the Swiss-Prot BLAST algorithm to identify proteins similar to planarian queries. A total of 1604 proteins were identified. The mucous subproteome was defined through analysis of a mucous trail fraction and an extract obtained by treating whole worms with the mucolytic agent N-acetylcysteine. Gene Ontology analysis confirmed that the mucous fractions were enriched with secreted proteins. The S. mediterranea proteome is highly similar to that predicted for the trematode Schistosoma mansoni associated with intestinal schistosomiasis, with the mucous subproteome particularly highly conserved. Remarkably, orthologs of 119 planarian mucous proteins are present in human mucosal secretions and tear fluid. We suggest planarians have potential to be a model system for the characterization of mucous protein function and relevant to parasitic flatworm infections and diseases underlined by mucous aberrancies, such as cystic fibrosis, asthma, and other lung diseases.

  5. Tear film proteome in age-related macular degeneration.

    PubMed

    Winiarczyk, Mateusz; Kaarniranta, Kai; Winiarczyk, Stanisław; Adaszek, Łukasz; Winiarczyk, Dagmara; Mackiewicz, Jerzy

    2018-06-01

    Age-related macular degeneration (AMD) is the main reason for blindness in elderly people in the developed countries. Current screening protocols have limitations in detecting the early signs of retinal degeneration. Therefore, it would be desirable to find novel biomarkers for early detection of AMD. Development of novel biomarkers would help in the prevention, diagnostics, and treatment of AMD. Proteomic analysis of tear film has shown promise in this research area. If an optimal set of biomarkers could be obtained from accessible body fluids, it would represent a reliable way to monitor disease progression and response to novel therapies. Tear films were collected on Schirmer strips from a total of 22 patients (8 with wet AMD, 6 with dry AMD, and 8 control individuals). 2D electrophoresis was used to separate tear film proteins prior to their identification with matrix-assisted laser desorption/ionization time of flight spectrometer (MALDI-TOF/TOF) and matching with functional databases. A total of 342 proteins were identified. Most of them were previously described in various proteomic studies concerning AMD. Shootin-1, histatin-3, fidgetin-like protein 1, SRC kinase signaling inhibitor, Graves disease carrier protein, actin cytoplasmic 1, prolactin-inducible protein 1, and protein S100-A7A were upregulated in the tear film samples isolated from AMD patients and were not previously linked with this disease in any proteomic analysis. The upregulated proteins supplement our current knowledge of AMD pathogenesis, providing evidence that certain specific proteins are expressed into the tear film in AMD. As far as we are aware, this is the first study to have undertaken a comprehensive in-depth analysis of the human tear film proteome in AMD patients.

  6. Detailed tail proteomic analysis of axolotl (Ambystoma mexicanum) using an mRNA-seq reference database.

    PubMed

    Demircan, Turan; Keskin, Ilknur; Dumlu, Seda Nilgün; Aytürk, Nilüfer; Avşaroğlu, Mahmut Erhan; Akgün, Emel; Öztürk, Gürkan; Baykal, Ahmet Tarık

    2017-01-01

    Salamander axolotl has been emerging as an important model for stem cell research due to its powerful regenerative capacity. Several advantages, such as the high capability of advanced tissue, organ, and appendages regeneration, promote axolotl as an ideal model system to extend our current understanding on the mechanisms of regeneration. Acknowledging the common molecular pathways between amphibians and mammals, there is a great potential to translate the messages from axolotl research to mammalian studies. However, the utilization of axolotl is hindered due to the lack of reference databases of genomic, transcriptomic, and proteomic data. Here, we introduce the proteome analysis of the axolotl tail section searched against an mRNA-seq database. We translated axolotl mRNA sequences to protein sequences and annotated these to process the LC-MS/MS data and identified 1001 nonredundant proteins. Functional classification of identified proteins was performed by gene ontology searches. The presence of some of the identified proteins was validated by in situ antibody labeling. Furthermore, we have analyzed the proteome expressional changes postamputation at three time points to evaluate the underlying mechanisms of the regeneration process. Taken together, this work expands the proteomics data of axolotl to contribute to its establishment as a fully utilized model. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
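
    The step of translating mRNA sequences into protein sequences to build a search database can be illustrated with a short sketch. It assumes the standard genetic code and a single forward reading frame chosen by the caller; the abstract does not state which frames or tools the authors used, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(mrna, frame=0):
    """Translate an mRNA (or cDNA) string in one forward frame.

    Translation stops at the first stop codon. Codons containing
    characters other than A, C, G, T/U are reported as 'X'.
    """
    seq = mrna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

    A real proteogenomic database would typically translate several frames and keep open reading frames above a length cutoff; this sketch covers only the codon lookup.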

  7. High-throughput Database Search and Large-scale Negative Polarity Liquid Chromatography–Tandem Mass Spectrometry with Ultraviolet Photodissociation for Complex Proteomic Samples

    PubMed Central

    Madsen, James A.; Xu, Hua; Robinson, Michelle R.; Horton, Andrew P.; Shaw, Jared B.; Giles, David K.; Kaoud, Tamer S.; Dalby, Kevin N.; Trent, M. Stephen; Brodbelt, Jennifer S.

    2013-01-01

    The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were developed based on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS1 and MS2 data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halo and HeLa tryptic digests, respectively, corresponding to 655 and 645 peptides that were unique when compared with electron transfer dissociation (ETD), higher energy collision-induced dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, with 49 and 50 unique proteins identified in contrast to the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely open to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm was also assessed for low resolution, low mass accuracy data on a linear ion trap. 
    Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of peptide cations, and when these methods were combined into a single search, an increase of up to 13% sequence coverage was observed for the kinases. The ability to sequence peptide anions and cations in alternating scans in the same chromatographic run was also demonstrated. Because ETD has a significant bias toward identifying highly basic peptides, negative UVPD was used to improve the identification of the more acidic peptides in conjunction with positive ETD for the more basic species. In this case, tryptic peptides from the cytosolic section of HeLa cells were analyzed by polarity switching nanoLC-MS/MS utilizing ETD for cation sequencing and UVPD for anion sequencing. Relative to searching using ETD alone, positive/negative polarity switching significantly improved sequence coverages across identified proteins, resulting in a 33% increase in unique peptide identifications and more than twice the number of peptide spectral matches. PMID:23695934

  8. T3SEdb: data warehousing of virulence effectors secreted by the bacterial Type III Secretion System.

    PubMed

    Tay, Daniel Ming Ming; Govindarajan, Kunde Ramamoorthy; Khan, Asif M; Ong, Terenze Yao Rui; Samad, Hanif M; Soh, Wei Wei; Tong, Minyan; Zhang, Fan; Tan, Tin Wee

    2010-10-15

    Effectors of Type III Secretion System (T3SS) play a pivotal role in establishing and maintaining pathogenicity in the host and therefore the identification of these effectors is important in understanding virulence. However, the effectors display high level of sequence diversity, therefore making the identification a difficult process. There is a need to collate and annotate existing effector sequences in public databases to enable systematic analyses of these sequences for development of models for screening and selection of putative novel effectors from bacterial genomes that can be validated by a smaller number of key experiments. Herein, we present T3SEdb http://effectors.bic.nus.edu.sg/T3SEdb, a specialized database of annotated T3SS effector (T3SE) sequences containing 1089 records from 46 bacterial species compiled from the literature and public protein databases. Procedures have been defined for i) comprehensive annotation of experimental status of effectors, ii) submission and curation review of records by users of the database, and iii) the regular update of T3SEdb existing and new records. Keyword fielded and sequence searches (BLAST, regular expression) are supported for both experimentally verified and hypothetical T3SEs. More than 171 clusters of T3SEs were detected based on sequence identity comparisons (intra-cluster difference up to ~60%). Owing to this high level of sequence diversity of T3SEs, the T3SEdb provides a large number of experimentally known effector sequences with wide species representation for creation of effector predictors. We created a reliable effector prediction tool, integrated into the database, to demonstrate the application of the database for such endeavours. T3SEdb is the first specialised database reported for T3SS effectors, enriched with manual annotations that facilitated systematic construction of a reliable prediction model for identification of novel effectors. 
    The T3SEdb represents a platform for inclusion of additional annotations of metadata for future developments of sophisticated effector prediction models for screening and selection of putative novel effectors from bacterial genomes/proteomes that can be validated by a small number of key experiments.

  9. Proteomic profile of dormant Trichophyton Rubrum conidia

    PubMed Central

    Leng, Wenchuan; Liu, Tao; Li, Rui; Yang, Jian; Wei, Candong; Zhang, Wenliang; Jin, Qi

    2008-01-01

    Background Trichophyton rubrum is the most common dermatophyte causing fungal skin infections in humans. Asexual sporulation is an important means of propagation for T. rubrum, and conidia produced by this way are thought to be the primary cause of human infections. Despite their importance in pathogenesis, the conidia of T. rubrum remain understudied. We intend to intensively investigate the proteome of dormant T. rubrum conidia to characterize its molecular and cellular features and to enhance the development of novel therapeutic strategies. Results The proteome of T. rubrum conidia was analyzed by combining shotgun proteomics with sample prefractionation and multiple enzyme digestion. In total, 1026 proteins were identified. All identified proteins were compared to those in the NCBI non-redundant protein database, the eukaryotic orthologous groups database, and the gene ontology database to obtain functional annotation information. Functional classification revealed that the identified proteins covered nearly all major biological processes. Some proteins were spore specific and related to the survival and dispersal of T. rubrum conidia, and many proteins were important to conidial germination and response to environmental conditions. Conclusion Our results suggest that the proteome of T. rubrum conidia is considerably complex, and that the maintenance of conidial dormancy is an intricate and elaborate process. This data set provides the first global framework for the dormant T. rubrum conidia proteome and is a stepping stone on the way to further study of the molecular mechanisms of T. rubrum conidial germination and the maintenance of conidial dormancy. PMID:18578874

  10. Separation and identification of Musa acuminate Colla (banana) leaf proteins by two-dimensional gel electrophoresis and mass spectrometry.

    PubMed

    Lu, Y; Qi, Y X; Zhang, H; Zhang, H Q; Pu, J J; Xie, Y X

    2013-12-19

    To establish a proteomic reference map of Musa acuminate Colla (banana) leaf, we separated and identified leaf proteins using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and mass spectrometry (MS). Tryptic digests of 44 spots were subjected to peptide mass fingerprinting (PMF) by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS. Three spots that were not identified by MALDI-TOF MS analysis were identified by searching against the NCBInr, SwissProt, and expressed sequence tag (EST) databases. We identified 41 unique proteins. The majority of the identified leaf proteins were found to be involved in energy metabolism. The results indicate that 2D-PAGE is a sensitive and powerful technique for the separation and identification of Musa leaf proteins. A summary of the identified proteins and their putative functions is discussed.
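
    Peptide mass fingerprinting, as used above, compares observed peptide masses with masses predicted from an in-silico digest of candidate proteins. The sketch below shows the idea under simple assumptions: trypsin cleaves after K or R except before P, no missed cleavages or modifications are considered, and masses are singly protonated monoisotopic values. The function names and the 50 ppm tolerance are illustrative choices, not details from the study.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(protein):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def mh_plus(peptide):
    """Monoisotopic [M+H]+ value of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def match_peaks(observed, protein, tol_ppm=50.0):
    """Return (peptide, theoretical [M+H]+, observed m/z) for peaks within tolerance."""
    hits = []
    for pep in tryptic_peptides(protein):
        theo = mh_plus(pep)
        for mz in observed:
            if abs(mz - theo) / theo * 1e6 <= tol_ppm:
                hits.append((pep, round(theo, 4), mz))
    return hits
```

    Search engines score how many observed peaks a candidate protein explains; this sketch stops at listing the matches.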

  11. Proteomics for understanding miRNA biology.

    PubMed

    Huang, Tai-Chung; Pinto, Sneha M; Pandey, Akhilesh

    2013-02-01

    MicroRNAs (miRNAs) are small noncoding RNAs that play important roles in posttranscriptional regulation of gene expression. Mature miRNAs associate with the RNA interference silencing complex to repress mRNA translation and/or degrade mRNA transcripts. Mass spectrometry-based proteomics has enabled identification of several core components of the canonical miRNA processing pathway and their posttranslational modifications which are pivotal in miRNA regulatory mechanisms. The use of quantitative proteomic strategies has also emerged as a key technique for experimental identification of miRNA targets by allowing direct determination of proteins whose levels are altered because of translational suppression. This review focuses on the role of proteomics and labeling strategies to understand miRNA biology. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer

    PubMed Central

    Gupta, Sudheer; Chaudhary, Kumardeep; Dhanda, Sandeep Kumar; Kumar, Rahul; Kumar, Shailesh; Sehgal, Manika; Nagpal, Gandharva

    2016-01-01

    Due to advancement in sequencing technology, genomes of thousands of cancer tissues or cell-lines have been sequenced. Identification of cancer-specific epitopes or neoepitopes from cancer genomes is one of the major challenges in the field of immunotherapy or vaccine development. This paper describes a platform Cancertope, developed for designing genome-based immunotherapy or vaccine against a cancer cell. Broadly, the integrated resources on this platform are apportioned into three precise sections. First section explains a cancer-specific database of neoepitopes generated from genome of 905 cancer cell lines. This database harbors wide range of epitopes (e.g., B-cell, CD8+ T-cell, HLA class I, HLA class II) against 60 cancer-specific vaccine antigens. Second section describes a partially personalized module developed for predicting potential neoepitopes against a user-specific cancer genome. Finally, we describe a fully personalized module developed for identification of neoepitopes from genomes of cancerous and healthy cells of a cancer-patient. In order to assist the scientific community, wide range of tools are incorporated in this platform that includes screening of epitopes against human reference proteome (http://www.imtech.res.in/raghava/cancertope/). PMID:27832200

  13. Identification of beta-Lactamases and beta-Lactam-Related Proteins in Human Pathogenic Bacteria using a Computational Search Approach.

    PubMed

    Brambila-Tapia, Aniel Jessica Leticia; Perez-Rueda, Ernesto; Barrios, Humberto; Dávalos-Rodríguez, Nory Omayra; Dávalos-Rodríguez, Ingrid Patricia; Cardona-Muñoz, Ernesto Germán; Salazar-Páramo, Mario

    2017-08-01

    A systematic analysis of beta-lactamases based on comparative proteomics has not been performed thus far. In this report, we searched for the presence of beta-lactam-related proteins in 591 bacterial proteomes belonging to 52 species that are pathogenic to humans. The amino acid sequences for 19 different types of beta-lactamases (ACT, CARB, CifA, CMY, CTX, FOX, GES, GOB, IMP, IND, KPC, LEN, OKP, OXA, OXY, SHV, TEM, NDM, and VIM) were obtained from the ARG-ANNOT database and were used to construct 19 HMM profiles, which were used to identify potential beta-lactamases in the completely sequenced bacterial proteomes. A total of 2877 matches that included the word "beta-lactamase" and/or "penicillin" in the functional annotation and/or in any of its regions were obtained. These enzymes were mainly described as "penicillin-binding proteins," "beta-lactamases," and "metallo-beta-lactamases" and were observed in 47 of the 52 species studied. In addition, proteins classified as "beta-lactamases" were observed in 39 of the species included. A positive correlation between the number of beta-lactam-related proteins per species and the proteome size was observed (R = 0.78, P < 0.00001). This correlation partially explains the high presence of beta-lactam-related proteins in large proteomes, such as Nocardia brasiliensis, Bacillus anthracis, and Mycobacterium tuberculosis, along with their absence in small proteomes, such as Chlamydia spp. and Mycoplasma spp. We detected only five types of beta-lactamases (TEM, SHV, CTX, IMP, and OXA) and other related proteins in particular species that corresponded with those reported in the literature. We additionally detected other potential species-specific beta-lactamases that have not yet been reported. In the future, better results will be achieved due to more accurate sequence annotations and a greater number of sequenced genomes.
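
    The reported correlation between proteome size and the number of beta-lactam-related proteins can be reproduced in form, though not in value, with a few lines of code. The sketch assumes a Pearson coefficient; the abstract does not say which correlation statistic was used, and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

    Here `x` would hold the proteome size of each species and `y` the count of matching proteins; a rank-based coefficient may suit such data better if the relationship is monotonic but not linear.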

  14. Proteomic analysis of the cyanobacterium of the Azolla symbiosis: identity, adaptation, and NifH modification.

    PubMed

    Ekman, Martin; Tollbäck, Petter; Bergman, Birgitta

    2008-01-01

    Cyanobacteria are able to form stable nitrogen-fixing symbioses with diverse eukaryotes. To extend our understanding of adaptations imposed by plant hosts, two-dimensional gel electrophoresis and mass spectrometry (MS) were used for comparative protein expression profiling of a cyanobacterium (cyanobiont) dwelling in leaf cavities of the water-fern Azolla filiculoides. Homology-based protein identification using peptide mass fingerprinting [matrix-assisted laser desorption ionization-time of flight (MALDI-TOF-MS)], tandem MS analyses, and sequence homology searches resulted in an identification success rate of 79% of proteins analysed in the unsequenced cyanobiont. Compared with a free-living strain, processes related to energy production, nitrogen and carbon metabolism, and stress-related functions were up-regulated in the cyanobiont while photosynthesis and metabolic turnover rates were down-regulated, stressing a slow heterotrophic mode of growth, as well as high heterocyst frequencies and nitrogen-fixing capacities. The first molecular data set on the nature of the NifH post-translational modification in cyanobacteria was also obtained: peptide mass spectra of the protein demonstrated the presence of a 300-400 Da protein modification localized to a specific 13 amino acid sequence, within the part of the protein that is ADP-ribosylated in other bacteria and close to the active site of nitrogenase. Furthermore, the distribution of the highest scoring database hits for the identified proteins points to the possibility of using proteomic data in taxonomy.

  15. Large-Scale Chemical Similarity Networks for Target Profiling of Compounds Identified in Cell-Based Chemical Screens

    PubMed Central

    Lo, Yu-Chen; Senese, Silvia; Li, Chien-Ming; Hu, Qiyang; Huang, Yong; Damoiseaux, Robert; Torres, Jorge Z.

    2015-01-01

    Target identification is one of the most critical steps following cell-based phenotypic chemical screens aimed at identifying compounds with potential uses in cell biology and for developing novel disease therapies. Current in silico target identification methods, including chemical similarity database searches, are limited to single or sequential ligand analysis that have limited capabilities for accurate deconvolution of a large number of compounds with diverse chemical structures. Here, we present CSNAP (Chemical Similarity Network Analysis Pulldown), a new computational target identification method that utilizes chemical similarity networks for large-scale chemotype (consensus chemical pattern) recognition and drug target profiling. Our benchmark study showed that CSNAP can achieve an overall higher accuracy (>80%) of target prediction with respect to representative chemotypes in large (>200) compound sets, in comparison to the SEA approach (60–70%). Additionally, CSNAP is capable of integrating with biological knowledge-based databases (Uniprot, GO) and high-throughput biology platforms (proteomic, genetic, etc) for system-wise drug target validation. To demonstrate the utility of the CSNAP approach, we combined CSNAP's target prediction with experimental ligand evaluation to identify the major mitotic targets of hit compounds from a cell-based chemical screen and we highlight novel compounds targeting microtubules, an important cancer therapeutic target. The CSNAP method is freely available and can be accessed from the CSNAP web server (http://services.mbi.ucla.edu/CSNAP/). PMID:25826798
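
    The idea of a chemical similarity network can be sketched in a few lines. This is not the CSNAP implementation: it assumes compounds are represented as sets of integer feature identifiers (a stand-in for real fingerprints), uses the Tanimoto coefficient as the similarity measure, and treats connected components as candidate chemotype clusters. The compound names, features, and threshold are all hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity_network(fingerprints, threshold=0.5):
    """Return edges (name1, name2, similarity) for pairs at or above threshold."""
    edges = []
    for n1, n2 in combinations(sorted(fingerprints), 2):
        s = tanimoto(fingerprints[n1], fingerprints[n2])
        if s >= threshold:
            edges.append((n1, n2, s))
    return edges


def connected_components(nodes, edges):
    """Group nodes into connected components with a depth-first search."""
    adjacency = {n: set() for n in nodes}
    for a, b, _ in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical compounds with made-up feature sets.
fingerprints = {
    "cpd_A": {1, 2, 3, 4},
    "cpd_B": {1, 2, 3, 5},
    "cpd_C": {10, 11, 12},
    "cpd_D": {10, 11, 12, 13},
}
edges = similarity_network(fingerprints, threshold=0.5)
clusters = connected_components(fingerprints, edges)
print(clusters)
```

    In a target-profiling setting, annotated ligands in the same cluster as a query compound would then suggest candidate targets for it.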

  16. Proteomic biomarkers for ovarian cancer risk in women with polycystic ovary syndrome: a systematic review and biomarker database integration.

    PubMed

    Galazis, Nicolas; Olaleye, Olalekan; Haoula, Zeina; Layfield, Robert; Atiomo, William

    2012-12-01

    To review and identify possible biomarkers for ovarian cancer (OC) in women with polycystic ovary syndrome (PCOS). Systematic literature searches of MEDLINE, EMBASE, and Cochrane using the search terms "proteomics," "proteomic," and "ovarian cancer" or "ovarian carcinoma." Proteomic biomarkers for OC were then integrated with an updated previously published database of all proteomic biomarkers identified to date in patients with PCOS. Academic department of obstetrics and gynecology in the United Kingdom. A total of 180 women identified in the six studies. Tissue samples from women with OC vs. tissue samples from women without OC. Proteomic biomarkers, proteomic technique used, and methodologic quality score. A panel of six biomarkers was overexpressed both in women with OC and in women with PCOS. These biomarkers include calreticulin, fibrinogen-γ, superoxide dismutase, vimentin, malate dehydrogenase, and lamin B2. These biomarkers could help improve our understanding of the links between PCOS and OC and could potentially be used to identify subgroups of women with PCOS at increased risk of OC. More studies are required to further evaluate the role these biomarkers play in women with PCOS and OC. Copyright © 2012 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.

  17. The Protein Information Resource: an integrated public resource of functional annotation of proteins

    PubMed Central

    Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.

    2002-01-01

    The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247

  18. Mass Defect Labeling of Cysteine for Improving Peptide Assignment in Shotgun Proteomic Analyses

    PubMed Central

    Hernandez, Hilda; Niehauser, Sarah; Boltz, Stacey A.; Gawandi, Vijay; Phillips, Robert S.; Amster, I. Jonathan

    2006-01-01

    A method for improving the identification of peptides in a shotgun proteome analysis using accurate mass measurement has been developed. The improvement is based upon the derivatization of cysteine residues with a novel reagent, 2,4-dibromo-(2′-iodo)acetanilide. The derivatization changes the mass defect of cysteine-containing proteolytic peptides in a manner that increases their identification specificity. Peptide masses were measured using matrix-assisted laser desorption/ionization Fourier transform ion cyclotron mass spectrometry. Reactions with protein standards show that the derivatization of cysteine is rapid and quantitative, and the data suggest that the derivatized peptides are more easily ionized or detected than unlabeled cysteine-containing peptides. The reagent was tested on a 15N-metabolically labeled proteome from M. maripaludis. Proteins were identified by their accurate mass values and from their nitrogen stoichiometry. A total of 47% of the labeled peptides are identified versus 27% for the unlabeled peptides. This procedure permits the identification of proteins from the M. maripaludis proteome that are not usually observed by the standard protocol and shows that better protein coverage is obtained with this methodology. PMID:16689545
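
    The mass-defect shift that this labeling exploits can be shown with a small calculation. The sketch below defines mass defect as monoisotopic mass minus nominal (integer) mass and compares a cysteine residue with the same residue plus two bromine atoms. The brominated composition is an illustration of how heavy halogens pull the defect negative; it is not the actual elemental change produced by the reagent.

```python
# Monoisotopic mass and nominal mass of the most abundant isotope.
ISOTOPES = {
    "C": (12.0, 12),
    "H": (1.00782503, 1),
    "N": (14.0030740, 14),
    "O": (15.9949146, 16),
    "S": (31.9720707, 32),
    "Br": (78.9183376, 79),
}


def mass_defect(composition):
    """Monoisotopic mass minus nominal mass for an elemental composition."""
    exact = sum(ISOTOPES[el][0] * n for el, n in composition.items())
    nominal = sum(ISOTOPES[el][1] * n for el, n in composition.items())
    return exact - nominal


# Cysteine residue, C3H5NOS.
cysteine_residue = {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1}

# Illustrative only: the same residue with two bromine atoms added.
with_two_bromines = dict(cysteine_residue, Br=2)

print(f"cysteine residue: {mass_defect(cysteine_residue):+.4f} Da")
print(f"with two Br:      {mass_defect(with_two_bromines):+.4f} Da")
```

    Because ordinary peptides built from C, H, N, O, and S carry a small positive defect, a strongly negative shift moves labeled peptides into a region of mass space that unlabeled peptides rarely occupy.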

  19. Proteomic analysis of bovine nucleolus.

    PubMed

    Patel, Amrutlal K; Olson, Doug; Tikoo, Suresh K

    2010-09-01

    The nucleolus is the most prominent subnuclear structure and performs a wide variety of functions in eukaryotic cellular processes. In order to understand the structural and functional role of the nucleoli in bovine cells, we analyzed the proteomic composition of the bovine nucleoli. The nucleoli were isolated from Madin Darby bovine kidney cells and subjected to proteomic analysis by LC-MS/MS after fractionation by SDS-PAGE and strong cation exchange chromatography. Analysis of the data using the Mascot database search and the GPM database search identified 311 proteins in the bovine nucleoli, which contained 22 proteins previously not identified in the proteomic analysis of human nucleoli. Analysis of the identified proteins using the GoMiner software suggested that the bovine nucleoli contained proteins involved in ribosomal biogenesis, cell cycle control, transcriptional, translational and post-translational regulation, transport, and structural organization. Copyright © 2010 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.

  20. The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data.

    PubMed

    Hermjakob, Henning; Montecchi-Palazzi, Luisa; Bader, Gary; Wojcik, Jérôme; Salwinski, Lukasz; Ceol, Arnaud; Moore, Susan; Orchard, Sandra; Sarkans, Ugis; von Mering, Christian; Roechert, Bernd; Poux, Sylvain; Jung, Eva; Mersch, Henning; Kersey, Paul; Lappe, Michael; Li, Yixue; Zeng, Rong; Rana, Debashis; Nikolski, Macha; Husi, Holger; Brun, Christine; Shanker, K; Grant, Seth G N; Sander, Chris; Bork, Peer; Zhu, Weimin; Pandey, Akhilesh; Brazma, Alvis; Jacq, Bernard; Vidal, Marc; Sherman, David; Legrain, Pierre; Cesareni, Gianni; Xenarios, Ioannis; Eisenberg, David; Steipe, Boris; Hogue, Chris; Apweiler, Rolf

    2004-02-01

    A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across experiments is currently hampered by the fragmentation of publicly available protein interaction data, which exists in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community standard data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomolecular Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Reference Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Molecular Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).

  1. Respiratory Proteomics Today: Are Technological Advances for the Identification of Biomarker Signatures Catching up with Their Promise? A Critical Review of the Literature in the Decade 2004-2013.

    PubMed

    Viglio, Simona; Stolk, Jan; Iadarola, Paolo; Giuliano, Serena; Luisetti, Maurizio; Salvini, Roberta; Fumagalli, Marco; Bardoni, Anna

    2014-01-22

    To improve the knowledge on a variety of severe disorders, research has moved from the analysis of individual proteins to the investigation of all proteins expressed by a tissue/organism. This global proteomic approach could prove very useful: (i) for investigating the biochemical pathways involved in disease; (ii) for generating hypotheses; or (iii) as a tool for the identification of proteins differentially expressed in response to the disease state. Proteomics has not yet been used in the field of respiratory research as extensively as in other fields, and only a few reproducible and clinically applicable molecular markers that can assist in diagnosis have been identified so far. The continuous advances in both instrumentation and methodology, which enable sensitive and quantitative proteomic analyses in much smaller amounts of biological material than before, will hopefully promote the identification of new candidate biomarkers in this area. The aim of this report is to critically review the application over the decade 2004-2013 of very sophisticated technologies to the study of respiratory disorders. The observed changes in protein expression profiles from tissues/fluids of patients affected by pulmonary disorders open the route for the identification of novel pathological mediators of these disorders.

  2. An object model and database for functional genomics.

    PubMed

    Jones, Andrew; Hunt, Ela; Wastling, Jonathan M; Pizarro, Angel; Stoeckert, Christian J

    2004-07-10

    Large-scale functional genomics analysis is now feasible and presents significant challenges in data analysis, storage and querying. Data standards are required to enable the development of public data repositories and to improve data sharing. There is an established data format for microarrays (microarray gene expression markup language, MAGE-ML) and a draft standard for proteomics (PEDRo). We believe that all types of functional genomics experiments should be annotated in a consistent manner, and we hope to open up new ways of comparing multiple datasets used in functional genomics. We have created a functional genomics experiment object model (FGE-OM), developed from the microarray model, MAGE-OM and two models for proteomics, PEDRo and our own model (Gla-PSI-Glasgow Proposal for the Proteomics Standards Initiative). FGE-OM comprises three namespaces representing (i) the parts of the model common to all functional genomics experiments; (ii) microarray-specific components; and (iii) proteomics-specific components. We believe that FGE-OM should initiate discussion about the contents and structure of the next version of MAGE and the future of proteomics standards. A prototype database called RNA And Protein Abundance Database (RAPAD), based on FGE-OM, has been implemented and populated with data from microbial pathogenesis. FGE-OM and the RAPAD schema are available from http://www.gusdb.org/fge.html, along with a set of more detailed diagrams. RAPAD can be accessed by registration at the site.

  3. Peptide Identification by Database Search of Mixture Tandem Mass Spectra

    PubMed Central

    Wang, Jian; Bourne, Philip E.; Bandeira, Nuno

    2011-01-01

    In high-throughput proteomics the development of computational methods and novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. Particularly, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still make the assumption that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (four orders of magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision. PMID:21862760

  4. A comparative proteomics method for multiple samples based on a 18O-reference strategy and a quantitation and identification-decoupled strategy.

    PubMed

    Wang, Hongbin; Zhang, Yongqian; Gui, Shuqi; Zhang, Yong; Lu, Fuping; Deng, Yulin

    2017-08-15

    Comparisons across large numbers of samples are frequently necessary in quantitative proteomics. Many quantitative methods used in proteomics are based on stable isotope labeling, but most of these are only useful for comparing two samples. For up to eight samples, the iTRAQ labeling technique can be used. For greater numbers of samples, the label-free method has been used, but this method has been criticized for low reproducibility and accuracy. An ingenious strategy has been introduced, comparing each sample against an 18O-labeled reference sample that was created by pooling equal amounts of all samples. However, it is necessary to use proportion-known protein mixtures to investigate and evaluate this new strategy. Another problem for comparative proteomics of multiple samples is the poor coincidence and reproducibility in protein identification results across samples. In the present study, a method combining the 18O-reference strategy and a quantitation and identification-decoupled strategy was investigated with proportion-known protein mixtures. The results clearly demonstrated that the 18O-reference strategy had greater accuracy and reliability than other previously used comparison methods based on transferring comparison or label-free strategies. With the decoupling strategy, the quantification data acquired by LC-MS and the identification data acquired by LC-MS/MS are matched and correlated, according to retention time and accurate mass, to identify differentially expressed proteins. This strategy made protein identification possible for all samples using a single pooled sample, and therefore gave good reproducibility in protein identification across multiple samples, and allowed peptide identification to be optimized separately so as to identify more proteins. Copyright © 2017 Elsevier B.V. All rights reserved.
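
    The decoupling step, in which LC-MS features are matched to LC-MS/MS identifications by retention time and accurate mass, can be sketched as follows. This is a minimal illustration and not the authors' software: the tolerance values, the scoring rule for choosing the closest candidate, and the example records are all assumptions.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error of observed_mz relative to reference_mz, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def match_features(features, identifications, ppm_tol=10.0, rt_tol=0.5):
    """Assign each LC-MS feature the closest identification within tolerance.

    features: dicts with "id", "mz", "rt" (retention time in minutes).
    identifications: dicts with "peptide", "mz", "rt".
    Returns a list of (feature_id, peptide_or_None).
    """
    matches = []
    for feature in features:
        best = None
        for ident in identifications:
            d_ppm = abs(ppm_error(feature["mz"], ident["mz"]))
            d_rt = abs(feature["rt"] - ident["rt"])
            if d_ppm <= ppm_tol and d_rt <= rt_tol:
                # Normalized distance; smaller means a closer match.
                score = d_ppm / ppm_tol + d_rt / rt_tol
                if best is None or score < best[0]:
                    best = (score, ident["peptide"])
        matches.append((feature["id"], best[1] if best else None))
    return matches


# Hypothetical quantified features and identified peptides.
features = [
    {"id": "f1", "mz": 500.2500, "rt": 20.1},
    {"id": "f2", "mz": 650.3300, "rt": 35.0},
]
identifications = [
    {"peptide": "PEPTIDEK", "mz": 500.2510, "rt": 20.3},
    {"peptide": "SAMPLER", "mz": 650.3400, "rt": 35.1},
]
result = match_features(features, identifications)
print(result)
```

    Here the second feature stays unmatched because its mass error (about 15 ppm) exceeds the assumed 10 ppm tolerance even though its retention time agrees.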

  5. Proteomic analysis of blue light-induced twining response in Cuscuta australis.

    PubMed

    Li, Dongxiao; Wang, Liangjiang; Yang, Xiaopo; Zhang, Guoguang; Chen, Liang

    2010-01-01

    The parasitic plant Cuscuta australis (dodder) invades a variety of species by entwining the stem and leaves of a host and developing haustoria. The twining response prior to haustoria formation is regarded as the first sign for dodders to parasitize host plants, and thus has been the focus of studies on the host-parasite interaction. However, the molecular mechanism is still poorly understood. In the present work, we have investigated the different effects of blue and white light on the twining response, and identified a set of proteins that were differentially expressed in dodder seedlings using a proteomic approach. Approximately 1,800 protein spots were detected on each 2-D gel, and 47 spots with increased or decreased protein levels were selected and analyzed with MALDI-TOF-MS. Peptide mass fingerprints (PMFs) obtained for these spots were used for protein identification through cross-species database searches. The results suggest that the blue light-induced twining response in dodder seedlings may be mediated by proteins involved in light signal transduction, cell wall degradation, cell structure, and metabolism.

  6. Mass Spectrometric Identification and Differentiation of Botulinum Neurotoxins through Toxin Proteomics.

    PubMed

    Kalb, Suzanne R; Barr, John R

    2013-08-01

    Botulinum neurotoxins (BoNTs) cause the disease botulism, which can be lethal if untreated. There are seven known serotypes of BoNT, A-G, defined by their response to antisera. Many serotypes are distinguished into differing subtypes based on amino acid sequence and immunogenic properties, and some subtypes are further differentiated into toxin variants. Toxin characterization is important as different types of BoNT can respond differently to medical countermeasures for botulism, and characterization of the toxin can aid in epidemiologic and forensic investigations. Proteomic techniques have been established to determine the serotype, subtype, or toxin variant of BoNT. These techniques involve digestion of the toxin into peptides, tandem mass spectrometric (MS/MS) analysis of the peptides, and database searching to identify the BoNT protein. These techniques demonstrate the capability to detect BoNT and its neurotoxin-associated proteins, and differentiate the toxin from other toxins which are up to 99.9% identical in some cases. This differentiation can be accomplished from toxins present in a complex matrix such as stool, food, or bacterial cultures and no DNA is required.

  7. Proteogenomic studies on cancer drug resistance: towards biomarker discovery and target identification.

    PubMed

    Fu, Shuyue; Liu, Xiang; Luo, Maochao; Xie, Ke; Nice, Edouard C; Zhang, Haiyuan; Huang, Canhua

    2017-04-01

    Chemoresistance is a major obstacle for current cancer treatment. Proteogenomics is a powerful multi-omics research field that uses customized protein sequence databases generated by genomic and transcriptomic information to identify novel genes (e.g. noncoding, mutation and fusion genes) from mass spectrometry-based proteomic data. By identifying aberrations that are differentially expressed between tumor and normal pairs, this approach can also be applied to validate protein variants in cancer, which may reveal the response to drug treatment. Areas covered: In this review, we will present recent advances in proteogenomic investigations of cancer drug resistance with an emphasis on integrative proteogenomic pipelines and the biomarker discovery which contributes to achieving the goal of using precision/personalized medicine for cancer treatment. Expert commentary: The discovery and comprehensive understanding of potential biomarkers help identify the cohort of patients who may benefit from particular treatments, and will assist real-time clinical decision-making to maximize therapeutic efficacy and minimize adverse effects. With the development of MS-based proteomics and NGS-based sequencing, a growing number of proteogenomic tools are being developed specifically to investigate cancer drug resistance.

  8. Comprehensive data resources and analytical tools for pathological association of aminoacyl tRNA synthetases with cancer

    PubMed Central

    Lee, Ji-Hyun; You, Sungyong; Hyeon, Do Young; Kang, Byeongsoo; Kim, Hyerim; Park, Kyoung Mii; Han, Byungwoo; Hwang, Daehee; Kim, Sunghoon

    2015-01-01

    Mammalian cells have cytoplasmic and mitochondrial aminoacyl-tRNA synthetases (ARSs) that catalyze aminoacylation of tRNAs during protein synthesis. Despite their housekeeping functions in protein synthesis, recently, ARSs and ARS-interacting multifunctional proteins (AIMPs) have been shown to play important roles in disease pathogenesis through their interactions with disease-related molecules. However, there is a lack of data resources and analytical tools that can be used to examine disease associations of ARS/AIMPs. Here, we developed an Integrated Database for ARSs (IDA), a resource database including cancer genomic/proteomic and interaction data of ARS/AIMPs. IDA includes mRNA expression, somatic mutation, copy number variation and phosphorylation data of ARS/AIMPs and their interacting proteins in various cancers. IDA further includes an array of analytical tools for exploration of disease association of ARS/AIMPs, identification of disease-associated ARS/AIMP interactors and reconstruction of ARS-dependent disease-perturbed network models. Therefore, IDA provides both comprehensive data resources and analytical tools for understanding potential roles of ARS/AIMPs in cancers. Database URL: http://ida.biocon.re.kr/, http://ars.biocon.re.kr/ PMID:25824651

  9. Challenges of the information age: the impact of false discovery on pathway identification.

    PubMed

    Rog, Colin J; Chekuri, Srinivasa C; Edgerton, Mary E

    2012-11-21

    Pathways with members that have known relevance to a disease are used to support hypotheses generated from analyses of gene expression and proteomic studies. Using cancer as an example, the pitfalls of searching pathways databases as support for genes and proteins that could represent false discoveries are explored. The frequency with which networks could be generated from 100 instances each of randomly selected five- and ten-gene sets as input to MetaCore, a commercial pathways database, was measured. A PubMed search enumerated cancer-related literature published for any gene in the networks. Using three, two, and one maximum intervening steps between input genes to populate the network, networks were generated with frequencies of 97%, 77%, and 7% using ten-gene sets and 73%, 27%, and 1% using five-gene sets. PubMed reported an average of 4225 cancer-related articles per network gene. This can be attributed to the richly populated pathways databases and the interest in the molecular basis of cancer. As information sources become enriched, they are more likely to generate plausible mechanisms for false discoveries.

  10. Top-down Proteomics: Technology Advancements and Applications to Heart Diseases

    PubMed Central

    Cai, Wenxuan; Tucholski, Trisha M.; Gregorich, Zachery R.; Ge, Ying

    2016-01-01

    Introduction: Diseases of the heart are a leading cause of morbidity and mortality for both men and women worldwide, and impose significant economic burdens on healthcare systems. Despite substantial effort over the last several decades, the molecular mechanisms underlying diseases of the heart remain poorly understood. Areas covered: Altered protein post-translational modifications (PTMs) and protein isoform switching are increasingly recognized as important disease mechanisms. Top-down high-resolution mass spectrometry (MS)-based proteomics has emerged as the most powerful method for the comprehensive analysis of PTMs and protein isoforms. Here, we will review recent technology developments in the field of top-down proteomics, as well as highlight recent studies utilizing top-down proteomics to decipher the cardiac proteome for the understanding of the molecular mechanisms underlying diseases of the heart. Expert commentary: Top-down proteomics is a premier method for the global and comprehensive study of protein isoforms and their PTMs, enabling the identification of novel protein isoforms and PTMs, characterization of sequence variations, and quantification of disease-associated alterations. Despite significant challenges, continuous development of top-down proteomics technology will greatly aid the dissection of the molecular mechanisms underlying diseases of the heart for the identification of novel biomarkers and therapeutic targets. PMID:27448560

  11. Bioinformatics for spermatogenesis: annotation of male reproduction based on proteomics

    PubMed Central

    Zhou, Tao; Zhou, Zuo-Min; Guo, Xue-Jiang

    2013-01-01

    Proteomics strategies have been widely used in the field of male reproduction, both in basic and clinical research. Bioinformatics methods are indispensable in proteomics-based studies and are used for data presentation, database construction and functional annotation. In the present review, we focus on the functional annotation of gene lists obtained through qualitative or quantitative methods, summarizing the common and male reproduction specialized proteomics databases. We introduce several integrated tools used to find the hidden biological significance from the data obtained. We further describe in detail the information on male reproduction derived from Gene Ontology analyses, pathway analyses and biomedical analyses. We provide an overview of bioinformatics annotations in spermatogenesis, from gene function to biological function and from biological function to clinical application. On the basis of recently published proteomics studies and associated data, we show that bioinformatics methods help us to discover drug targets for sperm motility and to scan for cancer-testis genes. In addition, we summarize the online resources relevant to male reproduction research for the exploration of the regulation of spermatogenesis. PMID:23852026

  12. Mitochondrial proteomic profile of complex IV deficiency fibroblasts: rearrangement of oxidative phosphorylation complex/supercomplex and other metabolic pathways.

    PubMed

    Salvador-Severo, Karina; Gómez-Caudillo, Leopoldo; Quezada, Héctor; García-Trejo, José de Jesús; Cárdenas-Conejo, Alan; Vázquez-Memije, Martha Elisa; Minauro-Sanmiguel, Fernando

    Mitochondriopathies are multisystem diseases affecting the oxidative phosphorylation (OXPHOS) system. Skin fibroblasts are a good model for the study of these diseases. Fibroblasts with a complex IV mitochondriopathy were used to determine the molecular mechanism and the main affected functions in this disease. Skin fibroblasts were grown to assure disease phenotype. Mitochondria were isolated from these cells and their proteome extracted for protein identification. Identified proteins were validated with the MitoMiner database. Disease phenotype was corroborated on skin fibroblasts, which presented a complex IV defect. The mitochondrial proteome of these cells showed that the most affected proteins belonged to the OXPHOS system, mainly to the complexes that form supercomplexes or respirosomes (I, III, IV, and V). Defects in complex IV seemed to be due to assembly issues, which might prevent supercomplex formation and efficient substrate channeling. It was also found that this mitochondriopathy affects other processes that are related to DNA genetic information flow (replication, transcription, and translation) as well as beta oxidation and the tricarboxylic acid cycle. These data, as a whole, could be used for the better stratification of these diseases, as well as to optimize management and treatment options. Copyright © 2017 Hospital Infantil de México Federico Gómez. Publicado por Masson Doyma México S.A. All rights reserved.

  13. Targeted quantitative analysis of Streptococcus pyogenes virulence factors by multiple reaction monitoring.

    PubMed

    Lange, Vinzenz; Malmström, Johan A; Didion, John; King, Nichole L; Johansson, Björn P; Schäfer, Juliane; Rameseder, Jonathan; Wong, Chee-Hong; Deutsch, Eric W; Brusniak, Mi-Youn; Bühlmann, Peter; Björck, Lars; Domon, Bruno; Aebersold, Ruedi

    2008-08-01

    In many studies, particularly in the field of systems biology, it is essential that identical protein sets are precisely quantified in multiple samples such as those representing differentially perturbed cell states. The high degree of reproducibility required for such experiments has not been achieved by classical mass spectrometry-based proteomics methods. In this study we describe the implementation of a targeted quantitative approach by which predetermined protein sets are first identified and subsequently quantified at high sensitivity reliably in multiple samples. This approach consists of three steps. First, the proteome is extensively mapped out by multidimensional fractionation and tandem mass spectrometry, and the data generated are assembled in the PeptideAtlas database. Second, based on this proteome map, peptides uniquely identifying the proteins of interest, proteotypic peptides, are selected, and multiple reaction monitoring (MRM) transitions are established and validated by MS2 spectrum acquisition. This process of peptide selection, transition selection, and validation is supported by a suite of software tools, TIQAM (Targeted Identification for Quantitative Analysis by MRM), described in this study. Third, the selected target protein set is quantified in multiple samples by MRM. Applying this approach we were able to reliably quantify low abundance virulence factors from cultures of the human pathogen Streptococcus pyogenes exposed to increasing amounts of plasma. The resulting quantitative protein patterns enabled us to clearly define the subset of virulence proteins that is regulated upon plasma exposure.
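
    The second step above, selecting peptides that uniquely identify a protein of interest, can be sketched as a simple in-silico procedure. This is not TIQAM: the sketch assumes a basic tryptic cleavage rule (cut after K or R unless followed by P), ignores missed cleavages and detectability, and uses made-up protein sequences.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence, min_length=5):
    """Split a protein after K or R (not before P) and drop short peptides."""
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in peptides if len(p) >= min_length]


def proteotypic_peptides(proteins, min_length=5):
    """Map each protein to the peptides that occur in no other protein."""
    owners = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_digest(sequence, min_length):
            owners[peptide].add(name)
    unique = {name: [] for name in proteins}
    for peptide, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].append(peptide)
    for name in unique:
        unique[name].sort()
    return unique


# Hypothetical sequences sharing one tryptic peptide (TAYIAK).
proteins = {
    "protA": "MKTAYIAKQRQISFVK",
    "protB": "MKTAYIAKGGSLLPEDR",
}
unique = proteotypic_peptides(proteins)
print(unique)
```

    In a real workflow the candidate list would be further filtered by whether each peptide has actually been observed, for example in a proteome map such as the one described above, before MRM transitions are chosen.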

  14. A Review and Database of Snake Venom Proteomes

    PubMed Central

    Tasoulis, Theo

    2017-01-01

    Advances in the last decade combining transcriptomics with established proteomics methods have made possible rapid identification and quantification of protein families in snake venoms. Although over 100 studies have been published, the value of this information is increased when it is collated, allowing rapid assimilation and evaluation of evolutionary trends, geographical variation, and possible medical implications. This review brings together all compositional studies of snake venom proteomes published in the last decade. Compositional studies were identified for 132 snake species: 42 from 360 (12%) Elapidae (elapids), 20 from 101 (20%) Viperinae (true vipers), 65 from 239 (27%) Crotalinae (pit vipers), and five species of non-front-fanged snakes. Approximately 90% of their total venom composition consisted of eight protein families for elapids, 11 protein families for viperines and ten protein families for crotalines. There were four dominant protein families: phospholipase A2s (the most common across all front-fanged snakes), metalloproteases, serine proteases and three-finger toxins. There were six secondary protein families: cysteine-rich secretory proteins, l-amino acid oxidases, kunitz peptides, C-type lectins/snaclecs, disintegrins and natriuretic peptides. Elapid venoms contained mostly three-finger toxins and phospholipase A2s and viper venoms metalloproteases, phospholipase A2s and serine proteases. Although 63 protein families were identified, more than half were present in <5% of snake species studied and always in low abundance. The importance of these minor component proteins remains unknown. PMID:28927001
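
    The coverage figures quoted above can be re-derived with a few lines of arithmetic. The sketch below only uses the counts stated in the abstract; the dictionary layout and variable names are illustrative.

```python
# (species with compositional studies, species in the group), as stated in the abstract
coverage = {
    "Elapidae": (42, 360),
    "Viperinae": (20, 101),
    "Crotalinae": (65, 239),
}
non_front_fanged = 5

# Percentage of each group covered, rounded to the nearest whole percent
percent = {
    group: round(100 * studied / total)
    for group, (studied, total) in coverage.items()
}

# Total number of species with compositional studies
total_species = sum(studied for studied, _ in coverage.values()) + non_front_fanged
```

    The rounded percentages and the total of 132 species agree with the numbers reported in the abstract.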

  15. Combined Mass Spectrometry Imaging and Top-down Microproteomics Reveals Evidence of a Hidden Proteome in Ovarian Cancer.

    PubMed

    Delcourt, Vivian; Franck, Julien; Leblanc, Eric; Narducci, Fabrice; Robin, Yves-Marie; Gimeno, Jean-Pascal; Quanico, Jusal; Wisztorski, Maxence; Kobeissy, Firas; Jacques, Jean-François; Roucou, Xavier; Salzet, Michel; Fournier, Isabelle

    2017-07-01

    Recently, it was demonstrated that proteins can be translated from alternative open reading frames (altORFs), increasing the size of the actual proteome. Top-down mass spectrometry-based proteomics allows the identification of intact proteins containing post-translational modifications (PTMs) as well as truncated forms translated from reference ORFs or altORFs. Top-down tissue microproteomics was applied on benign, tumor and necrotic-fibrotic regions of serous ovarian cancer biopsies, identifying proteins exhibiting region-specific cellular localization and PTMs. The regions of interest (ROIs) were determined by MALDI mass spectrometry imaging and spatial segmentation. Analysis with a customized protein sequence database containing reference and alternative proteins (altprots) identified 15 altprots, including alternative G protein nucleolar 1 (AltGNL1) found in the tumor, and translated from an altORF nested within the GNL1 canonical coding sequence. Co-expression of GNL1 and altGNL1 was validated by transfection in HEK293 and HeLa cells with an expression plasmid containing a GNL1-FLAG (V5) construct. Western blot and immunofluorescence experiments confirmed constitutive co-expression of altGNL1-V5 with GNL1-FLAG. Taken together, our approach provides means to evaluate protein changes in the case of serous ovarian cancer, allowing the detection of potential markers that have never been considered. Copyright © 2017 The Author(s). Published by Elsevier B.V. All rights reserved.
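
    To make the notion of an altORF nested within a canonical coding sequence concrete, the following sketch scans the three forward reading frames of a transcript for ATG-to-stop ORFs and reports those that differ from the annotated reference ORF. It is an illustration only, not the procedure used to build the customized database in the study; the sequence, the `min_codons` threshold and the function name are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(mrna, min_codons=2):
    """Return (frame, start, end) for each ATG...stop ORF in the three forward frames.

    `end` is exclusive and includes the stop codon; `min_codons` counts
    codons from ATG up to, but not including, the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

# Hypothetical transcript: reference ORF in frame 0, short ORF nested in frame 2.
transcript = "ATGGCATGCCCTAAGGTGACCTAA"
reference = (0, 0, 24)

all_orfs = find_orfs(transcript)
alt_orfs = [orf for orf in all_orfs if orf != reference]
nested = [o for o in alt_orfs if o[1] >= reference[1] and o[2] <= reference[2]]
```

    Translating the nested ORFs and appending them to the reference proteins would give a search database in the spirit of the one described above.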

  16. mzResults: An Interactive Viewer for Interrogation and Distribution of Proteomics Results

    PubMed Central

    Webber, James T.; Askenazi, Manor; Marto, Jarrod A.

    2011-01-01

    The growing use of mass spectrometry in the context of biomedical research has been accompanied by an increased demand for distribution of results in a format that facilitates rapid and efficient validation of claims by reviewers and other interested parties. However, the continued evolution of mass spectrometry hardware, sample preparation methods, and peptide identification algorithms complicates standardization and creates hurdles related to compliance with journal submission requirements. Moreover, the recently announced Philadelphia Guidelines (1, 2) suggest that authors provide native mass spectrometry data files in support of their peer-reviewed research articles. These trends highlight the need for data viewers and other tools that work independently of manufacturers' proprietary data systems and seamlessly connect proteomics results with original data files to support user-driven data validation and review. Based upon our recently described API-based framework for mass spectrometry data analysis (3, 4), we created an interactive viewer (mzResults) that is built on established database standards and enables efficient distribution and interrogation of results associated with proteomics experiments, while also providing a convenient mechanism for authors to comply with data submission standards as described in the Philadelphia Guidelines. In addition, the architecture of mzResults supports in-depth queries of the native mass spectrometry files through our multiplierz software environment. We use phosphoproteomics data to illustrate the features and capabilities of mzResults. PMID:21266631

  17. Proteomics analysis of "Rovabiot Excel", a secreted protein cocktail from the filamentous fungus Penicillium funiculosum grown under industrial process fermentation.

    PubMed

    Guais, Olivier; Borderies, Gisèle; Pichereaux, Carole; Maestracci, Marc; Neugnot, Virginie; Rossignol, Michel; François, Jean Marie

    2008-12-01

    MS/MS techniques are well customized now for proteomic analysis, even for non-sequenced organisms, since peptide sequences obtained by these methods can be matched with those found in databases from closely related sequenced organisms. We used this approach to characterize the protein content of the "Rovabio Excel", an enzymatic cocktail produced by Penicillium funiculosum that is used as feed additive in animal nutrition. Protein separation by bi-dimensional electrophoresis yielded more than 100 spots, from which 37 proteins were unambiguously assigned from peptide sequences. By one-dimensional SDS-gel electrophoresis, 34 proteins were identified among which 8 were not found in the 2-DE analysis. A third method, termed 'peptidic shotgun', which consists of direct treatment of the cocktail with trypsin followed by separation of the peptides by two-dimensional liquid chromatography, resulted in the identification of two additional proteins not found by the two other methods. Altogether, more than 50 proteins, among which several glycosylhydrolytic, hemicellulolytic and proteolytic enzymes, were identified by combining three separation methods in this enzymatic cocktail. This work confirmed the power of proteome analysis to explore the genome expression of a non-sequenced fungus by taking advantage of sequences from phylogenetically related filamentous fungi and paves the way for further functional analysis of P. funiculosum.

  18. The Nuclear Protein Database (NPD): sub-nuclear localisation and functional annotation of the nuclear proteome

    PubMed Central

    Dellaire, G.; Farrall, R.; Bickmore, W.A.

    2003-01-01

    The Nuclear Protein Database (NPD) is a curated database that contains information on more than 1300 vertebrate proteins that are thought, or are known, to localise to the cell nucleus. Each entry is annotated with information on predicted protein size and isoelectric point, as well as any repeats, motifs or domains within the protein sequence. In addition, information on the sub-nuclear localisation of each protein is provided and the biological and molecular functions are described using Gene Ontology (GO) terms. The database is searchable by keyword, protein name, sub-nuclear compartment and protein domain/motif. Links to other databases are provided (e.g. Entrez, SWISS-PROT, OMIM, PubMed, PubMed Central). Thus, NPD provides a gateway through which the nuclear proteome may be explored. The database can be accessed at http://npd.hgu.mrc.ac.uk and is updated monthly. PMID:12520015

  19. MEGGASENSE - The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses.

    PubMed

    Gacesa, Ranko; Zucko, Jurica; Petursdottir, Solveig K; Gudmundsdottir, Elisabet Eik; Fridjonsson, Olafur H; Diminic, Janko; Long, Paul F; Cullum, John; Hranueli, Daslav; Hreggvidsson, Gudmundur O; Starcevic, Antonio

    2017-06-01

    The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14,106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel 'functional' assembly was developed, in which screening of reads with the HMM profiles occurs before the assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes and it is illustrated how the combination of HMM and BLAST analyses helps identify interesting genes. A variety of different proteome and metagenome databases have been generated by MEGGASENSE.

  20. A Community Standard Format for the Representation of Protein Affinity Reagents

    PubMed Central

    Gloriam, David E.; Orchard, Sandra; Bertinetti, Daniela; Björling, Erik; Bongcam-Rudloff, Erik; Borrebaeck, Carl A. K.; Bourbeillon, Julie; Bradbury, Andrew R. M.; de Daruvar, Antoine; Dübel, Stefan; Frank, Ronald; Gibson, Toby J.; Gold, Larry; Haslam, Niall; Herberg, Friedrich W.; Hiltke, Tara; Hoheisel, Jörg D.; Kerrien, Samuel; Koegl, Manfred; Konthur, Zoltán; Korn, Bernhard; Landegren, Ulf; Montecchi-Palazzi, Luisa; Palcy, Sandrine; Rodriguez, Henry; Schweinsberg, Sonja; Sievert, Volker; Stoevesandt, Oda; Taussig, Michael J.; Ueffing, Marius; Uhlén, Mathias; van der Maarel, Silvère; Wingren, Christer; Woollard, Peter; Sherman, David J.; Hermjakob, Henning

    2010-01-01

    Protein affinity reagents (PARs), most commonly antibodies, are essential reagents for protein characterization in basic research, biotechnology, and diagnostics as well as the fastest growing class of therapeutics. Large numbers of PARs are available commercially; however, their quality is often uncertain. In addition, currently available PARs cover only a fraction of the human proteome, and their cost is prohibitive for proteome scale applications. This situation has triggered several initiatives involving large scale generation and validation of antibodies, for example the Swedish Human Protein Atlas and the German Antibody Factory. Antibodies targeting specific subproteomes are being pursued by members of Human Proteome Organisation (plasma and liver proteome projects) and the United States National Cancer Institute (cancer-associated antigens). ProteomeBinders, a European consortium, aims to set up a resource of consistently quality-controlled protein-binding reagents for the whole human proteome. An ultimate PAR database resource would allow consumers to visit one on-line warehouse and find all available affinity reagents from different providers together with documentation that facilitates easy comparison of their cost and quality. However, in contrast to, for example, nucleotide databases among which data are synchronized between the major data providers, current PAR producers, quality control centers, and commercial companies all use incompatible formats, hindering data exchange. Here we propose Proteomics Standards Initiative (PSI)-PAR as a global community standard format for the representation and exchange of protein affinity reagent data. The PSI-PAR format is maintained by the Human Proteome Organisation PSI and was developed within the context of ProteomeBinders by building on a mature proteomics standard format, PSI-molecular interaction, which is a widely accepted and established community standard for molecular interaction data. Further information and documentation are available on the PSI-PAR web site. PMID:19674966

  1. Changes in the Proteome of Xylem Sap in Brassica oleracea in Response to Fusarium oxysporum Stress

    PubMed Central

    Pu, Zijing; Ino, Yoko; Kimura, Yayoi; Tago, Asumi; Shimizu, Motoki; Natsume, Satoshi; Sano, Yoshitaka; Fujimoto, Ryo; Kaneko, Kentaro; Shea, Daniel J.; Fukai, Eigo; Fuji, Shin-Ichi; Hirano, Hisashi; Okazaki, Keiichi

    2016-01-01

    Fusarium oxysporum f.sp. conlutinans (Foc) is a serious root-invading and xylem-colonizing fungus that causes yellowing in Brassica oleracea. To comprehensively understand the interaction between F. oxysporum and B. oleracea, composition of the xylem sap proteome of the non-infected and Foc-infected plants was investigated in both resistant and susceptible cultivars using liquid chromatography-tandem mass spectrometry (LC-MS/MS) after in-solution digestion of xylem sap proteins. Whole genome sequencing of Foc was carried out and generated a predicted Foc protein database. The predicted Foc protein database was then combined with the public B. oleracea and B. rapa protein databases downloaded from UniProt and used for protein identification. About 200 plant proteins were identified in the xylem sap of susceptible and resistant plants. Comparison between the non-infected and Foc-infected samples revealed that Foc infection causes changes to the protein composition in B. oleracea xylem sap, where repressed proteins accounted for a greater proportion than induced proteins in both the susceptible and resistant reactions. The analysis of the proteins with a concentration change ≥2-fold indicated that a large portion of up- and down-regulated proteins were those acting on carbohydrates. Proteins with leucine-rich repeats and legume lectin domains were mainly induced in both the resistant and susceptible systems, as were thaumatins. Twenty-five Foc proteins were identified in the infected xylem sap and 10 of them were cysteine-containing secreted small proteins that are good candidates for virulence and/or avirulence effectors. The findings of differential response of protein contents in the xylem sap between the non-infected and Foc-infected samples as well as the Foc candidate effectors secreted in xylem provide valuable insights into B. oleracea-Foc interactions. PMID:26870056
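
    The fold-change criterion mentioned above (a concentration change of at least 2-fold between non-infected and infected samples) can be sketched as follows. This is a minimal illustration assuming strictly positive abundance values; the protein identifiers and numbers are hypothetical and the function is not taken from the study.

```python
def classify_fold_change(control, infected, threshold=2.0):
    """Label proteins as induced, repressed or unchanged by infected/control ratio.

    Only proteins quantified in both conditions are classified; abundances
    are assumed to be strictly positive.
    """
    calls = {}
    for protein in sorted(control.keys() & infected.keys()):
        ratio = infected[protein] / control[protein]
        if ratio >= threshold:
            calls[protein] = "induced"
        elif ratio <= 1.0 / threshold:
            calls[protein] = "repressed"
        else:
            calls[protein] = "unchanged"
    return calls

# Hypothetical xylem sap abundances (arbitrary units)
control = {"lectin_1": 10.0, "glucanase_2": 40.0, "peroxidase_3": 15.0}
infected = {"lectin_1": 25.0, "glucanase_2": 10.0, "peroxidase_3": 18.0}

calls = classify_fold_change(control, infected)
n_repressed = sum(1 for c in calls.values() if c == "repressed")
n_induced = sum(1 for c in calls.values() if c == "induced")
```

    Counting the two classes per cultivar gives the kind of induced-versus-repressed comparison summarized in the abstract.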

  2. Changes in the Proteome of Xylem Sap in Brassica oleracea in Response to Fusarium oxysporum Stress.

    PubMed

    Pu, Zijing; Ino, Yoko; Kimura, Yayoi; Tago, Asumi; Shimizu, Motoki; Natsume, Satoshi; Sano, Yoshitaka; Fujimoto, Ryo; Kaneko, Kentaro; Shea, Daniel J; Fukai, Eigo; Fuji, Shin-Ichi; Hirano, Hisashi; Okazaki, Keiichi

    2016-01-01

    Fusarium oxysporum f.sp. conlutinans (Foc) is a serious root-invading and xylem-colonizing fungus that causes yellowing in Brassica oleracea. To comprehensively understand the interaction between F. oxysporum and B. oleracea, composition of the xylem sap proteome of the non-infected and Foc-infected plants was investigated in both resistant and susceptible cultivars using liquid chromatography-tandem mass spectrometry (LC-MS/MS) after in-solution digestion of xylem sap proteins. Whole genome sequencing of Foc was carried out and generated a predicted Foc protein database. The predicted Foc protein database was then combined with the public B. oleracea and B. rapa protein databases downloaded from UniProt and used for protein identification. About 200 plant proteins were identified in the xylem sap of susceptible and resistant plants. Comparison between the non-infected and Foc-infected samples revealed that Foc infection causes changes to the protein composition in B. oleracea xylem sap, where repressed proteins accounted for a greater proportion than induced proteins in both the susceptible and resistant reactions. The analysis of the proteins with a concentration change ≥2-fold indicated that a large portion of up- and down-regulated proteins were those acting on carbohydrates. Proteins with leucine-rich repeats and legume lectin domains were mainly induced in both the resistant and susceptible systems, as were thaumatins. Twenty-five Foc proteins were identified in the infected xylem sap and 10 of them were cysteine-containing secreted small proteins that are good candidates for virulence and/or avirulence effectors. The findings of differential response of protein contents in the xylem sap between the non-infected and Foc-infected samples as well as the Foc candidate effectors secreted in xylem provide valuable insights into B. oleracea-Foc interactions.

  3. Global analysis of Brucella melitensis proteomes.

    PubMed

    Mujer, Cesar V; Wagner, Mary Ann; Eschenbrenner, Michel; Horn, Troy; Kraycer, Jo Ann; Redkar, Rajendra; Hagius, Sue; Elzer, Philip; Delvecchio, Vito G

    2002-10-01

    Brucella melitensis is a facultative, intracellular, gram-negative cocco-bacillus that causes Malta fever in humans and brucellosis in animals. There are at least six species in the genus, and the disease is classified as zoonotic because several species infect humans. Using 2-D gel electrophoresis and mass spectrometry, we have initiated (i) a comprehensive mapping and identification of all the expressed proteins of B. melitensis virulent strain 16M, and (ii) a comparative study of its proteome with the attenuated vaccinal strain Rev 1. Comprehensive proteome maps of all six Brucella species will be generated in order to obtain vital information for vaccine development, identification of pathogenicity islands, and establishment of host specificity and evolutionary relatedness.

  4. NOVEL METHODS FOR TARGET PROTEIN IDENTIFICATION USING IMMUNOPRECIPITATION - LC/MS/MS

    EPA Science Inventory

    Proteomics provides a powerful approach to screen and analyze responses to environmental exposures which induce alterations in protein expression, phosphorylation, ubiquitinylation, oxidation, and modulation of general proteome function. Post-translational modifications (PTM) of ...

  5. Comparative proteomics lends insight into genotype-specific pathogenicity.

    PubMed

    Guarnieri, Michael T

    2013-09-01

    Comparative proteomic analyses have emerged as a powerful tool for the identification of unique biomarkers and mechanisms of pathogenesis. In this issue of Proteomics, Murugaiyan et al. utilize difference gel electrophoresis (DIGE) to examine differential protein expression between nonpathogenic and pathogenic genotypes of Prototheca zopfii, a causative agent in bovine enteritis and mastitis. Their findings provide insights into molecular mechanisms of infection and evolutionary adaptation of pathogenic genotypes, demonstrating the power of comparative proteomic analyses. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Controlled vocabularies and ontologies in proteomics: Overview, principles and practice

    PubMed Central

    Mayer, Gerhard; Jones, Andrew R.; Binz, Pierre-Alain; Deutsch, Eric W.; Orchard, Sandra; Montecchi-Palazzi, Luisa; Vizcaíno, Juan Antonio; Hermjakob, Henning; Oveillero, David; Julian, Randall; Stephan, Christian; Meyer, Helmut E.; Eisenacher, Martin

    2014-01-01

    This paper focuses on the use of controlled vocabularies (CVs) and ontologies especially in the area of proteomics, primarily related to the work of the Proteomics Standards Initiative (PSI). It describes the relevant proteomics standard formats and the ontologies used within them. Software and tools for working with these ontology files are also discussed. The article also examines the “mapping files” used to ensure correct controlled vocabulary terms that are placed within PSI standards and the fulfillment of the MIAPE (Minimum Information about a Proteomics Experiment) requirements. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. PMID:23429179

  7. Development of an open source laboratory information management system for 2-D gel electrophoresis-based proteomics workflow

    PubMed Central

    Morisawa, Hiraku; Hirota, Mikako; Toda, Tosifusa

    2006-01-01

    Background In the post-genome era, most research scientists working in the field of proteomics are confronted with difficulties in management of large volumes of data, which they are required to keep in formats suitable for subsequent data mining. Therefore, a well-developed open source laboratory information management system (LIMS) should be available for their proteomics research studies. Results We developed an open source LIMS appropriately customized for 2-D gel electrophoresis-based proteomics workflow. The main features of its design are compactness, flexibility and connectivity to public databases. It supports the handling of data imported from mass spectrometry software and 2-D gel image analysis software. The LIMS is equipped with the same input interface for 2-D gel information as a clickable map on public 2DPAGE databases. The LIMS allows researchers to follow their own experimental procedures by reviewing the illustrations of 2-D gel maps and well layouts on the digestion plates and MS sample plates. Conclusion Our new open source LIMS is now available as a basic model for proteome informatics, and is accessible for further improvement. We hope that many research scientists working in the field of proteomics will evaluate our LIMS and suggest ways in which it can be improved. PMID:17018156

  8. A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets.

    PubMed

    Savitski, Mikhail M; Wilhelm, Mathias; Hahne, Hannes; Kuster, Bernhard; Bantscheff, Marcus

    2015-09-01

    Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further add to the challenge and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target-decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target-decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprised of ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The "picked" protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the highest score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The "picked" target-decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used "classic" protein FDR approach that causes overprediction of false-positive protein identification in large data sets. The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
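
    The pairing idea behind the "picked" strategy can be sketched in a few lines. The code below is a simplified illustration under stated assumptions, not the authors' implementation: each protein has one target score and one decoy score where higher is better, ties are resolved in favour of the target, and the FDR at each accepted target is estimated as the running ratio of picked decoys to picked targets. The scores are hypothetical.

```python
def picked_protein_fdr(target_scores, decoy_scores):
    """Return (protein, score, estimated FDR) for each picked target, best score first.

    For every protein only the higher-scoring member of its target/decoy
    pair is kept (ties go to the target, an assumption of this sketch).
    """
    picked = []
    for protein in target_scores.keys() | decoy_scores.keys():
        t = target_scores.get(protein, float("-inf"))
        d = decoy_scores.get(protein, float("-inf"))
        picked.append((max(t, d), t >= d, protein))
    # Sort by descending score; protein name makes the order deterministic.
    picked.sort(key=lambda entry: (-entry[0], entry[2]))

    results, targets, decoys = [], 0, 0
    for score, is_target, protein in picked:
        if is_target:
            targets += 1
            results.append((protein, score, decoys / targets))
        else:
            decoys += 1
    return results

# Hypothetical protein-level scores
target_scores = {"P1": 9.0, "P2": 7.5, "P3": 3.0, "P4": 2.0}
decoy_scores = {"P1": 1.0, "P2": 2.0, "P3": 4.0, "P4": 1.5}

picked = picked_protein_fdr(target_scores, decoy_scores)
```

    Here P3 is represented only by its decoy, so it contributes to the decoy count but not to the list of identified proteins, whereas a classic approach would have counted both its target and its decoy as separate entries.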

  9. A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets

    PubMed Central

    Savitski, Mikhail M.; Wilhelm, Mathias; Hahne, Hannes; Kuster, Bernhard; Bantscheff, Marcus

    2015-01-01

    Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further add to the challenge and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target–decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target–decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprised of ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The “picked” protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the highest score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The “picked” target–decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used “classic” protein FDR approach that causes overprediction of false-positive protein identification in large data sets. The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software. PMID:25987413
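
    The q-value based peptide scoring referred to above can be illustrated with a generic target-decoy calculation. This is a textbook-style sketch rather than the pipeline used in the study: identifications are ranked by a score where higher is better, the FDR at each rank is taken as decoys divided by targets, and the q-value is the smallest FDR at which an identification would still be accepted. The input scores are hypothetical.

```python
def q_values(identifications):
    """Compute q-values for (score, is_decoy) pairs; returns (score, is_decoy, q).

    Output is sorted by descending score. The q-value of an entry is the
    minimum running FDR observed at its rank or any lower-scoring rank.
    """
    ranked = sorted(identifications, key=lambda item: -item[0])
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Enforce monotonicity by scanning from the worst score upwards.
    q, running_min = [], float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q.append(running_min)
    q.reverse()
    return [(score, is_decoy, qv) for (score, is_decoy), qv in zip(ranked, q)]

# Hypothetical peptide identifications: (search engine score, is_decoy)
peptides = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
scored = q_values(peptides)
qs = [qv for _, _, qv in scored]
```

    Because q-values are on a common scale regardless of which search engine produced the raw scores, taking the best (lowest) peptide q-value per protein gives a protein score that can be compared across heterogeneous runs, which is the role it plays in the abstract.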

  10. Mixed-mode ion exchange-based integrated proteomics technology for fast and deep plasma proteome profiling.

    PubMed

    Xue, Lu; Lin, Lin; Zhou, Wenbin; Chen, Wendong; Tang, Jun; Sun, Xiujie; Huang, Peiwu; Tian, Ruijun

    2018-06-09

    Plasma proteome profiling by LC-MS based proteomics has drawn great attention recently for biomarker discovery from blood liquid biopsy. Because standard multi-step sample preparation could potentially cause plasma protein degradation and analysis variation, integrated proteomics sample preparation technologies have become a promising solution. Here, we developed a fully integrated proteomics sample preparation technology for both fast and deep plasma proteome profiling under its native pH. All the sample preparation steps, including protein digestion and two-dimensional fractionation by both mixed-mode ion exchange and high-pH reversed phase mechanisms, were integrated into one spintip device for the first time. The mixed-mode ion exchange bead design achieved sample loading at neutral pH and protein digestion within 30 min. Potential sample loss and protein degradation caused by pH changes could be avoided. 1 μL of plasma sample with depletion of high abundant proteins was processed by the developed technology with 12 equally distributed fractions and analyzed with 12 h of LC-MS gradient time, resulting in the identification of 862 proteins. The combination of the Mixed-mode-SISPROT and data-independent MS method achieved fast plasma proteome profiling in 2 h with high identification overlap and quantification precision for a proof-of-concept study of plasma samples from 5 healthy donors. We expect that the Mixed-mode-SISPROT will become a generally applicable sample preparation technology for clinically oriented plasma proteome profiling. Copyright © 2018 Elsevier B.V. All rights reserved.

  11. CyanOmics: an integrated database of omics for the model cyanobacterium Synechococcus sp. PCC 7002.

    PubMed

    Yang, Yaohua; Feng, Jie; Li, Tao; Ge, Feng; Zhao, Jindong

    2015-01-01

    Cyanobacteria are an important group of organisms that carry out oxygenic photosynthesis and play vital roles in both the carbon and nitrogen cycles of the Earth. The annotated genome of Synechococcus sp. PCC 7002, as an ideal model cyanobacterium, is available. A series of transcriptomic and proteomic studies of Synechococcus sp. PCC 7002 cells grown under different conditions have been reported. However, no database of such integrated omics studies has been constructed. Here we present CyanOmics, a database based on the results of Synechococcus sp. PCC 7002 omics studies. CyanOmics comprises one genomic dataset, 29 transcriptomic datasets and one proteomic dataset and should prove useful for systematic and comprehensive analysis of all those data. Powerful browsing and searching tools are integrated to help users directly access information of interest with enhanced visualization of the analytical results. Furthermore, Blast is included for sequence-based similarity searching and Cluster 3.0, as well as the R hclust function is provided for cluster analyses, to increase CyanOmics's usefulness. To the best of our knowledge, it is the first integrated omics analysis database for cyanobacteria. This database should further understanding of the transcriptional patterns, and proteomic profiling of Synechococcus sp. PCC 7002 and other cyanobacteria. Additionally, the entire database framework is applicable to any sequenced prokaryotic genome and could be applied to other integrated omics analysis projects. Database URL: http://lag.ihb.ac.cn/cyanomics. © The Author(s) 2015. Published by Oxford University Press.

  12. Improved Recovery and Identification of Membrane Proteins from Rat Hepatic Cells using a Centrifugal Proteomic Reactor

    PubMed Central

    Zhou, Hu; Wang, Fangjun; Wang, Yuwei; Ning, Zhibin; Hou, Weimin; Wright, Theodore G.; Sundaram, Meenakshi; Zhong, Shumei; Yao, Zemin; Figeys, Daniel

    2011-01-01

    Despite their importance in many biological processes, membrane proteins are underrepresented in proteomic analysis because of their poor solubility (hydrophobicity) and often low abundance. We describe a novel approach for the identification of plasma membrane proteins and intracellular microsomal proteins that combines membrane fractionation, a centrifugal proteomic reactor for streamlined protein extraction, protein digestion and fractionation by centrifugation, and high performance liquid chromatography-electrospray ionization-tandem MS. The performance of this approach was illustrated for the study of the proteome of ER and Golgi microsomal membranes in rat hepatic cells. The centrifugal proteomic reactor identified 945 plasma membrane proteins and 955 microsomal membrane proteins, of which 63 and 47% were predicted as bona fide membrane proteins, respectively. Among these proteins, >800 proteins were undetectable by the conventional in-gel digestion approach. The majority of the membrane proteins only identified by the centrifugal proteomic reactor were proteins with ≥2 transmembrane segments or proteins with high molecular mass (e.g. >150 kDa) and hydrophobicity. The improved proteomic reactor allowed the detection of a group of endocytic and/or signaling receptor proteins on the plasma membrane, as well as apolipoproteins and glycerolipid synthesis enzymes that play a role in the assembly and secretion of apolipoprotein B100-containing very low density lipoproteins. Thus, the centrifugal proteomic reactor offers a new analytical tool for structure and function studies of membrane proteins involved in lipid and lipoprotein metabolism. PMID:21749988

  13. A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

    PubMed Central

    Tang, Haixu; Li, Sujun; Ye, Yuzhen

    2016-01-01

    Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on github at https://github.com/COL-IU/Graph2Pro. PMID:27918579
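
    The graph idea in the abstract above can be illustrated with a toy example. The sketch below is not the Graph2Pro implementation; it only shows, under simplifying assumptions (error-free reads, no branching), how a de Bruijn graph built from short fragments lets a walk recover a sequence longer than any single fragment. The function names and the reads are hypothetical.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk_unambiguous(graph, start):
    """Follow edges from start while the current node has exactly one successor."""
    sequence, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:  # stop if the walk closes a cycle
            break
        seen.add(node)
        sequence += node[-1]
    return sequence

# Hypothetical overlapping fragments; none of them contains the full sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
graph = build_de_bruijn(reads, k=4)
print(walk_unambiguous(graph, "ATG"))  # ATGGCGTACCTGA
```

    In a real assembly graph the walk would have to handle branches and sequencing errors, which is where most of the difficulty lies.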

  14. Polyphemus, Odysseus and the ovine milk proteome.

    PubMed

    Cunsolo, Vincenzo; Fasoli, Elisa; Di Francesco, Antonella; Saletti, Rosaria; Muccilli, Vera; Gallina, Serafina; Righetti, Pier Giorgio; Foti, Salvatore

    2017-01-30

    In recent years the production of ovine milk, mainly used to formulate a wide range of different and exclusive dairy products often categorized as gourmet food, has been progressively increasing. Given that sheep milk (SM) also appears to be potentially less allergenic than cow's milk, in-depth information about its protein composition is essential to improve the understanding of its potential benefits for human consumption. The present work reports the results of an in-depth characterization of the SM whey proteome, carried out by coupling the CPLL technology with SDS-PAGE and high resolution UPLC-nESI MS/MS analysis. This approach allowed the identification of 718 different protein components, 644 of which are from unique genes. In particular, this identification has expanded literature data about the sheep whey proteome by 193 novel proteins previously undetected, many of which are involved in defence/immunity mechanisms or in the nutrient delivery system. A comparative analysis of the SM proteome known to date with the cow's milk (CM) proteome showed that while about 29% of SM proteins are also present in CM, 71% of the identified components appear to be unique to the SM proteome and include a heterogeneous group of components which seem to have health-promoting benefits. The data have been deposited to the ProteomeXchange with identifier . Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Selection of Collision Energies in Proteomics Mass Spectrometry Experiments for Best Peptide Identification: Study of Mascot Score Energy Dependence Reveals Double Optimum.

    PubMed

    Révész, Ágnes; Rokob, Tibor András; Jeanne Dit Fouque, Dany; Turiák, Lilla; Memboeuf, Antony; Vékey, Károly; Drahos, László

    2018-05-04

    Collision energy is a key parameter determining the information content of beam-type collision induced dissociation tandem mass spectrometry (MS/MS) spectra, and its optimal choice largely affects successful peptide and protein identification in MS-based proteomics. For an MS/MS spectrum, quality of peptide match based on sequence database search, often characterized in terms of a single score, is a complex function of spectrum characteristics, and its collision energy dependence has remained largely unexplored. We carried out electrospray ionization-quadrupole-time of flight (ESI-Q-TOF)-MS/MS measurements on 2807 peptides from tryptic digests of HeLa and E. coli at 21 different collision energies. Agglomerative clustering of the resulting Mascot score versus energy curves revealed that only a few of them display a single, well-defined maximum; rather, they feature either a broad plateau or two clear peaks. Nonlinear least-squares fitting of one or two Gaussian functions allowed the characteristic energies to be determined. We found that the double peaks and the plateaus in Mascot score can be associated with the different energy dependence of b- and y-type fragment ion intensities. We determined that the energies for optimum Mascot scores follow separate linear trends for the unimodal and bimodal cases with rather large residual variance even after differences in proton mobility are taken into account. This leaves room for experiment optimization and points to the possible influence of further factors beyond m/z.

  16. Repeatability and Reproducibility in Proteomic Identifications by Liquid Chromatography—Tandem Mass Spectrometry

    PubMed Central

    Tabb, David L.; Vega-Montoto, Lorenzo; Rudnick, Paul A.; Variyath, Asokan Mulayath; Ham, Amy-Joan L.; Bunk, David M.; Kilpatrick, Lisa E.; Billheimer, Dean D.; Blackman, Ronald K.; Cardasis, Helene L.; Carr, Steven A.; Clauser, Karl R.; Jaffe, Jacob D.; Kowalski, Kevin A.; Neubert, Thomas A.; Regnier, Fred E.; Schilling, Birgit; Tegeler, Tony J.; Wang, Mu; Wang, Pei; Whiteaker, Jeffrey R.; Zimmerman, Lisa J.; Fisher, Susan J.; Gibson, Bradford W.; Kinsinger, Christopher R.; Mesri, Mehdi; Rodriguez, Henry; Stein, Steven E.; Tempst, Paul; Paulovich, Amanda G.; Liebler, Daniel C.; Spiegelman, Cliff

    2009-01-01

    The complexity of proteomic instrumentation for LC-MS/MS introduces many possible sources of variability. Data-dependent sampling of peptides constitutes a stochastic element at the heart of discovery proteomics. Although this variation impacts the identification of peptides, proteomic identifications are far from completely random. In this study, we analyzed interlaboratory data sets from the NCI Clinical Proteomic Technology Assessment for Cancer to examine repeatability and reproducibility in peptide and protein identifications. Included data spanned 144 LC-MS/MS experiments on four Thermo LTQ and four Orbitrap instruments. Samples included yeast lysate, the NCI-20 defined dynamic range protein mix, and the Sigma UPS 1 defined equimolar protein mix. Some of our findings reinforced conventional wisdom, such as repeatability and reproducibility being higher for proteins than for peptides. Most lessons from the data, however, were more subtle. Orbitraps proved capable of higher repeatability and reproducibility, but aberrant performance occasionally erased these gains. Even the simplest protein digestions yielded more peptide ions than LC-MS/MS could identify during a single experiment. We observed that peptide lists from pairs of technical replicates overlapped by 35–60%, giving a range for peptide-level repeatability in these experiments. Sample complexity did not appear to affect peptide identification repeatability, even as numbers of identified spectra changed by an order of magnitude. Statistical analysis of protein spectral counts revealed greater stability across technical replicates for Orbitraps, making them superior to LTQ instruments for biomarker candidate discovery. The most repeatable peptides were those corresponding to conventional tryptic cleavage sites, those that produced intense MS signals, and those that resulted from proteins generating many distinct peptides. 
Reproducibility among different instruments of the same type lagged behind repeatability of technical replicates on a single instrument by several percent. These findings reinforce the importance of evaluating repeatability as a fundamental characteristic of analytical technologies. PMID:19921851
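
    The overlap figure quoted above (35–60% between pairs of technical replicates) can be made concrete with a short calculation. The abstract does not state the exact overlap formula, so the sketch below assumes an intersection-over-union definition; the function name and the peptide lists are hypothetical.

```python
def overlap_percent(peptides_a, peptides_b):
    """Percent of distinct peptides shared by two runs (intersection over union)."""
    a, b = set(peptides_a), set(peptides_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)

# Hypothetical peptide identifications from two technical replicates.
replicate_1 = ["LVNELTEFAK", "AEFVEVTK", "YLYEIAR", "QTALVELLK"]
replicate_2 = ["LVNELTEFAK", "AEFVEVTK", "HLVDEPQNLIK"]
print(overlap_percent(replicate_1, replicate_2))  # 40.0
```

    Two of the five distinct peptides appear in both lists, giving 40%, a value inside the range reported for real replicate pairs.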

  17. Recent development of mass spectrometry and proteomics applications in identification and typing of bacteria.

    PubMed

    Cheng, Keding; Chui, Huixia; Domish, Larissa; Hernandez, Drexler; Wang, Gehua

    2016-04-01

    Identification and typing of bacteria occupy a large fraction of time and work in clinical microbiology laboratories. With the certification of some MS platforms in recent years, more applications and tests of MS-based diagnosis methods for bacteria identification and typing have been created, not only on well-accepted MALDI-TOF-MS-based fingerprint matches, but also on solving the insufficiencies of MALDI-TOF-MS-based platforms and advancing the technology to areas such as targeted MS identification and typing of bacteria, bacterial toxin identification, antibiotics susceptibility/resistance tests, and MS-based diagnostic method development on unique bacteria such as Clostridium and Mycobacteria. This review summarizes the recent development in MS platforms and applications in bacteria identification and typing of common pathogenic bacteria. © 2016 The Authors. PROTEOMICS - Clinical Applications Published by WILEY-VCH Verlag GmbH & Co. KGaA.

  18. Highly Efficient Proteolysis Accelerated by Electromagnetic Waves for Peptide Mapping

    PubMed Central

    Chen, Qiwen; Liu, Ting; Chen, Gang

    2011-01-01

    Proteomics will contribute greatly to the understanding of gene functions in the post-genomic era. In proteome research, protein digestion is a key procedure prior to mass spectrometry identification. During the past decade, a variety of electromagnetic waves have been employed to accelerate proteolysis. This review focuses on the recent advances and the key strategies of these novel proteolysis approaches for digesting and identifying proteins. The subjects covered include microwave-accelerated protein digestion, infrared-assisted proteolysis, ultraviolet-enhanced protein digestion, laser-assisted proteolysis, and future prospects. It is expected that these novel proteolysis strategies accelerated by various electromagnetic waves will become powerful tools in proteome research and will find wide applications in high throughput protein digestion and identification. PMID:22379392

  19. iTRAQ Quantitative Proteomic Comparison of Metastatic and Non-Metastatic Uveal Melanoma Tumors

    PubMed Central

    Crabb, John W.; Hu, Bo; Crabb, John S.; Triozzi, Pierre; Saunthararajah, Yogen; Singh, Arun D.

    2015-01-01

    Background Uveal melanoma is the most common malignancy of the adult eye. The overall mortality rate is high because this aggressive cancer often metastasizes before ophthalmic diagnosis. Quantitative proteomic analysis of primary metastasizing and non-metastasizing tumors was pursued for insights into mechanisms and biomarkers of uveal melanoma metastasis. Methods Eight metastatic and 7 non-metastatic human primary uveal melanoma tumors were analyzed by LC MS/MS iTRAQ technology with Bruch’s membrane/choroid complex from normal postmortem eyes as control tissue. Tryptic peptides from tumor and control proteins were labeled with iTRAQ tags, fractionated by cation exchange chromatography, and analyzed by LC MS/MS. Protein identification utilized the Mascot search engine and the human Uni-Prot/Swiss-Protein database with false discovery ≤ 1%; protein quantitation utilized the Mascot weighted average method. Proteins designated differentially expressed exhibited quantitative differences (p ≤ 0.05, t-test) in a training set of five metastatic and five non-metastatic tumors. Logistic regression models developed from the training set were used to classify the metastatic status of five independent tumors. Results Of 1644 proteins identified and quantified in 5 metastatic and 5 non-metastatic tumors, 12 proteins were found uniquely in ≥ 3 metastatic tumors, 28 were found significantly elevated and 30 significantly decreased only in metastatic tumors, and 31 were designated differentially expressed between metastatic and non-metastatic tumors. Logistic regression modeling of differentially expressed collagen alpha-3(VI) and heat shock protein beta-1 allowed correct prediction of metastasis status for each of five independent tumor specimens. Conclusions The present data provide new clues to molecular differences in metastatic and non-metastatic uveal melanoma tumors. 
While sample size is limited and validation required, the results support collagen alpha-3(VI) and heat shock protein beta-1 as candidate biomarkers of uveal melanoma metastasis and establish a quantitative proteomic database for uveal melanoma primary tumors. PMID:26305875

  20. Proteomics in the investigation of HIV-1 interactions with host proteins.

    PubMed

    Li, Ming

    2015-02-01

    Productive HIV-1 infection depends on host machinery, including a broad array of cellular proteins. Proteomics has played a significant role in the discovery of HIV-1 host proteins. In this review, after a brief survey of the HIV-1 host proteins that were discovered by proteomic analyses, I focus on analyzing the interactions between the virion and host proteins, as well as the technologies and strategies used in those proteomic studies. With the help of proteomics, the identification and characterization of HIV-1 host proteins can be translated into novel antiretroviral therapeutics. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Thermo-msf-parser: an open source Java library to parse and visualize Thermo Proteome Discoverer msf files.

    PubMed

    Colaert, Niklaas; Barsnes, Harald; Vaudel, Marc; Helsens, Kenny; Timmerman, Evy; Sickmann, Albert; Gevaert, Kris; Martens, Lennart

    2011-08-05

    The Thermo Proteome Discoverer program integrates both peptide identification and quantification into a single workflow for peptide-centric proteomics. Furthermore, its close integration with Thermo mass spectrometers has made it increasingly popular in the field. Here, we present a Java library to parse the msf files that constitute the output of Proteome Discoverer. The parser is also implemented as a graphical user interface allowing convenient access to the information found in the msf files, and in Rover, a program to analyze and validate quantitative proteomics information. All code, binaries, and documentation are freely available at http://thermo-msf-parser.googlecode.com.

  2. Quantitative Proteomic Analysis of Optimal Cutting Temperature (OCT) Embedded Core-Needle Biopsy of Lung Cancer

    NASA Astrophysics Data System (ADS)

    Zhao, Xiaozheng; Huffman, Kenneth E.; Fujimoto, Junya; Canales, Jamie Rodriguez; Girard, Luc; Nie, Guangjun; Heymach, John V.; Wistuba, Ignacio I.; Minna, John D.; Yu, Yonghao

    2017-10-01

    With recent advances in understanding the genomic underpinnings and oncogenic drivers of pathogenesis in different subtypes, it is increasingly clear that proper pretreatment diagnostics are essential for the choice of appropriate treatment options for non-small cell lung cancer (NSCLC). Tumor tissue preservation in optimal cutting temperature (OCT) compound is commonly used in the surgical suite. However, proteins recovered from OCT-embedded specimens pose a challenge for LC-MS/MS experiments, due to the large amounts of polymers present in OCT. Here we present a simple workflow for whole proteome analysis of OCT-embedded NSCLC tissue samples, which involves a simple trichloroacetic acid precipitation step. Comparisons of protein recovery between frozen versus OCT-embedded tissue showed excellent consistency with more than 9200 proteins identified. Using an isobaric labeling strategy, we quantified more than 5400 proteins in tumor versus normal OCT-embedded core needle biopsy samples. Gene ontology analysis indicated that a number of proliferative as well as squamous cell carcinoma (SqCC) marker proteins were overexpressed in the tumor, consistent with the patient's pathology based diagnosis of "poorly differentiated SqCC". Among the most downregulated proteins in the tumor sample, we noted a number of proteins with potential immunomodulatory functions. Finally, interrogation of the aberrantly expressed proteins using a candidate approach and cross-referencing with publicly available databases led to the identification of potential druggable targets in DNA replication and DNA damage repair pathways. We conclude that our approach allows LC-MS/MS proteomic analyses on OCT-embedded lung cancer specimens, opening the way to bring powerful proteomics into the clinic.

  3. Anopheles salivary gland proteomes from major malaria vectors

    PubMed Central

    2012-01-01

    Background Antibody responses against Anopheles salivary proteins can indicate individual exposure to bites of malaria vectors. The extent to which these salivary proteins are species-specific is not entirely resolved. Thus, a better knowledge of the diversity among salivary protein repertoires from various malaria vector species is necessary to select relevant genus-, subgenus- and/or species-specific salivary antigens. Such antigens could be used for quantitative (mosquito density) and qualitative (mosquito species) immunological evaluation of malaria vectors/host contact. In this study, salivary gland protein repertoires (sialomes) from several Anopheles species were compared using in silico analysis and proteomics. The antigenic diversity of salivary gland proteins among different Anopheles species was also examined. Results In silico analysis of secreted salivary gland protein sequences retrieved from an NCBInr database of six Anopheles species belonging to the Cellia subgenus (An. gambiae, An. arabiensis, An. stephensi and An. funestus) and Nyssorhynchus subgenus (An. albimanus and An. darlingi) displayed a higher degree of similarity compared to salivary proteins from closely related Anopheles species. Additionally, computational hierarchical clustering allowed identification of genus-, subgenus- and species-specific salivary proteins. Proteomic and immunoblot analyses performed on salivary gland extracts from four Anopheles species (An. gambiae, An. arabiensis, An. stephensi and An. albimanus) indicated that heterogeneity of the salivary proteome and antigenic proteins was lower among closely related anopheline species and increased with phylogenetic distance. Conclusion This is the first report on the diversity of the salivary protein repertoire among species from the Anopheles genus at the protein level. 
This work demonstrates that a molecular diversity is exhibited among salivary proteins from closely related species despite their common pharmacological activities. The involvement of these proteins as antigenic candidates for genus-, subgenus- or species-specific immunological evaluation of individual exposure to Anopheles bites is discussed. PMID:23148599

  4. Integration of Transcriptome, Proteome and Metabolism Data Reveals the Alkaloids Biosynthesis in Macleaya cordata and Macleaya microcarpa

    PubMed Central

    Liu, Fuqing; Huang, Peng; Zhu, Pengcheng; Chen, Jinjun; Shi, Mingming; Guo, Fang; Cheng, Pi; Zeng, Jing; Liao, Yifang; Gong, Jing; Zhang, Hong-Mei; Wang, Depeng; Guo, An-Yuan; Xiong, Xingyao

    2013-01-01

    Background The Macleaya spp., including Macleaya cordata and Macleaya microcarpa, are traditional antiviral, anti-inflammatory, and insecticidal herbal medicines owing to their isoquinoline alkaloids. They are also known as the basis of a popular natural animal feed additive in Europe. However, few studies, especially at the genomics level, have been conducted on them. Hence, we performed transcriptome sequencing of the Macleaya spp. and integrated it with iTRAQ proteome analysis in order to identify potential genes involved in alkaloid biosynthesis. Methodology and Principal Findings We designed transcriptome, proteome and metabolism profiling for 10 samples of both species to explore their alkaloid biosynthesis. From the transcriptome data, we obtained 69367 and 78255 unigenes for M. cordata and M. microcarpa, about two thirds of which were similar to sequences in public databases. By metabolism profiling, reverse patterns for the alkaloids sanguinarine, chelerythrine, protopine, and allocryptopine were observed in different organs of the two species. We characterized the expression of enzymes in alkaloid biosynthesis pathways. We also identified more than 1000 proteins from iTRAQ proteome data. Our results strongly suggest that the root may be the organ for major alkaloid biosynthesis in Macleaya spp. Besides biosynthesis, alkaloid storage and transport were also important for their accumulation. The ultrastructure of laticifers observed by SEM helps to show that the alkaloids may be accumulated in the mature roots. Conclusions/Significance To our knowledge this is the first study to elucidate the genetic makeup of Macleaya spp. This work provides clues to the identification of potential modulator genes involved in alkaloid biosynthesis in Macleaya spp., and sheds light on research into non-model medicinal plants by integrating different high-throughput technologies. PMID:23326424

  5. PICKLE 2.0: A human protein-protein interaction meta-database employing data integration via genetic information ontology

    PubMed Central

    Gioutlakis, Aris; Klapa, Maria I.

    2017-01-01

    It has been acknowledged that source databases recording experimentally supported human protein-protein interactions (PPIs) exhibit limited overlap. Thus, the reconstruction of a comprehensive PPI network requires appropriate integration of multiple heterogeneous primary datasets, presenting the PPIs at various genetic reference levels. Existing PPI meta-databases perform integration via normalization; namely, PPIs are merged after converted to a certain target level. Hence, the node set of the integrated network depends each time on the number and type of the combined datasets. Moreover, the irreversible a priori normalization process hinders the identification of normalization artifacts in the integrated network, which originate from the nonlinearity characterizing the genetic information flow. PICKLE (Protein InteraCtion KnowLedgebasE) 2.0 implements a new architecture for this recently introduced human PPI meta-database. Its main novel feature over the existing meta-databases is its approach to primary PPI dataset integration via genetic information ontology. Building upon the PICKLE principles of using the reviewed human complete proteome (RHCP) of UniProtKB/Swiss-Prot as the reference protein interactor set, and filtering out protein interactions with low probability of being direct based on the available evidence, PICKLE 2.0 first assembles the RHCP genetic information ontology network by connecting the corresponding genes, nucleotide sequences (mRNAs) and proteins (UniProt entries) and then integrates PPI datasets by superimposing them on the ontology network without any a priori transformations. Importantly, this process allows the resulting heterogeneous integrated network to be reversibly normalized to any level of genetic reference without loss of the original information, the latter being used for identification of normalization biases, and enables the appraisal of potential false positive interactions through PPI source database cross-checking. 
The PICKLE web-based interface (www.pickle.gr) allows for the simultaneous query of multiple entities and provides integrated human PPI networks at either the protein (UniProt) or the gene level, at three PPI filtering modes. PMID:29023571

  6. Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra.

    PubMed

    Wang, Jianqi; Zhang, Yajie; Yu, Yonghao

    2015-07-01

    A search engine that discovers more peptides reliably is essential to the progress of the computational proteomics. We propose two new scoring functions (L- and P-scores), which aim to capture similar characteristics of a peptide-spectrum match (PSM) as Sequest and Comet do. Crescendo, introduced here, is a software program that implements these two scores for peptide identification. We applied Crescendo to test datasets and compared its performance with widely used search engines, including Mascot, Sequest, and Comet. The results indicate that Crescendo identifies a similar or larger number of peptides at various predefined false discovery rates (FDR). Importantly, it also provides a better separation between the true and decoy PSMs, warranting the future development of a companion post-processing filtering algorithm.
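
    The abstract above compares search engines at predefined false discovery rates and refers to true and decoy PSMs. How Crescendo itself estimates FDR is not described here, so the sketch below assumes the common target-decoy estimate (decoy PSMs divided by target PSMs above a score cutoff). The function names and the scores are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoy PSMs / target PSMs among PSMs scoring >= threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets

def lowest_threshold_for_fdr(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR is within max_fdr."""
    best = None
    for score in sorted({s for s, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, score) <= max_fdr:
            best = score
    return best

# Hypothetical PSMs as (score, is_decoy) pairs.
psms = [(9.1, False), (8.7, False), (8.2, False), (7.9, True),
        (7.5, False), (6.8, False), (6.1, True), (5.4, False)]
print(lowest_threshold_for_fdr(psms, 0.2))  # 6.8
```

    A scoring function that separates true and decoy PSMs more cleanly pushes decoys toward the bottom of the ranking, so a lower cutoff (and more accepted peptides) becomes possible at the same FDR.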

  7. Identification of membrane proteome of Paracoccidioides lutzii and its regulation by zinc

    PubMed Central

    de Curcio, Juliana Santana; Silva, Marielle Garcia; Silva Bailão, Mirelle Garcia; Báo, Sônia Nair; Casaletti, Luciana; Bailão, Alexandre Mello; de Almeida Soares, Célia Maria

    2017-01-01

    Aim: During infection development in the host, Paracoccidioides spp. face the deprivation of micronutrients, a mechanism called nutritional immunity. This condition induces the remodeling of proteins present in different metabolic pathways. Therefore, we attempted to identify membrane proteins and their regulation by zinc in Paracoccidioides lutzii. Materials & methods: Membrane-enriched fractions of yeast cells of P. lutzii were isolated and purified, and proteins were identified by 2D LC–MS/MS detection and database search. Results & conclusion: Zinc deprivation suppressed the expression of membrane proteins such as glycoproteins, those involved in cell wall synthesis and those related to oxidative phosphorylation. This is the first study describing membrane proteins and the effect of zinc deficiency on their regulation in one member of the genus Paracoccidioides. PMID:29134119

  8. The UniProtKB guide to the human proteome

    PubMed Central

    Breuza, Lionel; Poux, Sylvain; Estreicher, Anne; Famiglietti, Maria Livia; Magrane, Michele; Tognolli, Michael; Bridge, Alan; Baratin, Delphine; Redaschi, Nicole

    2016-01-01

    Advances in high-throughput and advanced technologies allow researchers to routinely perform whole genome and proteome analysis. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we will describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We will show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We will also illustrate how the complexity of the human proteome is captured and structured in UniProtKB. Database URL: www.uniprot.org PMID:26896845

  9. Identification of a putative protein profile associating with tamoxifen therapy resistance in breast cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Umar, Arzu; Kang, Hyuk; Timmermans, A. M.

    2009-06-01

    Tamoxifen-resistance is a major cause of death in patients with recurrent breast cancer. Current clinical factors can correctly predict therapy response in only half of the treated patients. Identification of proteins that associate with tamoxifen-resistance is a first step towards better response prediction and tailored treatment of patients. In the present study we intended to identify putative protein biomarkers indicative of tamoxifen therapy-resistance in breast cancer, using nanoLC coupled with FTICR MS. Comparative proteome analysis was performed on ~5,500 pooled tumor cells (corresponding to ~550 ng protein lysate/analysis) obtained through laser capture microdissection (LCM) from two independently processed data sets (n=24 and n=27) containing both tamoxifen therapy-sensitive and therapy-resistant tumors. Peptides and proteins were identified by matching mass and elution time of newly acquired LC-MS features to information in previously generated accurate mass and time tag (AMT) reference databases.
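
    The matching step described above, pairing LC-MS features with AMT reference entries by mass and elution time, can be sketched as a simple tolerance filter. This is only an illustration: the tolerances (5 ppm, 0.02 on a normalized elution-time scale), the use of normalized elution times, and all names and values are assumptions, not parameters from the study.

```python
def match_features(features, amt_tags, ppm_tol=5.0, net_tol=0.02):
    """Pair each LC-MS feature with AMT tags that agree in mass and elution time."""
    matches = []
    for feat_id, feat_mass, feat_net in features:
        for peptide, tag_mass, tag_net in amt_tags:
            ppm_error = abs(feat_mass - tag_mass) / tag_mass * 1e6
            if ppm_error <= ppm_tol and abs(feat_net - tag_net) <= net_tol:
                matches.append((feat_id, peptide))
    return matches

# Hypothetical reference tags: (peptide label, monoisotopic mass, normalized elution time).
amt_tags = [("PEPTIDE_1", 1000.5000, 0.30), ("PEPTIDE_2", 1500.7500, 0.55)]
# Hypothetical observed features: (feature id, measured mass, normalized elution time).
features = [("f1", 1000.5020, 0.31), ("f2", 1500.7600, 0.80), ("f3", 1200.0000, 0.40)]
print(match_features(features, amt_tags))  # [('f1', 'PEPTIDE_1')]
```

    Feature f1 lies about 2 ppm and 0.01 elution-time units from PEPTIDE_1 and is accepted; f2 fails both tolerances and f3 has no tag near its mass.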

  10. "Plasmo2D": an ancillary proteomic tool to aid identification of proteins from Plasmodium falciparum.

    PubMed

    Khachane, Amit; Kumar, Ranjit; Jain, Sanyam; Jain, Samta; Banumathy, Gowrishankar; Singh, Varsha; Nagpal, Saurabh; Tatu, Utpal

    2005-01-01

    Bioinformatics tools to aid gene and protein sequence analysis have become an integral part of biology in the post-genomic era. Release of the Plasmodium falciparum genome sequence has allowed biologists to define the gene and the predicted protein content as well as their sequences in the parasite. Using pI and molecular weight as characteristics unique to each protein, we have developed a bioinformatics tool to aid identification of proteins from Plasmodium falciparum. The tool makes use of a Virtual 2-DE generated by plotting all of the proteins from the Plasmodium database on a pI versus molecular weight scale. Proteins are identified by comparing the position of migration of desired protein spots from an experimental 2-DE and that on a virtual 2-DE. The procedure has been automated in the form of user-friendly software called "Plasmo2D". The tool can be downloaded from http://144.16.89.25/Plasmo2D.zip.
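
    The virtual 2-DE comparison described above amounts to a nearest-neighbour lookup on a pI versus molecular weight plane. The sketch below is not the Plasmo2D code; the distance measure (pI difference combined with log10 mass difference), the function name, and the protein entries are all hypothetical choices for illustration.

```python
import math

def nearest_proteins(spot_pi, spot_mw_kda, virtual_gel, top_n=3):
    """Rank virtual-gel proteins by distance to an experimental spot.

    Distance combines the pI difference with the log10 molecular-weight
    difference; the log scale is an assumed choice for comparing masses.
    """
    def distance(entry):
        _, pi, mw = entry
        return math.hypot(pi - spot_pi, math.log10(mw) - math.log10(spot_mw_kda))
    return [name for name, _, _ in sorted(virtual_gel, key=distance)[:top_n]]

# Hypothetical (name, pI, molecular weight in kDa) entries, not real Plasmodium data.
virtual_gel = [
    ("protein_A", 5.2, 70.0),
    ("protein_B", 6.8, 35.0),
    ("protein_C", 5.4, 68.0),
    ("protein_D", 9.1, 20.0),
]
print(nearest_proteins(5.25, 69.0, virtual_gel, top_n=2))  # ['protein_A', 'protein_C']
```

    Returning a short ranked list rather than a single hit reflects that experimental spot positions are approximate, so several candidates near the spot are worth checking.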

  11. A Proteomic Approach to Investigating Gene Cluster Expression and Secondary Metabolite Functionality in Aspergillus fumigatus

    PubMed Central

    Owens, Rebecca A.; Hammel, Stephen; Sheridan, Kevin J.; Jones, Gary W.; Doyle, Sean

    2014-01-01

    A combined proteomics and metabolomics approach was utilised to advance the identification and characterisation of secondary metabolites in Aspergillus fumigatus. Here, implementation of a shotgun proteomic strategy led to the identification of non-redundant mycelial proteins (n = 414) from A. fumigatus including proteins typically under-represented in 2-D proteome maps: proteins with multiple transmembrane regions, hydrophobic proteins and proteins with extremes of molecular mass and pI. Indirect identification of secondary metabolite cluster expression was also achieved, with proteins (n = 18) from LaeA-regulated clusters detected, including GliT encoded within the gliotoxin biosynthetic cluster. Biochemical analysis then revealed that gliotoxin significantly attenuates H2O2-induced oxidative stress in A. fumigatus (p>0.0001), confirming observations from proteomics data. A complementary 2-D/LC-MS/MS approach further elucidated significantly increased abundance (p<0.05) of proliferating cell nuclear antigen (PCNA), NADH-quinone oxidoreductase and the gliotoxin oxidoreductase GliT, along with significantly attenuated abundance (p<0.05) of a heat shock protein, an oxidative stress protein and an autolysis-associated chitinase, when gliotoxin and H2O2 were present, compared to H2O2 alone. Moreover, gliotoxin exposure significantly reduced the abundance of selected proteins (p<0.05) involved in de novo purine biosynthesis. Significantly elevated abundance (p<0.05) of a key enzyme, xanthine-guanine phosphoribosyl transferase Xpt1, utilised in purine salvage, was observed in the presence of H2O2 and gliotoxin. This work provides new insights into the A. fumigatus proteome and experimental strategies, plus mechanistic data pertaining to gliotoxin functionality in the organism. PMID:25198175

  12. Top-down proteomic identification of Shiga toxin 2 subtypes from Shiga toxin-producing Escherichia coli by matrix-assisted laser desorption ionization-tandem time of flight mass spectrometry.

    PubMed

    Fagerquist, Clifton K; Zaragoza, William J; Sultan, Omar; Woo, Nathan; Quiñones, Beatriz; Cooley, Michael B; Mandrell, Robert E

    2014-05-01

    We have analyzed 26 Shiga toxin-producing Escherichia coli (STEC) strains for Shiga toxin 2 (Stx2) production using matrix-assisted laser desorption ionization (MALDI)-tandem time of flight (TOF-TOF) tandem mass spectrometry (MS/MS) and top-down proteomic analysis. STEC strains were induced to overexpress Stx2 by overnight culturing on solid agar supplemented with either ciprofloxacin or mitomycin C. Harvested cells were lysed by bead beating, and unfractionated bacterial cell lysates were ionized by MALDI. The A2 fragment of the A subunit and the mature B subunit of Stx2 were analyzed by MS/MS. Sequence-specific fragment ions were used to identify amino acid subtypes of Stx2 using top-down proteomic analysis using software developed in-house at the U.S. Department of Agriculture (USDA). Stx2 subtypes (a, c, d, f, and g) were identified on the basis of the mass of the A2 fragment and the B subunit as well as from their sequence-specific fragment ions by MS/MS (postsource decay). Top-down proteomic identification was in agreement with DNA sequencing of the full Stx2 operon (stx2) for all strains. Top-down results were also compared to a bioassay using a Vero-d2EGFP cell line. Our results suggest that top-down proteomic identification is a rapid, highly specific technique for distinguishing Stx2 subtypes.

  13. Comparative Evaluation of Small Molecular Additives and Their Effects on Peptide/Protein Identification.

    PubMed

    Gao, Jing; Zhong, Shaoyun; Zhou, Yanting; He, Han; Peng, Shuying; Zhu, Zhenyun; Liu, Xing; Zheng, Jing; Xu, Bin; Zhou, Hu

    2017-06-06

    Detergents and salts are widely used in lysis buffers to enhance protein extraction from biological samples, facilitating in-depth proteomic analysis. However, these detergents and salt additives must be efficiently removed from the digested samples prior to LC-MS/MS analysis to obtain high-quality mass spectra. Although filter-aided sample preparation (FASP), acetone precipitation (AP), followed by in-solution digestion, and strong cation exchange-based centrifugal proteomic reactors (CPRs) are commonly used for proteomic sample processing, little is known about their efficiencies at removing detergents and salt additives. In this study, we (i) developed an integrative workflow for the quantification of small molecular additives in proteomic samples, developing a multiple reaction monitoring (MRM)-based LC-MS approach for the quantification of six additives (i.e., Tris, urea, CHAPS, SDS, SDC, and Triton X-100) and (ii) systematically evaluated the relationships between the level of additive remaining in samples following sample processing and the number of peptides/proteins identified by mass spectrometry. Although FASP outperformed the other two methods, the results were complementary in terms of peptide/protein identification, as well as the GRAVY index and amino acid distributions. This is the first systematic and quantitative study of the effect of detergents and salt additives on protein identification. This MRM-based approach can be used for an unbiased evaluation of the performance of new sample preparation methods. Data are available via ProteomeXchange under identifier PXD005405.

  14. Top-Down Proteomic Identification of Shiga Toxin 2 Subtypes from Shiga Toxin-Producing Escherichia coli by Matrix-Assisted Laser Desorption Ionization–Tandem Time of Flight Mass Spectrometry

    PubMed Central

    Fagerquist, Clifton K.; Zaragoza, William J.; Sultan, Omar; Woo, Nathan; Quiñones, Beatriz; Cooley, Michael B.; Mandrell, Robert E.

    2014-01-01

    We have analyzed 26 Shiga toxin-producing Escherichia coli (STEC) strains for Shiga toxin 2 (Stx2) production using matrix-assisted laser desorption ionization (MALDI)–tandem time of flight (TOF-TOF) tandem mass spectrometry (MS/MS) and top-down proteomic analysis. STEC strains were induced to overexpress Stx2 by overnight culturing on solid agar supplemented with either ciprofloxacin or mitomycin C. Harvested cells were lysed by bead beating, and unfractionated bacterial cell lysates were ionized by MALDI. The A2 fragment of the A subunit and the mature B subunit of Stx2 were analyzed by MS/MS. Sequence-specific fragment ions were used to identify amino acid subtypes of Stx2 using top-down proteomic analysis using software developed in-house at the U.S. Department of Agriculture (USDA). Stx2 subtypes (a, c, d, f, and g) were identified on the basis of the mass of the A2 fragment and the B subunit as well as from their sequence-specific fragment ions by MS/MS (postsource decay). Top-down proteomic identification was in agreement with DNA sequencing of the full Stx2 operon (stx2) for all strains. Top-down results were also compared to a bioassay using a Vero-d2EGFP cell line. Our results suggest that top-down proteomic identification is a rapid, highly specific technique for distinguishing Stx2 subtypes. PMID:24584253

  15. Functional Module Search in Protein Networks based on Semantic Similarity Improves the Analysis of Proteomics Data*

    PubMed Central

    Boyanova, Desislava; Nilla, Santosh; Klau, Gunnar W.; Dandekar, Thomas; Müller, Tobias; Dittrich, Marcus

    2014-01-01

    The continuously evolving field of proteomics produces increasing amounts of data while improving the quality of protein identifications. Although quantitative measurements are becoming more popular, many proteomic studies are still based on non-quantitative methods for protein identification. These studies result in potentially large sets of identified proteins, where the biological interpretation of proteins can be challenging. Systems biology develops innovative network-based methods, which allow an integrated analysis of these data. Here we present a novel approach, which combines prior knowledge of protein-protein interactions (PPI) with proteomics data using functional similarity measurements of interacting proteins. This integrated network analysis exactly identifies network modules with a maximal consistent functional similarity reflecting biological processes of the investigated cells. We validated our approach on small (H9N2 virus-infected gastric cells) and large (blood constituents) proteomic data sets. Using this novel algorithm, we identified characteristic functional modules in virus-infected cells, comprising key signaling proteins (e.g. the stress-related kinase RAF1) and demonstrate that this method allows a module-based functional characterization of cell types. Analysis of a large proteome data set of blood constituents resulted in clear separation of blood cells according to their developmental origin. A detailed investigation of the T-cell proteome further illustrates how the algorithm partitions large networks into functional subnetworks each representing specific cellular functions. These results demonstrate that the integrated network approach not only allows a detailed analysis of proteome networks but also yields a functional decomposition of complex proteomic data sets and thereby provides deeper insights into the underlying cellular processes of the investigated system. PMID:24807868

  16. Adaptation of Decoy Fusion Strategy for Existing Multi-Stage Search Workflows

    NASA Astrophysics Data System (ADS)

    Ivanov, Mark V.; Levitsky, Lev I.; Gorshkov, Mikhail V.

    2016-09-01

    A number of proteomic database search engines implement multi-stage strategies aiming at increasing the sensitivity of proteome analysis. These approaches often employ a subset of the original database for the secondary stage of analysis. However, if the target-decoy approach (TDA) is used for false discovery rate (FDR) estimation, the multi-stage strategies may violate the underlying assumption of TDA that false matches are distributed uniformly across the target and decoy databases. This violation occurs if the numbers of target and decoy proteins selected for the second search are not equal. Here, we propose a method of decoy database generation based on the previously reported decoy fusion strategy. This method allows unbiased TDA-based FDR estimation in multi-stage searches and can be easily integrated into existing workflows utilizing popular search engines and post-search algorithms.
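
    The TDA estimate that this abstract relies on can be illustrated with a minimal sketch. It is not the decoy fusion method of the record; it only shows the plain estimator (decoy matches divided by target matches above a score threshold), which is valid only under the equal-size assumption the abstract describes. The function names `tda_fdr` and `score_cutoff` and the example scores are hypothetical.

```python
from typing import List, Optional, Tuple

# Each PSM is (score, is_decoy). Assumption: higher scores are better and
# the target and decoy databases are the same size.
PSM = Tuple[float, bool]


def tda_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


def score_cutoff(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is within max_fdr."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if tda_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Hypothetical search results: four target matches and one decoy match.
example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
print(tda_fdr(example, 7.0))        # one decoy against three targets
print(score_cutoff(example, 0.01))  # strictest cutoff that keeps decoys out
```

    If the second-stage database holds unequal numbers of target and decoy proteins, the ratio above no longer estimates the FDR without bias, which is the problem the record addresses.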

  17. Protein identification and quantification from riverbank grape, Vitis riparia: Comparing SDS-PAGE and FASP-GPF techniques for shotgun proteomic analysis.

    PubMed

    George, Iniga S; Fennell, Anne Y; Haynes, Paul A

    2015-09-01

    Protein sample preparation optimisation is critical for establishing reproducible high throughput proteomic analysis. In this study, two different fractionation sample preparation techniques (in-gel digestion and in-solution digestion) for shotgun proteomics were used to quantitatively compare proteins identified in Vitis riparia leaf samples. The total number of proteins and peptides identified were compared between filter aided sample preparation (FASP) coupled with gas phase fractionation (GPF) and SDS-PAGE methods. There was a 24% increase in the total number of reproducibly identified proteins when FASP-GPF was used. FASP-GPF is more reproducible, less expensive and a better method than SDS-PAGE for shotgun proteomics of grapevine samples as it significantly increases protein identification across biological replicates. Total peptide and protein information from the two fractionation techniques is available in PRIDE with the identifier PXD001399 (http://proteomecentral.proteomexchange.org/dataset/PXD001399). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. A multi-center study benchmarks software tools for label-free proteome quantification

    PubMed Central

    Navarro, Pedro; Kuharev, Jörg; Gillet, Ludovic C; Bernhardt, Oliver M.; MacLean, Brendan; Röst, Hannes L.; Tate, Stephen A.; Tsou, Chih-Chiang; Reiter, Lukas; Distler, Ute; Rosenberger, George; Perez-Riverol, Yasset; Nesvizhskii, Alexey I.; Aebersold, Ruedi; Tenzer, Stefan

    2016-01-01

    The consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from SWATH-MS (sequential window acquisition of all theoretical fragment ion spectra), a method that uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test datasets from hybrid proteome samples of defined quantitative composition acquired on two different MS instruments using different SWATH isolation windows setups. For consistent evaluation we developed LFQbench, an R-package to calculate metrics of precision and accuracy in label-free quantitative MS, and report the identification performance, robustness and specificity of each software tool. Our reference datasets enabled developers to improve their software tools. After optimization, all tools provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics. PMID:27701404

  19. A multicenter study benchmarks software tools for label-free proteome quantification.

    PubMed

    Navarro, Pedro; Kuharev, Jörg; Gillet, Ludovic C; Bernhardt, Oliver M; MacLean, Brendan; Röst, Hannes L; Tate, Stephen A; Tsou, Chih-Chiang; Reiter, Lukas; Distler, Ute; Rosenberger, George; Perez-Riverol, Yasset; Nesvizhskii, Alexey I; Aebersold, Ruedi; Tenzer, Stefan

    2016-11-01

    Consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH 2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from sequential window acquisition of all theoretical fragment-ion spectra (SWATH)-MS, which uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test data sets from hybrid proteome samples of defined quantitative composition acquired on two different MS instruments using different SWATH isolation-window setups. For consistent evaluation, we developed LFQbench, an R package, to calculate metrics of precision and accuracy in label-free quantitative MS and report the identification performance, robustness and specificity of each software tool. Our reference data sets enabled developers to improve their software tools. After optimization, all tools provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics.
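
    As a rough illustration of what precision and accuracy metrics for a hybrid proteome sample of defined composition could look like, the sketch below compares measured log2 ratios with the expected ratio. It is written in Python and does not reproduce LFQbench (an R package) or its exact metric definitions. Treating accuracy as the median deviation and precision as the standard deviation is an assumption, and all names and intensities are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Tuple


def log2_ratios(sample_a: Dict[str, float], sample_b: Dict[str, float]) -> List[float]:
    """log2(A/B) for every protein quantified with positive intensity in both samples."""
    return [
        math.log2(sample_a[protein] / sample_b[protein])
        for protein in sample_a
        if protein in sample_b and sample_a[protein] > 0 and sample_b[protein] > 0
    ]


def accuracy_and_precision(ratios: List[float], expected_log2: float) -> Tuple[float, float]:
    """Accuracy: median deviation from the expected log2 ratio.
    Precision: standard deviation of the measured log2 ratios."""
    accuracy = statistics.median(ratios) - expected_log2
    precision = statistics.stdev(ratios) if len(ratios) > 1 else 0.0
    return accuracy, precision


# Hypothetical intensities for one species mixed at a 2:1 ratio (expected log2 ratio = 1).
sample_a = {"P1": 200.0, "P2": 410.0, "P3": 95.0}
sample_b = {"P1": 100.0, "P2": 200.0, "P3": 50.0}
print(accuracy_and_precision(log2_ratios(sample_a, sample_b), expected_log2=1.0))
```

    Values near zero for both numbers would indicate that a tool recovers the defined composition closely and consistently.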

  20. SpirPro: A Spirulina proteome database and web-based tools for the analysis of protein-protein interactions at the metabolic level in Spirulina (Arthrospira) platensis C1.

    PubMed

    Senachak, Jittisak; Cheevadhanarak, Supapon; Hongsthong, Apiradee

    2015-07-29

    Spirulina (Arthrospira) platensis is the only cyanobacterium that in addition to being studied at the molecular level and subjected to gene manipulation, can also be mass cultivated in outdoor ponds for commercial use as a food supplement. Thus, encountering environmental changes, including temperature stresses, is common during the mass production of Spirulina. The use of cyanobacteria as an experimental platform, especially for photosynthetic gene manipulation in plants and bacteria, is becoming increasingly important. Understanding the mechanisms and protein-protein interaction networks that underlie low- and high-temperature responses is relevant to Spirulina mass production. To accomplish this goal, high-throughput techniques such as OMICs analyses are used. Thus, large datasets must be collected, managed and subjected to information extraction. Therefore, databases including (i) proteomic analysis and protein-protein interaction (PPI) data and (ii) domain/motif visualization tools are required for potential use in temperature response models for plant chloroplasts and photosynthetic bacteria. A web-based repository was developed including an embedded database, SpirPro, and tools for network visualization. Proteome data were analyzed and integrated with protein-protein interactions and/or metabolic pathways from KEGG. The repository provides a range of information, from raw data (2D-gel images) to associated results, such as data from interaction and/or pathway analyses. This integration allows in silico analyses of protein-protein interactions affected at the metabolic level and, particularly, analyses of interactions between and within the affected metabolic pathways under temperature stresses for comparative proteomic analysis. The developed tool, which is coded in HTML with CSS/JavaScript and depicted in Scalable Vector Graphics (SVG), is designed for interactive analysis and exploration of the constructed network. SpirPro is publicly available on the web at http://spirpro.sbi.kmutt.ac.th. SpirPro is an analysis platform containing an integrated proteome and PPI database that provides the most comprehensive data on this cyanobacterium at the systematic level. As an integrated database, SpirPro can be applied in various analyses, such as temperature stress response networking analysis in cyanobacterial models and interacting domain-domain analysis between proteins of interest.

  1. RaftProt: mammalian lipid raft proteome database.

    PubMed

    Shah, Anup; Chen, David; Boda, Akash R; Foster, Leonard J; Davis, Melissa J; Hill, Michelle M

    2015-01-01

    RaftProt (http://lipid-raft-database.di.uq.edu.au/) is a database of mammalian lipid raft-associated proteins as reported in high-throughput mass spectrometry studies. Lipid rafts are specialized membrane microdomains enriched in cholesterol and sphingolipids thought to act as dynamic signalling and sorting platforms. Given their fundamental roles in cellular regulation, there is a plethora of information on the size, composition and regulation of these membrane microdomains, including a large number of proteomics studies. To facilitate the mining and analysis of published lipid raft proteomics studies, we have developed a searchable database, RaftProt. In addition to browsing the studies, performing basic queries by protein and gene names, and searching experiments by cell, tissue and organism, we have implemented several advanced features to facilitate data mining. To address the issue of potential bias due to the biochemical preparation procedures used, we have captured the lipid raft preparation methods and implemented an advanced search option for methodology and sample treatment conditions, such as cholesterol depletion. Furthermore, we have identified a list of high confidence proteins, and enabled searching only from this list of likely bona fide lipid raft proteins. Given the apparent biological importance of lipid rafts and their associated proteins, this database would constitute a key resource for the scientific community. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Tear proteomic analysis of patients with type 2 diabetes and dry eye syndrome by two-dimensional nano-liquid chromatography coupled with tandem mass spectrometry.

    PubMed

    Li, Bing; Sheng, Minjie; Xie, Liqi; Liu, Feng; Yan, Guoquan; Wang, Weifang; Lin, Anjuan; Zhao, Fei; Chen, Yihui

    2014-01-09

    Diabetes mellitus has been shown to be associated with and complicated by dry eye syndrome. We sought to examine and compare the tear film proteome of type 2 diabetic patients with or without dry eye syndrome and normal subjects using two-dimensional nano-liquid chromatography coupled with tandem mass spectrometry (MS)-based proteomics. Tears were collected from eight type 2 diabetes patients with dry eye syndrome, eight type 2 diabetes patients without dry eye syndrome, and eight normal subjects. Tear breakup time (BUT) was determined, and tear proteins were prepared and analyzed using two-dimensional strong cation-exchange/reversed-phase nano-scale liquid chromatography MS. All MS/MS spectra were identified by using SEQUEST against the human International Protein Index (IPI) database and the relative abundance of individual proteins was assessed by spectral counting. Tear BUT was significantly lower in patients with diabetes and dry eye syndrome than in patients with diabetes only and normal subjects. Analysis of spectral counts of tear proteins showed that, compared to healthy controls, patients with diabetes and dry eye syndrome had increased expression of apoptosis-related proteins, like annexin A1, and immunity- and inflammation-related proteins, including neutrophil elastase 2 and clusterin, and glycometabolism-related proteins, like apolipoprotein A-II. Dry eye syndrome in diabetic patients is associated with aberrant expression of tear proteins, and the findings could lead to identification of novel pathways for therapeutic targeting and new diagnostic markers.
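
    Spectral counting, as used in this record to assess relative abundance, can be sketched as follows. This is a simplified illustration in which each identified MS/MS spectrum contributes one count to its protein and counts are normalised by the total. It is not the authors' exact procedure, and the function name and the example spectrum assignments are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_spectral_counts(psm_proteins: Iterable[str]) -> Dict[str, float]:
    """Fraction of all identified spectra assigned to each protein.

    psm_proteins holds one protein name per identified MS/MS spectrum.
    """
    counts = Counter(psm_proteins)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {protein: count / total for protein, count in counts.items()}


# Hypothetical spectrum-to-protein assignments for one tear sample.
spectra = ["annexin A1", "annexin A1", "clusterin", "apolipoprotein A-II"]
print(relative_spectral_counts(spectra))
```

    Comparing these fractions between patient groups gives the kind of relative abundance contrast the abstract reports, although real studies usually add normalisation for protein length and statistical testing.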

  3. Performance Evaluation of the Q Exactive HF-X for Shotgun Proteomics.

    PubMed

    Kelstrup, Christian D; Bekker-Jensen, Dorte B; Arrey, Tabiwang N; Hogrebe, Alexander; Harder, Alexander; Olsen, Jesper V

    2018-01-05

    Progress in proteomics is mainly driven by advances in mass spectrometric (MS) technologies. Here we benchmarked the performance of the latest MS instrument in the benchtop Orbitrap series, the Q Exactive HF-X, against its predecessor for proteomics applications. A new peak-picking algorithm, a brighter ion source, and optimized ion transfers enable productive MS/MS acquisition above 40 Hz at 7500 resolution. The hardware and software improvements collectively resulted in improved peptide and protein identifications across all comparable conditions, with an increase of up to 50 percent at short LC-MS gradients, yielding identification rates of more than 1000 unique peptides per minute. Alternatively, the Q Exactive HF-X is capable of achieving the same proteome coverage as its predecessor in approximately half the gradient time or at 10-fold lower sample loads. The Q Exactive HF-X also enables rapid phosphoproteomics with routine analysis of more than 5000 phosphopeptides with short single-shot 15 min LC-MS/MS measurements, or 16 700 phosphopeptides quantified across ten conditions in six gradient hours using TMT10-plex and offline peptide fractionation. Finally, exciting perspectives for data-independent acquisition are highlighted with reproducible identification of 55 000 unique peptides covering 5900 proteins in half an hour of MS analysis.

  4. Toward an Upgraded Honey Bee (Apis mellifera L.) Genome Annotation Using Proteogenomics.

    PubMed

    McAfee, Alison; Harpur, Brock A; Michaud, Sarah; Beavis, Ronald C; Kent, Clement F; Zayed, Amro; Foster, Leonard J

    2016-02-05

    The honey bee is a key pollinator in agricultural operations as well as a model organism for studying the genetics and evolution of social behavior. The Apis mellifera genome has been sequenced and annotated twice over, enabling proteomics and functional genomics methods for probing relevant aspects of their biology. One troubling trend that emerged from proteomic analyses is that honey bee peptide samples consistently result in lower peptide identification rates compared with other organisms. This suggests that the genome annotation can be improved, or atypical biological processes are interfering with the mass spectrometry workflow. First, we tested whether high levels of polymorphisms could explain some of the missed identifications by searching spectra against the reference proteome (OGSv3.2) versus a customized proteome of a single honey bee, but our results indicate that this contribution was minor. Likewise, error-tolerant peptide searches led us to eliminate unexpected post-translational modifications as a major factor in missed identifications. We then used a proteogenomic approach with ~1500 raw files to search for missing genes and new exons, to revive discarded annotations and to identify over 2000 new coding regions. These results will contribute to a more comprehensive genome annotation and facilitate continued research on this important insect.

  5. Absolute Quantification of Middle- to High-Abundant Plasma Proteins via Targeted Proteomics.

    PubMed

    Dittrich, Julia; Ceglarek, Uta

    2017-01-01

    The increasing number of peptide and protein biomarker candidates requires expeditious and reliable quantification strategies. The utilization of liquid chromatography coupled to quadrupole tandem mass spectrometry (LC-MS/MS) for the absolute quantitation of plasma proteins and peptides facilitates the multiplexed verification of tens to hundreds of biomarkers from the smallest sample quantities. Targeted proteomics assays derived from bottom-up proteomics principles rely on the identification and analysis of proteotypic peptides formed in an enzymatic digestion of the target protein. This protocol proposes a procedure for the establishment of a targeted absolute quantitation method for middle- to high-abundant plasma proteins waiving depletion or enrichment steps. Essential topics such as proteotypic peptide identification and LC-MS/MS method development as well as sample preparation and calibration strategies are described in detail.
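
    The idea of selecting candidate proteotypic peptides from an enzymatic digest can be sketched in silico. The example assumes trypsin (cleavage C-terminal to K or R unless the next residue is P), no missed cleavages, and a simplified notion of "proteotypic" as "not produced by any other protein in a background set". The protocol itself may use different enzymes and criteria, and the function names, length limits and sequences are hypothetical.

```python
import re
from typing import Iterable, List


def tryptic_digest(sequence: str, min_len: int = 6, max_len: int = 25) -> List[str]:
    """Cleave after K or R, but not before P; keep peptides within the length limits."""
    peptides = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    return [p for p in peptides if min_len <= len(p) <= max_len]


def candidate_proteotypic(target: str, background: Iterable[str]) -> List[str]:
    """Peptides of the target that no background protein yields on digestion."""
    shared = {p for seq in background for p in tryptic_digest(seq)}
    return [p for p in tryptic_digest(target) if p not in shared]


# Hypothetical toy sequences; real use would digest a full reference proteome.
target = "AAAKGGGRPCCCKDDDEEEK"
background = ["MMMKDDDEEEK"]
print(tryptic_digest(target))                     # peptides passing the length filter
print(candidate_proteotypic(target, background))  # peptides unique to the target
```

    Uniqueness is only a first filter. Detectability in LC-MS/MS, which the protocol covers under method development, still has to be checked experimentally.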

  6. Quantitative trait loci mapping of the mouse plasma proteome (pQTL).

    PubMed

    Holdt, Lesca M; von Delft, Annette; Nicolaou, Alexandros; Baumann, Sven; Kostrzewa, Markus; Thiery, Joachim; Teupser, Daniel

    2013-02-01

    A current challenge in the era of genome-wide studies is to determine the responsible genes and mechanisms underlying newly identified loci. Screening of the plasma proteome by high-throughput mass spectrometry (MALDI-TOF MS) is considered a promising approach for identification of metabolic and disease processes. Therefore, plasma proteome screening might be particularly useful for identifying responsible genes when combined with analysis of variation in the genome. Here, we describe a proteomic quantitative trait locus (pQTL) study of plasma proteome screens in an F2 intercross of 455 mice mapped with 177 genetic markers across the genome. A total of 69 of 176 peptides revealed significant LOD scores (≥5.35) demonstrating strong genetic regulation of distinct components of the plasma proteome. Analyses were confirmed by mechanistic studies and MALDI-TOF/TOF, liquid chromatography-tandem mass spectrometry (LC-MS/MS) analyses of the two strongest pQTLs: A pQTL for mass-to-charge ratio (m/z) 3494 (LOD 24.9, D11Mit151) was identified as the N-terminal 35 amino acids of hemoglobin subunit A (Hba) and caused by genetic variation in Hba. Another pQTL for m/z 8713 (LOD 36.4; D1Mit111) was caused by variation in apolipoprotein A2 (Apoa2) and cosegregated with HDL cholesterol. Taken together, we show that genome-wide plasma proteome profiling in combination with genome-wide genetic screening aids in the identification of causal genetic variants affecting abundance of plasma proteins.

  7. Quantitative Trait Loci Mapping of the Mouse Plasma Proteome (pQTL)

    PubMed Central

    Holdt, Lesca M.; von Delft, Annette; Nicolaou, Alexandros; Baumann, Sven; Kostrzewa, Markus; Thiery, Joachim; Teupser, Daniel

    2013-01-01

    A current challenge in the era of genome-wide studies is to determine the responsible genes and mechanisms underlying newly identified loci. Screening of the plasma proteome by high-throughput mass spectrometry (MALDI-TOF MS) is considered a promising approach for identification of metabolic and disease processes. Therefore, plasma proteome screening might be particularly useful for identifying responsible genes when combined with analysis of variation in the genome. Here, we describe a proteomic quantitative trait locus (pQTL) study of plasma proteome screens in an F2 intercross of 455 mice mapped with 177 genetic markers across the genome. A total of 69 of 176 peptides revealed significant LOD scores (≥5.35) demonstrating strong genetic regulation of distinct components of the plasma proteome. Analyses were confirmed by mechanistic studies and MALDI-TOF/TOF, liquid chromatography-tandem mass spectrometry (LC-MS/MS) analyses of the two strongest pQTLs: A pQTL for mass-to-charge ratio (m/z) 3494 (LOD 24.9, D11Mit151) was identified as the N-terminal 35 amino acids of hemoglobin subunit A (Hba) and caused by genetic variation in Hba. Another pQTL for m/z 8713 (LOD 36.4; D1Mit111) was caused by variation in apolipoprotein A2 (Apoa2) and cosegregated with HDL cholesterol. Taken together, we show that genome-wide plasma proteome profiling in combination with genome-wide genetic screening aids in the identification of causal genetic variants affecting abundance of plasma proteins. PMID:23172855
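
    The LOD scores quoted in this record compare a model with a genetic effect at a marker against a model without one. As a minimal sketch, for a single marker and a normally distributed trait the score can be computed from the residual sums of squares of the two regression models as LOD = (n/2) log10(RSS0/RSS1). Additive genotype coding (0, 1, 2) and simple linear regression are assumptions here, since the abstract does not state the mapping model used, and the data below are hypothetical.

```python
import math
from typing import Sequence


def lod_score(genotypes: Sequence[float], phenotypes: Sequence[float]) -> float:
    """Single-marker LOD score: (n/2) * log10(RSS_null / RSS_marker).

    genotypes: additive coding per animal (0, 1 or 2 copies of one allele).
    phenotypes: trait values, e.g. peak intensity of one plasma peptide.
    """
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    # Null model: phenotype explained by its mean only.
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    if sxx == 0 or rss_null == 0:
        return 0.0  # no genotype or phenotype variation, so no evidence either way
    # Marker model: least-squares line through genotype versus phenotype.
    rss_marker = rss_null - sxy ** 2 / sxx
    if rss_marker <= 0:
        return math.inf  # perfect fit
    return (n / 2) * math.log10(rss_null / rss_marker)


# Hypothetical data for six animals at one marker.
print(lod_score([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]))
```

    In a genome scan this calculation is repeated for each peptide at each marker, and a genome-wide significance threshold such as the LOD ≥5.35 quoted above is then applied.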

  8. PatternLab for proteomics 4.0: A one-stop shop for analyzing shotgun proteomic data

    PubMed Central

    Carvalho, Paulo C; Lima, Diogo B; Leprevost, Felipe V; Santos, Marlon D M; Fischer, Juliana S G; Aquino, Priscila F; Moresco, James J; Yates, John R; Barbosa, Valmir C

    2017-01-01

    PatternLab for proteomics is an integrated computational environment that unifies several previously published modules for analyzing shotgun proteomic data. PatternLab contains modules for formatting sequence databases, performing peptide spectrum matching, statistically filtering and organizing shotgun proteomic data, extracting quantitative information from label-free and chemically labeled data, performing statistics for differential proteomics, displaying results in a variety of graphical formats, performing similarity-driven studies with de novo sequencing data, analyzing time-course experiments, and helping with the understanding of the biological significance of data in the light of the Gene Ontology. Here we describe PatternLab for proteomics 4.0, which closely knits together all of these modules in a self-contained environment, covering the principal aspects of proteomic data analysis as a freely available and easily installable software package. All updates to PatternLab, as well as all new features added to it, have been tested over the years on millions of mass spectra. PMID:26658470

  9. Panorama: A Targeted Proteomics Knowledge Base

    PubMed Central

    2015-01-01

    Panorama is a web application for storing, sharing, analyzing, and reusing targeted assays created and refined with Skyline, an increasingly popular Windows client software tool for targeted proteomics experiments. Panorama allows laboratories to store and organize curated results contained in Skyline documents with fine-grained permissions, which facilitates distributed collaboration and secure sharing of published and unpublished data via a web-browser interface. It is fully integrated with the Skyline workflow and supports publishing a document directly to a Panorama server from the Skyline user interface. Panorama captures the complete Skyline document information content in a relational database schema. Curated results published to Panorama can be aggregated and exported as chromatogram libraries. These libraries can be used in Skyline to pick optimal targets in new experiments and to validate peak identification of target peptides. Panorama is open-source and freely available. It is distributed as part of LabKey Server, an open source biomedical research data management system. Laboratories and organizations can set up Panorama locally by downloading and installing the software on their own servers. They can also request freely hosted projects on https://panoramaweb.org, a Panorama server maintained by the Department of Genome Sciences at the University of Washington. PMID:25102069

  10. Multi-Approach Analysis for the Identification of Proteases within Birch Pollen.

    PubMed

    McKenna, Olivia E; Posselt, Gernot; Briza, Peter; Lackner, Peter; Schmitt, Armin O; Gadermaier, Gabriele; Wessler, Silja; Ferreira, Fatima

    2017-07-04

    Birch pollen allergy is highly prevalent, with up to 100 million reported cases worldwide. Proteases in such allergen sources have been suggested to contribute to primary sensitisation and exacerbation of allergic disorders. Until now, the protease content of Betula verrucosa, a birch species endemic to the northern hemisphere, has not been studied in detail. Hence, we aim to identify and characterise pollen and bacteria-derived proteases found within birch pollen. The pollen transcriptome was constructed via de novo transcriptome sequencing and analysis of the proteome was achieved via mass spectrometry; a cross-comparison of the two databases was then performed. A total of 42 individual proteases were identified at the proteomic level. Further clustering of proteases into their distinct catalytic classes revealed serine, cysteine, aspartic, threonine, and metallo-proteases. Further to this, protease activity of the pollen was quantified using a fluorescently-labelled casein substrate protease assay, as 0.61 ng/mg of pollen. A large number of bacterial strains were isolated from freshly collected birch pollen, and zymographic gels with gelatinase and casein enabled visualisation of proteolytic activity of the pollen and the collected bacterial strains. We report the successful discovery of pollen and bacteria-derived proteases of Betula verrucosa.

  11. A collection of open source applications for mass spectrometry data mining.

    PubMed

    Gallardo, Óscar; Ovelleiro, David; Gay, Marina; Carrascal, Montserrat; Abian, Joaquin

    2014-10-01

    We present several bioinformatics applications for the identification and quantification of phosphoproteome components by MS. These applications include a front-end graphical user interface that combines several Thermo RAW formats to MASCOT™ Generic Format extractors (EasierMgf), two graphical user interfaces for search engines OMSSA and SEQUEST (OmssaGui and SequestGui), and three applications, one for the management of databases in FASTA format (FastaTools), another for the integration of search results from up to three search engines (Integrator), and another one for the visualization of mass spectra and their corresponding database search results (JsonVisor). These applications were developed to solve some of the common problems found in proteomic and phosphoproteomic data analysis and were integrated in the workflow for data processing and feeding on our LymPHOS database. Applications were designed modularly and can be used standalone. These tools are written in Perl and Python programming languages and are supported on Windows platforms. They are all released under an Open Source Software license and can be freely downloaded from our software repository hosted at GoogleCode. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Mass spectrometric identification of proteins in complex post-genomic projects. Soluble proteins of the metabolically versatile, denitrifying 'Aromatoleum' sp. strain EbN1.

    PubMed

    Hufnagel, Peter; Rabus, Ralf

    2006-01-01

    The rapidly developing proteomics technologies help to advance the global understanding of physiological and cellular processes. The lifestyle of a study organism determines the type and complexity of a given proteomic project. The complexity of this study is characterized by a broad collection of pathway-specific subproteomes, reflecting the metabolic versatility as well as the regulatory potential of the aromatic-degrading, denitrifying bacterium 'Aromatoleum' sp. strain EbN1. Differences in protein profiles were determined using a gel-based approach. Protein identification was based on a progressive application of MALDI-TOF-MS, MALDI-TOF-MS/MS and LC-ESI-MS/MS. This progression was result-driven and automated by software control. The identification rate was increased by the assembly of a project-specific list of background signals that was used for internal calibration of the MS spectra, and by the combination of two search engines using a dedicated MetaScoring algorithm. In total, intelligent bioinformatics could increase the identification yield from 53 to 70% of the analyzed 5,050 gel spots; a total of 556 different proteins were identified. MS identification was highly reproducible: most proteins were identified more than twice from parallel 2DE gels with an average sequence coverage of >50% and rather restrictive score thresholds (Mascot ≥95, ProFound ≥2.2, MetaScore ≥97). The MS technologies and bioinformatics tools that were implemented and integrated to handle this complex proteomic project are presented. In addition, we describe the basic principles and current developments of the applied technologies and provide an overview of the current state of microbial proteome research. Copyright (c) 2006 S. Karger AG, Basel.

  13. Recent advances in mass spectrometry-based proteomics of gastric cancer.

    PubMed

    Kang, Changwon; Lee, Yejin; Lee, J Eugene

    2016-10-07

    The last decade has witnessed remarkable technological advances in mass spectrometry-based proteomics. The development of proteomics techniques has enabled the reliable analysis of complex proteomes, leading to the identification and quantification of thousands of proteins in gastric cancer cells, tissues, and sera. This quantitative information has been used to profile the anomalies in gastric cancer and provide insights into the pathogenic mechanism of the disease. In this review, we mainly focus on the advances in mass spectrometry and quantitative proteomics that were achieved in the last five years and how these up-and-coming technologies are employed to track biochemical changes in gastric cancer cells. We conclude by presenting a perspective on quantitative proteomics and its future applications in the clinic and translational gastric cancer research.

  14. Unlocking the proteomic information encoded in MALDI-TOF-MS data used for microbial identification and characterization

    USDA-ARS's Scientific Manuscript database

    Introduction: Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOFMS) is increasingly utilized as a rapid technique to identify microorganisms including pathogenic bacteria. However, little attention has been paid to the significant proteomic information encoded in ...

  15. 8 Allergenic Composition of Polymerized Allergen Extracts of Betula verrucosa, Dermatophagoides Pteronyssinus and Phleum Pratense

    PubMed Central

    Fernandez-Caldas, Enrique; Cases, Barbara; Tudela, Jose Ignacio; Fernandez, Eva Abel; Casanovas, Miguel; Subiza, Jose Luis

    2012-01-01

    Background Allergoids have been successfully used in the treatment of respiratory allergic diseases. They are modified allergen extracts that allow the administration of high allergen doses, due to their reduced IgE binding capacity. They maintain allergen-specific T-cell recognition. Since they are native allergen extracts that have been polymerized with glutaraldehyde, identification of the allergenic molecules requires more complicated methods. The aim of the study was to determine the qualitative composition of different polymerized extracts and investigate the presence of defined allergenic molecules using mass spectrometry. Methods Proteomic analysis was carried out at the Proteomics Facility of the Hospital Nacional de Parapléjicos (Toledo, Spain). After reduction and alkylation, proteins were digested with trypsin and the resulting peptides were cleaned using C18 SpinTips Sample Prep Kit; peptides were separated on an Ultimate nano-LC system using a Monolithic C18 column in combination with a precolumn for salt removal. Fractionation of the peptides was performed with a Probot microfraction collector and MS and MS/MS analysis of offline spotted peptide samples were performed using the Applied Biosystems 4800 plus MALDI TOF/TOF Analyzer mass spectrometer. ProteinPilot Software V 2.0.1 and the Paragon algorithm were used for the identification of the proteins. Each MS/MS spectrum was searched against the SwissProt 2010_10 database, Uniprot-Viridiplantae database and Uniprot_Betula database. Results Analysis of the peptides revealed the presence of native allergens in the polymerized extracts: Der p 1, Der p 2, Der p 3, Der p 8 and Der p 11 in D. pteronyssinus; Bet v 2, Bet v 6, Bet v 7 and several Bet v 1 isoforms in B. verrucosa and Phl p 1, Phl p 3, Phl p 5, Phl p 11 and Phl p 12 in P. pratense allergoids. In all cases, potential allergenic proteins were also identified, including ubiquitin, actin, enolase, fructose-bisphosphate aldolase, luminal-binding protein (Heat shock protein 70), calmodulin, among others. Conclusions The characterization of the allergenic composition of allergoids is possible using MS/MS analysis. The analysis confirms the presence of native allergens in the allergoids. Major allergens are preserved during polymerization.

  16. A Review: Proteomics in Retinal Artery Occlusion, Retinal Vein Occlusion, Diabetic Retinopathy and Acquired Macular Disorders.

    PubMed

    Cehofski, Lasse Jørgensen; Honoré, Bent; Vorum, Henrik

    2017-04-28

    Retinal artery occlusion (RAO), retinal vein occlusion (RVO), diabetic retinopathy (DR) and age-related macular degeneration (AMD) are frequent ocular diseases with potentially sight-threatening outcomes. In the present review we discuss major findings of proteomic studies of RAO, RVO, DR and AMD, including an overview of ocular proteome changes associated with anti-vascular endothelial growth factor (VEGF) treatments. Despite the severe outcomes of RAO, the proteome of the disease remains largely unstudied. There is also limited knowledge about the proteome of RVO, but proteomic studies suggest that RVO is associated with remodeling of the extracellular matrix and adhesion processes. Proteomic studies of DR have resulted in the identification of potential therapeutic targets such as carbonic anhydrase-I. Proliferative diabetic retinopathy is the most intensively studied stage of DR. Proteomic studies have established VEGF, pigment epithelium-derived factor (PEDF) and complement components as key factors associated with AMD. The aim of this review is to highlight the major milestones in proteomics in RAO, RVO, DR and AMD. Through large-scale protein analyses, proteomics is bringing new important insights into these complex pathological conditions.

  17. Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution.

    PubMed

    Alves, Gelio; Yu, Yi-Kuo

    2016-09-01

    There is a growing trend for biomedical researchers to extract evidence and draw conclusions from mass spectrometry based proteomics experiments, the cornerstone of which is peptide identification. Inaccurate assignments of peptide identification confidence thus may have far-reaching and adverse consequences. Although some peptide identification methods report accurate statistics, they have been limited to certain types of scoring function. The extreme value statistics based method, while more general in the scoring functions it allows, demands accurate parameter estimates and requires, at least in its original design, excessive computational resources. Improving the parameter estimate accuracy and reducing the computational cost for this method has two advantages: it provides another feasible route to accurate significance assessment, and it could provide reliable statistics for scoring functions yet to be developed. We have formulated and implemented an efficient algorithm for calculating the extreme value statistics for peptide identification applicable to various scoring functions, bypassing the need for searching large random databases. The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit. Contact: yyu@ncbi.nlm.nih.gov. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
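
    As a rough illustration of the extreme value idea mentioned in this abstract, the sketch below converts a peptide score into a p-value under a Gumbel (type I extreme value) null distribution. It is not the RAId algorithm: the function name and the location/scale parameters (mu, beta) are hypothetical, and estimating those parameters accurately is exactly the hard part the abstract refers to.

```python
import math


def gumbel_p_value(score: float, mu: float, beta: float) -> float:
    """Return P(best random score >= score) under a Gumbel(mu, beta) null.

    mu (location) and beta (scale) are assumed to be already estimated;
    how to estimate them is outside the scope of this sketch.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (score - mu) / beta
    if z < -700:
        # exp(-z) would overflow; the survival probability is 1 to machine precision.
        return 1.0
    # Survival function 1 - exp(-exp(-z)); expm1 keeps precision for tiny p-values.
    return -math.expm1(-math.exp(-z))
```

    A score far above mu gives a small p-value, while a score at mu gives 1 - 1/e, about 0.63.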

  18. Binomial probability distribution model-based protein identification algorithm for tandem mass spectrometry utilizing peak intensity information.

    PubMed

    Xiao, Chuan-Le; Chen, Xiao-Zhou; Du, Yang-Li; Sun, Xuesong; Zhang, Gong; He, Qing-Yu

    2013-01-04

    Mass spectrometry has become one of the most important technologies in proteomic analysis. Tandem mass spectrometry (LC-MS/MS) is a major tool for the analysis of peptide mixtures from protein samples. The key step of MS data processing is the identification of peptides from experimental spectra by searching public sequence databases. Although a number of algorithms to identify peptides from MS/MS data have been already proposed, e.g. Sequest, OMSSA, X!Tandem, Mascot, etc., they are mainly based on statistical models considering only peak-matches between experimental and theoretical spectra, but not peak intensity information. Moreover, different algorithms gave different results from the same MS data, implying their probable incompleteness and questionable reproducibility. We developed a novel peptide identification algorithm, ProVerB, based on a binomial probability distribution model of protein tandem mass spectrometry combined with a new scoring function, making full use of peak intensity information and, thus, enhancing the ability of identification. Compared with Mascot, Sequest, and SQID, ProVerB identified significantly more peptides from LC-MS/MS data sets than the current algorithms at 1% False Discovery Rate (FDR) and provided more confident peptide identifications. ProVerB is also compatible with various platforms and experimental data sets, showing its robustness and versatility. The open-source program ProVerB is available at http://bioinformatics.jnu.edu.cn/software/proverb/.
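
    To make the binomial idea concrete, here is a minimal sketch of a peak-match score: the probability of matching at least k of n theoretical fragment peaks by chance, given a per-peak random match probability p. This is a generic binomial tail, not ProVerB's scoring function; it ignores peak intensity, and the function names and parameters are assumptions for illustration only.

```python
import math


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1)
    )


def match_score(n: int, k: int, p: float) -> float:
    """Return -log10 of the chance probability; larger means less likely random."""
    tail = binomial_tail(n, k, p)
    # A tail of exactly zero (e.g. p == 0 with k > 0) maps to an infinite score.
    return math.inf if tail == 0.0 else -math.log10(tail)
```

    For example, matching all 3 of 3 peaks with p = 0.5 has chance probability 0.125, so the score is about 0.90.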

  19. The Pfam protein families database: towards a more sustainable future.

    PubMed

    Finn, Robert D; Coggill, Penelope; Eberhardt, Ruth Y; Eddy, Sean R; Mistry, Jaina; Mitchell, Alex L; Potter, Simon C; Punta, Marco; Qureshi, Matloob; Sangrador-Vegas, Amaia; Salazar, Gustavo A; Tate, John; Bateman, Alex

    2016-01-04

    In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. The Challenge of Human Spermatozoa Proteome: A Systematic Review.

    PubMed

    Gilany, Kambiz; Minai-Tehrani, Arash; Amini, Mehdi; Agharezaee, Niloofar; Arjmand, Babak

    2017-01-01

    Currently, there are 20,197 human protein-coding genes in the most expertly curated database (UniProtKB/Swiss-Prot). Major efforts have been made by the international consortium, the Chromosome-Centric Human Proteome Project (C-HPP), and independent researchers to map the human proteome. In brief, as of 2017 the human proteome has been outlined. The male factor contributes to 50% of infertility in couples. However, there are limited human spermatozoa proteomic studies. Firstly, the development of the mapping of the human spermatozoa was analyzed. The human spermatozoa have been used as a model for missing proteins. It has been shown that human spermatozoa are excellent sources for finding missing proteins. Y chromosome proteome mapping is led by Iran. However, it seems that it is extremely challenging to map the human spermatozoa Y chromosome proteins based on current mass spectrometry-based proteomics technology. Post-translational modifications (PTMs) of the human spermatozoa proteome are the most unexplored area, and currently the exact role of PTMs in male infertility is unknown. Additionally, clinical human spermatozoa proteomic analysis as of 2017 was reviewed in this study.

  1. Molecular Diagnosis and Biomarker Identification on SELDI proteomics data by ADTBoost method.

    PubMed

    Wang, Lu-Yong; Chakraborty, Amit; Comaniciu, Dorin

    2005-01-01

    Clinical proteomics is an emerging field that will have great impact on molecular diagnosis, identification of disease biomarkers, drug discovery and clinical trials in the post-genomic era. Protein profiling in tissues and fluids in disease and pathological control and other proteomics techniques will play an important role in molecular diagnosis with therapeutics and personalized healthcare. We introduced a new robust diagnostic method based on ADTboost algorithm, a novel algorithm in proteomics data analysis to improve classification accuracy. It generates classification rules, which are often smaller and easier to interpret. This method often gives most discriminative features, which can be utilized as biomarkers for diagnostic purpose. Also, it has a nice feature of providing a measure of prediction confidence. We carried out this method in amyotrophic lateral sclerosis (ALS) disease data acquired by surface enhanced laser-desorption/ionization-time-of-flight mass spectrometry (SELDI-TOF MS) experiments. Our method is shown to have outstanding prediction capacity through the cross-validation, ROC analysis results and comparative study. Our molecular diagnosis method provides an efficient way to distinguish ALS disease from neurological controls. The results are expressed in a simple and straightforward alternating decision tree format or conditional format. We identified most discriminative peaks in proteomic data, which can be utilized as biomarkers for diagnosis. It will have broad application in molecular diagnosis through proteomics data analysis and personalized medicine in this post-genomic era.

  2. From proteomics to systems biology: MAPA, MASS WESTERN, PROMEX, and COVAIN as a user-oriented platform.

    PubMed

    Weckwerth, Wolfram; Wienkoop, Stefanie; Hoehenwarter, Wolfgang; Egelhofer, Volker; Sun, Xiaoliang

    2014-01-01

    Genome sequencing and systems biology are revolutionizing life sciences. Proteomics emerged as a fundamental technique of this novel research area as it is the basis for gene function analysis and modeling of dynamic protein networks. Here a complete proteomics platform suited for functional genomics and systems biology is presented. The strategy includes MAPA (mass accuracy precursor alignment; http://www.univie.ac.at/mosys/software.html) as a rapid exploratory analysis step; MASS WESTERN for targeted proteomics; COVAIN (http://www.univie.ac.at/mosys/software.html) for multivariate statistical analysis, data integration, and data mining; and PROMEX (http://www.univie.ac.at/mosys/databases.html) as a database module for proteogenomics and proteotypic peptides for targeted analysis. Moreover, the presented platform can also be utilized to integrate metabolomics and transcriptomics data for the analysis of metabolite-protein-transcript correlations and time course analysis using COVAIN. Examples for the integration of MAPA and MASS WESTERN data, proteogenomic and metabolic modeling approaches for functional genomics, phosphoproteomics by integration of MOAC (metal-oxide affinity chromatography) with MAPA, and the integration of metabolomics, transcriptomics, proteomics, and physiological data using this platform are presented. All software and step-by-step tutorials for data processing and data mining can be downloaded from http://www.univie.ac.at/mosys/software.html.

  3. Biomarker Candidates of Chlamydophila pneumoniae Proteins and Protein Fragments Identified by Affinity-Proteomics Using FTICR-MS and LC-MS/MS

    NASA Astrophysics Data System (ADS)

    Susnea, Iuliana; Bunk, Sebastian; Wendel, Albrecht; Hermann, Corinna; Przybylski, Michael

    2011-04-01

    We report here an affinity-proteomics approach that combines 2D-gel electrophoresis and immunoblotting with high performance mass spectrometry to the identification of both full length protein antigens and antigenic fragments of Chlamydophila pneumoniae (C. pneumoniae). The present affinity-mass spectrometry approach effectively utilized high resolution FTICR mass spectrometry and LC-tandem-MS for protein identification, and enabled the identification of several new highly antigenic C. pneumoniae proteins that were not hitherto reported or previously detected only in other Chlamydia species, such as Chlamydia trachomatis. Moreover, high resolution affinity-MS provided the identification of several neo-antigenic protein fragments containing N- and C-terminal, and central domains such as fragments of the membrane protein Pmp21 and the secreted chlamydial proteasome-like factor (Cpaf), representing specific biomarker candidates.

  4. Proteome-based bacterial identification using matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS): A revolutionary shift in clinical diagnostic microbiology.

    PubMed

    Nomura, Fumio

    2015-06-01

    Rapid and accurate identification of microorganisms, a prerequisite for appropriate patient care and infection control, is a critical function of any clinical microbiology laboratory. Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) is a quick and reliable method for identification of microorganisms, including bacteria, yeast, molds, and mycobacteria. Indeed, there has been a revolutionary shift in clinical diagnostic microbiology. In the present review, the state of the art and advantages of MALDI-TOF MS-based bacterial identification are described. The potential of this innovative technology for use in strain typing and detection of antibiotic resistance is also discussed. This article is part of a Special Issue entitled: Medical Proteomics. Copyright © 2014 Elsevier B.V. All rights reserved.

  5. Human body fluid proteome analysis

    PubMed Central

    Hu, Shen; Loo, Joseph A.; Wong, David T.

    2010-01-01

    The focus of this article is to review the recent advances in proteome analysis of human body fluids, including plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, and amniotic fluid, as well as its applications to human disease biomarker discovery. We aim to summarize the proteomics technologies currently used for global identification and quantification of body fluid proteins, and elaborate the putative biomarkers discovered for a variety of human diseases through human body fluid proteome (HBFP) analysis. Some critical concerns and perspectives in this emerging field are also discussed. With the advances made in proteomics technologies, the impact of HBFP analysis in the search for clinically relevant disease biomarkers would be realized in the future. PMID:17083142

  6. Human body fluid proteome analysis.

    PubMed

    Hu, Shen; Loo, Joseph A; Wong, David T

    2006-12-01

    The focus of this article is to review the recent advances in proteome analysis of human body fluids, including plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, and amniotic fluid, as well as its applications to human disease biomarker discovery. We aim to summarize the proteomics technologies currently used for global identification and quantification of body fluid proteins, and elaborate the putative biomarkers discovered for a variety of human diseases through human body fluid proteome (HBFP) analysis. Some critical concerns and perspectives in this emerging field are also discussed. With the advances made in proteomics technologies, the impact of HBFP analysis in the search for clinically relevant disease biomarkers would be realized in the future.

  7. Understanding Cullin-RING E3 Biology through Proteomics-based Substrate Identification*

    PubMed Central

    Harper, J. Wade; Tan, Meng-Kwang Marcus

    2012-01-01

    Protein turnover through the ubiquitin-proteasome pathway controls numerous developmental decisions and biochemical processes in eukaryotes. Central to protein ubiquitylation are ubiquitin ligases, which provide specificity in targeted ubiquitylation. With more than 600 ubiquitin ligases encoded by the human genome, many of which remain to be studied, considerable effort is being placed on the development of methods for identifying substrates of specific ubiquitin ligases. In this review, we describe proteomic technologies for the identification of ubiquitin ligase targets, with a particular focus on members of the cullin-RING E3 class of ubiquitin ligases, which use F-box proteins as substrate specific adaptor proteins. Various proteomic methods are described and are compared with genetic approaches that are available. The continued development of such methods is likely to have a substantial impact on the ubiquitin-proteasome field. PMID:22962057

  8. Understanding cullin-RING E3 biology through proteomics-based substrate identification.

    PubMed

    Harper, J Wade; Tan, Meng-Kwang Marcus

    2012-12-01

    Protein turnover through the ubiquitin-proteasome pathway controls numerous developmental decisions and biochemical processes in eukaryotes. Central to protein ubiquitylation are ubiquitin ligases, which provide specificity in targeted ubiquitylation. With more than 600 ubiquitin ligases encoded by the human genome, many of which remain to be studied, considerable effort is being placed on the development of methods for identifying substrates of specific ubiquitin ligases. In this review, we describe proteomic technologies for the identification of ubiquitin ligase targets, with a particular focus on members of the cullin-RING E3 class of ubiquitin ligases, which use F-box proteins as substrate specific adaptor proteins. Various proteomic methods are described and are compared with genetic approaches that are available. The continued development of such methods is likely to have a substantial impact on the ubiquitin-proteasome field.

  9. Steps for successful implementation of proteomic research in the OR.

    PubMed

    Martin, Chidima Tsion; Henry, Linda; Martin, Lisa; Ad, Niv

    2010-02-01

    Proteomic studies (ie, the investigation and identification of proteins found in biological samples such as blood and tissue) are at the forefront of the identification of disease biomarkers and the understanding of proteins. These studies promise to enhance diagnostic and prognostic analysis across all disciplines of clinical practice. As the practice of nursing and medicine becomes more preventative in nature and predictive in terms of patient care, successfully integrating and implementing proteomic research will become increasingly important, especially in the OR. It is imperative that perioperative nurses and researchers establish a collaborative process for specimen collection. Steps in establishing and maintaining a successful specimen collection program include implementing and evaluating a protocol, developing good communication, and keeping all participants up to date on the progress of the study. Copyright 2010 AORN, Inc. Published by Elsevier Inc. All rights reserved.

  10. Genome-Wide Identification of Molecular Mimicry Candidates in Parasites

    PubMed Central

    Ludin, Philipp; Nilsson, Daniel; Mäser, Pascal

    2011-01-01

    Among the many strategies employed by parasites for immune evasion and host manipulation, one of the most fascinating is molecular mimicry. With genome sequences available for host and parasite, mimicry of linear amino acid epitopes can be investigated by comparative genomics. Here we developed an in silico pipeline for genome-wide identification of molecular mimicry candidate proteins or epitopes. The predicted proteome of a given parasite was broken down into overlapping fragments, each of which was screened for close hits in the human proteome. Control searches were carried out against unrelated, free-living eukaryotes to eliminate the generally conserved proteins, and with randomized versions of the parasite proteins to get an estimate of statistical significance. This simple but computation-intensive approach yielded interesting candidates from human-pathogenic parasites. From Plasmodium falciparum, it returned a 14 amino acid motif in several of the PfEMP1 variants identical to part of the heparin-binding domain in the immunosuppressive serum protein vitronectin. And in Brugia malayi, fragments were detected that matched to periphilin-1, a protein of cell-cell junctions involved in barrier formation. All the results are publicly available by means of mimicDB, a searchable online database for molecular mimicry candidates from pathogens. To our knowledge, this is the first genome-wide survey for molecular mimicry proteins in parasites. The strategy can be adopted to any pair of host and pathogen, once appropriate negative control organisms are chosen. MimicDB provides a host of new starting points to gain insights into the molecular nature of host-pathogen interactions. PMID:21408160
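
    The fragment-and-screen strategy described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it uses exact string matching where the study screened for close hits, it omits the randomized-sequence significance controls, and the function names, the default fragment length and the toy inputs are all hypothetical.

```python
from typing import Dict, Iterator, List, Set, Tuple


def overlapping_fragments(
    sequence: str, length: int, step: int = 1
) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) for every window of the given length."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    for start in range(0, len(sequence) - length + 1, step):
        yield start, sequence[start:start + length]


def fragment_index(proteome: Dict[str, str], length: int) -> Set[str]:
    """Collect every fragment of the given length found in a proteome."""
    return {
        fragment
        for sequence in proteome.values()
        for _, fragment in overlapping_fragments(sequence, length)
    }


def mimicry_candidates(
    parasite: Dict[str, str],
    host: Dict[str, str],
    control: Dict[str, str],
    length: int = 8,
) -> List[Tuple[str, int, str]]:
    """Return parasite fragments present in the host but absent from the control.

    The control proteome stands in for the unrelated, free-living organisms
    used to discard generally conserved proteins.
    """
    host_fragments = fragment_index(host, length)
    control_fragments = fragment_index(control, length)
    hits = []
    for name, sequence in parasite.items():
        for start, fragment in overlapping_fragments(sequence, length):
            if fragment in host_fragments and fragment not in control_fragments:
                hits.append((name, start, fragment))
    return hits
```

    Each hit records the parasite protein, the fragment start position and the shared fragment, so overlapping hits can later be merged into longer candidate motifs.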

  11. Automatic and rapid identification of glycopeptides by nano-UPLC-LTQ-FT-MS and proteomic search engine.

    PubMed

    Giménez, Estela; Gay, Marina; Vilaseca, Marta

    2017-01-30

    Here we demonstrate the potential of nano-UPLC-LTQ-FT-MS and the Byonic™ proteomic search engine for the separation, detection, and identification of N- and O-glycopeptide glycoforms in standard glycoproteins. The use of a BEH C18 nanoACQUITY column allowed the separation of the glycopeptides present in the glycoprotein digest and a baseline-resolution of the glycoforms of the same glycopeptide on the basis of the number of sialic acids. Moreover, we evaluated several acquisition strategies in order to improve the detection and characterization of glycopeptide glycoforms with the maximum number of identification percentages. The proposed strategy is simple to set up with the technology platforms commonly used in proteomic labs. The method allows the straightforward and rapid obtention of a general glycosylated map of a given protein, including glycosites and their corresponding glycosylated structures. The MS strategy selected in this work, based on a gas phase fractionation approach, led to 136 unique peptides from four standard proteins, which represented 78% of the total number of peptides identified. Moreover, the method does not require an extra glycopeptide enrichment step, thus preventing the bias that this step could cause towards certain glycopeptide species. Data are available via ProteomeXchange with identifier PXD003578. We propose a simple and high-throughput glycoproteomics-based methodology that allows the separation of glycopeptide glycoforms on the basis of the number of sialic acids, and their automatic and rapid identification without prior knowledge of protein glycosites or type and structure of the glycans. Copyright © 2016 Elsevier B.V. All rights reserved.

  12. Generic comparison of protein inference engines.

    PubMed

    Claassen, Manfred; Reiter, Lukas; Hengartner, Michael O; Buhmann, Joachim M; Aebersold, Ruedi

    2012-04-01

    Protein identifications, instead of peptide-spectrum matches, constitute the biologically relevant result of shotgun proteomics studies. How to appropriately infer and report protein identifications has triggered a still ongoing debate. This debate has so far suffered from the lack of appropriate performance measures that allow us to objectively assess protein inference approaches. This study describes an intuitive, generic and yet formal performance measure and demonstrates how it enables experimentalists to select an optimal protein inference strategy for a given collection of fragment ion spectra. We applied the performance measure to systematically explore the benefit of excluding possibly unreliable protein identifications, such as single-hit wonders. Therefore, we defined a family of protein inference engines by extending a simple inference engine by thousands of pruning variants, each excluding a different specified set of possibly unreliable identifications. We benchmarked these protein inference engines on several data sets representing different proteomes and mass spectrometry platforms. Optimally performing inference engines retained all high confidence spectral evidence, without posterior exclusion of any type of protein identifications. Despite the diversity of studied data sets consistently supporting this rule, other data sets might behave differently. In order to ensure maximal reliable proteome coverage for data sets arising in other studies we advocate abstaining from rigid protein inference rules, such as exclusion of single-hit wonders, and instead consider several protein inference approaches and assess these with respect to the presented performance measure in the specific application context.
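
    The pruning rule discussed in this abstract, excluding single-hit wonders, can be illustrated with a deliberately simple inference step. This sketch is an assumption-laden toy, not the authors' family of inference engines or their performance measure: it treats each peptide-to-protein assignment as given, ignores shared peptides and scores, and the function name and input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def infer_proteins(
    assignments: Iterable[Tuple[str, str]], min_peptides: int = 1
) -> Dict[str, Set[str]]:
    """Group distinct peptides by protein and keep proteins with enough support.

    assignments holds (peptide, protein) pairs from confident spectrum matches.
    min_peptides=1 retains single-hit wonders; min_peptides=2 excludes them.
    """
    peptides_by_protein: Dict[str, Set[str]] = defaultdict(set)
    for peptide, protein in assignments:
        peptides_by_protein[protein].add(peptide)
    return {
        protein: peptides
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    }
```

    Running the function with both settings on the same data shows directly which identifications a rigid rule would discard, which is the kind of comparison the abstract argues should be assessed per data set rather than fixed in advance.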

  13. Proteome Studies of Filamentous Fungi

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, Scott E.; Panisko, Ellen A.

    2011-04-20

    The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide breadth of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, non-gel based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of different variations on the general method and technologies for identifying peptides in a given sample. We present a method that can serve as a “baseline” for proteomic studies of fungi.

  14. Proteome studies of filamentous fungi.

    PubMed

    Baker, Scott E; Panisko, Ellen A

    2011-01-01

    The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, nongel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of variations on the general methods and technologies for identifying peptides in a given sample. We present a method that can serve as a "baseline" for proteomic studies of fungi.

  15. M2Lite: An Open-source, Light-weight, Pluggable and Fast Proteome Discoverer MSF to mzIdentML Tool.

    PubMed

    Aiyetan, Paul; Zhang, Bai; Chen, Lily; Zhang, Zhen; Zhang, Hui

    2014-04-28

    Proteome Discoverer is one of many tools used for protein database search and peptide to spectrum assignment in mass spectrometry-based proteomics. However, the inadequacy of conversion tools makes it challenging to compare and integrate its results with those of other analytical tools. Here we present M2Lite, an open-source, light-weight, easily pluggable and fast conversion tool. M2Lite converts Proteome Discoverer-derived MSF files to the proteomics community-defined standard, the mzIdentML file format. M2Lite's source code is available as open-source at https://bitbucket.org/paiyetan/m2lite/src and its compiled binaries and documentation can be freely downloaded at https://bitbucket.org/paiyetan/m2lite/downloads.

  16. Large-Scale Interaction Profiling of Protein Domains Through Proteomic Peptide-Phage Display Using Custom Peptidomes.

    PubMed

    Seo, Moon-Hyeong; Nim, Satra; Jeon, Jouhyun; Kim, Philip M

    2017-01-01

    Protein-protein interactions are essential to cellular functions and signaling pathways. We recently combined bioinformatics and custom oligonucleotide arrays to construct custom-made peptide-phage libraries for screening peptide-protein interactions, an approach we call proteomic peptide-phage display (ProP-PD). In this chapter, we describe protocols for phage display for the identification of natural peptide binders for a given protein. We finally describe deep sequencing for the analysis of the proteomic peptide-phage display.

  17. Proteomic Analysis to Identify Functional Molecules in Drug Resistance Caused by E-Cadherin Knockdown in 3D-Cultured Colorectal Cancer Models

    DTIC Science & Technology

    2014-09-01

    total number of 538 phosphopeptides were identified, among which 350 phosphopeptides had been identified with the first round of TiO2 enrichment and 430...year research and the collection of proteomic and phosphoproteomic data is still in process. PRODUCTS Manuscripts: Yue XS , Hummon AB. Combining...of IMAC and TiO2 enrichment methods to increase phosphoproteomic identifications, manuscript in preparation. Yue XS , Hummon AB. Proteomic and

  18. Scientific Workflow Management in Proteomics

    PubMed Central

    de Bruin, Jeroen S.; Deelder, André M.; Palmblad, Magnus

    2012-01-01

    Data processing in proteomics can be a challenging endeavor, requiring extensive knowledge of many different software packages, all with different algorithms, data format requirements, and user interfaces. In this article we describe the integration of a number of existing programs and tools in Taverna Workbench, a scientific workflow manager currently being developed in the bioinformatics community. We demonstrate how a workflow manager provides a single, visually clear and intuitive interface to complex data analysis tasks in proteomics, from raw mass spectrometry data to protein identifications and beyond. PMID:22411703

  19. Low Cost, Scalable Proteomics Data Analysis Using Amazon's Cloud Computing Services and Open Source Search Algorithms

    PubMed Central

    Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.

    2009-01-01

    One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step by step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578

  20. Low cost, scalable proteomics data analysis using Amazon's cloud computing services and open source search algorithms.

    PubMed

    Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N

    2009-06-01

    One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site ( http://proteomics.mcw.edu/vipdac ).

  1. Identification of Carboxypeptidase Substrates by C-Terminal COFRADIC.

    PubMed

    Tanco, Sebastian; Aviles, Francesc Xavier; Gevaert, Kris; Lorenzo, Julia; Van Damme, Petra

    2017-01-01

    We here present a detailed procedure for studying protein C-termini and their posttranslational modifications by C-terminal COFRADIC. In fact, this procedure can enrich for both C-terminal and N-terminal peptides through a combination of a strong cation exchange fractionation step at low pH, which removes the majority of nonterminal peptides in whole-proteome digests, while the actual COFRADIC step segregates C-terminal peptides from N-terminal peptides. When used in a differential mode, C-terminal COFRADIC allows for the identification of neo-C-termini generated by the action of proteases, which in turn leads to the identification of protease substrates. More specifically, this technology can be applied to determine the natural substrate repertoire of carboxypeptidases on a proteome-wide scale.

  2. Trends in mass spectrometry instrumentation for proteomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, Richard D.

    2002-12-01

    Mass spectrometry has become a primary tool for proteomics due to its capabilities for rapid and sensitive protein identification and quantitation. It is now possible to identify thousands of proteins from microgram sample quantities in a single day and to quantify relative protein abundances. However, the needs for increased capabilities for proteome measurements are immense and are now driving both new strategies and instrument advances. These developments include those based on integration with multi-dimensional liquid separations and high accuracy mass measurements, and promise more than an order of magnitude improvements in sensitivity, dynamic range, and throughput for proteomic analyses in the near future.

  3. Mass spectrometry-based proteomics for translational research: a technical overview.

    PubMed

    Paulo, Joao A; Kadiyala, Vivek; Banks, Peter A; Steen, Hanno; Conwell, Darwin L

    2012-03-01

    Mass spectrometry-based investigation of clinical samples enables the high-throughput identification of protein biomarkers. We provide an overview of mass spectrometry-based proteomic techniques that are applicable to the investigation of clinical samples. We address sample collection, protein extraction and fractionation, mass spectrometry modalities, and quantitative proteomics. Finally, we examine the limitations and further potential of such technologies. Liquid chromatography fractionation coupled with tandem mass spectrometry is well suited to handle mixtures of hundreds or thousands of proteins. Mass spectrometry-based proteome elucidation can reveal potential biomarkers and aid in the development of hypotheses for downstream investigation of the molecular mechanisms of disease.

  4. Label-free proteome of water buffalo (Bubalus bubalis) seminal plasma.

    PubMed

    Brito, Mayara F; Auler, Patrícia A; Tavares, Guilherme C; Rezende, Cristiana P; Almeida, Gabriel M F; Pereira, Felipe L; Leal, Carlos A G; Moura, Arlindo de Alencar; Figueiredo, Henrique C P; Henry, Marc

    2018-06-11

    The study aimed to describe the Bubalus bubalis seminal plasma proteome using a label-free shotgun UDMSE approach. A total of 859 nonredundant proteins were identified across five biological replicates with stringent identification. Proteins specifically related to sperm maturation and protection, capacitation, fertilization and metabolic activity were detected in the buffalo seminal fluid. In conclusion, we provide a comprehensive proteomic profile of buffalo seminal plasma, which establishes a foundation for further studies designed to understand regulation of sperm function and discovery of novel biomarkers for fertility. MS data are available in the ProteomeXchange with identifier PXD003728. © 2018 Blackwell Verlag GmbH.

  5. Mass Spectrometry-Based Proteomics for Translational Research: A Technical Overview

    PubMed Central

    Paulo, Joao A.; Kadiyala, Vivek; Banks, Peter A.; Steen, Hanno; Conwell, Darwin L.

    2012-01-01

    Mass spectrometry-based investigation of clinical samples enables the high-throughput identification of protein biomarkers. We provide an overview of mass spectrometry-based proteomic techniques that are applicable to the investigation of clinical samples. We address sample collection, protein extraction and fractionation, mass spectrometry modalities, and quantitative proteomics. Finally, we examine the limitations and further potential of such technologies. Liquid chromatography fractionation coupled with tandem mass spectrometry is well suited to handle mixtures of hundreds or thousands of proteins. Mass spectrometry-based proteome elucidation can reveal potential biomarkers and aid in the development of hypotheses for downstream investigation of the molecular mechanisms of disease. PMID:22461744

  6. Using HPLC-Mass Spectrometry to Teach Proteomics Concepts with Problem-Based Techniques

    ERIC Educational Resources Information Center

    Short, Michael; Short, Anne; Vankempen, Rachel; Seymour, Michael; Burnatowska-Hledin, Maria

    2010-01-01

    Practical instruction of proteomics concepts was provided using high-performance liquid chromatography coupled with a mass selective detection system (HPLC-MS) for the analysis of simulated protein digests. The samples were prepared from selected dipeptides in order to facilitate the mass spectral identification. As part of the prelaboratory…

  7. MALDI-TOF MS of Trichoderma: A model system for the identification of microfungi

    USDA-ARS's Scientific Manuscript database

    This investigation aimed to assess whether MALDI-TOF MS analysis of proteomics could be applied to the study of Trichoderma, a fungal genus selected because it includes many species and is phylogenetically well defined. We also investigated whether MALDI-TOF MS analysis of proteomics would reveal ap...

  8. Identification of methyllysine peptides binding to chromobox protein homolog 6 chromodomain in the human proteome.

    PubMed

    Li, Nan; Stein, Richard S L; He, Wei; Komives, Elizabeth; Wang, Wei

    2013-10-01

    Methylation is one of the important post-translational modifications that play critical roles in regulating protein functions. Proteomic identification of this post-translational modification and understanding how it affects protein activity remain great challenges. We tackled this problem from the aspect of methylation mediating protein-protein interaction. Using the chromodomain of human chromobox protein homolog 6 as a model system, we developed a systematic approach that integrates structure modeling, bioinformatics analysis, and peptide microarray experiments to identify lysine residues that are methylated and recognized by the chromodomain in the human proteome. Given the important role of chromobox protein homolog 6 as a reader of histone modifications, it was interesting to find that the majority of its interacting partners identified via this approach function in chromatin remodeling and transcriptional regulation. Our study not only illustrates a novel angle for identifying methyllysines on a proteome-wide scale and elucidating their potential roles in regulating protein function, but also suggests possible strategies for engineering the chromodomain-peptide interface to enhance the recognition of and manipulate the signal transduction mediated by such interactions.

  9. Implementation of statistical process control for proteomic experiments via LC MS/MS.

    PubMed

    Bereman, Michael S; Johnson, Richard; Bollinger, James; Boss, Yuval; Shulman, Nick; MacLean, Brendan; Hoofnagle, Andrew N; MacCoss, Michael J

    2014-04-01

    Statistical process control (SPC) is a robust set of tools that aids in the visualization, detection, and identification of assignable causes of variation in any process that creates products, services, or information. A tool has been developed termed Statistical Process Control in Proteomics (SProCoP) which implements aspects of SPC (e.g., control charts and Pareto analysis) into the Skyline proteomics software. It monitors five quality control metrics in a shotgun or targeted proteomic workflow. None of these metrics require peptide identification. The source code, written in the R statistical language, runs directly from the Skyline interface, which supports the use of raw data files from several of the mass spectrometry vendors. It provides real time evaluation of the chromatographic performance (e.g., retention time reproducibility, peak asymmetry, and resolution), and mass spectrometric performance (targeted peptide ion intensity and mass measurement accuracy for high resolving power instruments) via control charts. Thresholds are experiment- and instrument-specific and are determined empirically from user-defined quality control standards that enable the separation of random noise and systematic error. Finally, Pareto analysis provides a summary of performance metrics and guides the user to metrics with high variance. The utility of these charts to evaluate proteomic experiments is illustrated in two case studies.
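
    The control-chart idea described above can be sketched in a few lines. This is not SProCoP code (the abstract states that the tool is written in R and runs from Skyline); it is a minimal Python illustration that assumes Shewhart-style limits of the baseline mean plus or minus three standard deviations, estimated from quality control runs. The function names, the three-sigma choice, and the retention-time values are all hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, n_sigma=3.0):
    """Return (lower, center, upper) limits estimated from baseline QC runs."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - n_sigma * spread, center, center + n_sigma * spread


def out_of_control(values, limits):
    """Return the indices of runs that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, value in enumerate(values) if value < lower or value > upper]


# Hypothetical retention times (minutes) of one QC peptide.
baseline_rt = [22.1, 22.3, 22.0, 22.2, 22.4, 22.1, 22.2]  # user-defined QC standards
new_rt = [22.2, 22.3, 23.5, 22.1]                         # runs to be monitored

limits = control_limits(baseline_rt)
flagged = out_of_control(new_rt, limits)
print(flagged)  # indices of monitored runs outside the limits
```

    Estimating the limits from the laboratory's own baseline runs mirrors the abstract's point that thresholds are experiment- and instrument-specific rather than fixed in advance.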

  10. A Review: Proteomics in Retinal Artery Occlusion, Retinal Vein Occlusion, Diabetic Retinopathy and Acquired Macular Disorders

    PubMed Central

    Cehofski, Lasse Jørgensen; Honoré, Bent; Vorum, Henrik

    2017-01-01

    Retinal artery occlusion (RAO), retinal vein occlusion (RVO), diabetic retinopathy (DR) and age-related macular degeneration (AMD) are frequent ocular diseases with potentially sight-threatening outcomes. In the present review we discuss major findings of proteomic studies of RAO, RVO, DR and AMD, including an overview of ocular proteome changes associated with anti-vascular endothelial growth factor (VEGF) treatments. Despite the severe outcomes of RAO, the proteome of the disease remains largely unstudied. There is also limited knowledge about the proteome of RVO, but proteomic studies suggest that RVO is associated with remodeling of the extracellular matrix and adhesion processes. Proteomic studies of DR have resulted in the identification of potential therapeutic targets such as carbonic anhydrase-I. Proliferative diabetic retinopathy is the most intensively studied stage of DR. Proteomic studies have established VEGF, pigment epithelium-derived factor (PEDF) and complement components as key factors associated with AMD. The aim of this review is to highlight the major milestones in proteomics in RAO, RVO, DR and AMD. Through large-scale protein analyses, proteomics is bringing new important insights into these complex pathological conditions. PMID:28452939

  11. An accurate proteomic quantification method: fluorescence labeling absolute quantification (FLAQ) using multidimensional liquid chromatography and tandem mass spectrometry.

    PubMed

    Liu, Junyan; Liu, Yang; Gao, Mingxia; Zhang, Xiangmin

    2012-08-01

    A facile proteomic quantification method, fluorescent labeling absolute quantification (FLAQ), was developed. Instead of using MS for quantification, the FLAQ method is a chromatography-based quantification in combination with MS for identification. Multidimensional liquid chromatography (MDLC) with laser-induced fluorescence (LIF) detection with high accuracy and tandem MS system were employed for FLAQ. Several requirements should be met for fluorescent labeling in MS identification: labeling completeness, minimum side-reactions, simple MS spectra, and no extra tandem MS fragmentations for structure elucidations. A fluorescent dye, 5-iodoacetamidofluorescein, was finally chosen to label proteins on all cysteine residues. The fluorescent dye was compatible with the process of the trypsin digestion and MALDI MS identification. Quantitative labeling was achieved with optimization of reacting conditions. A synthesized peptide and model proteins, BSA (35 cysteines), OVA (five cysteines), were used for verifying the completeness of labeling. Proteins were separated through MDLC and quantified based on fluorescent intensities, followed by MS identification. High accuracy (RSD% < 1.58) and wide linearity of quantification (1-10^5) were achieved by LIF detection. The limit of quantitation for the model protein was as low as 0.34 amol. Parts of proteins in human liver proteome were quantified and demonstrated using FLAQ. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
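
    The RSD% figure quoted above is a relative standard deviation, that is, the sample standard deviation expressed as a percentage of the mean. A minimal Python sketch of that calculation follows; the function name and the replicate intensities are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation: 100 * sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical replicate fluorescence intensities for one labeled protein.
replicates = [98.0, 100.0, 102.0]
rsd = rsd_percent(replicates)
```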

  12. Histoplasma capsulatum proteome response to decreased iron availability

    PubMed Central

    Winters, Michael S; Spellman, Daniel S; Chan, Qilin; Gomez, Francisco J; Hernandez, Margarita; Catron, Brittany; Smulian, Alan G; Neubert, Thomas A; Deepe, George S

    2008-01-01

    Background A fundamental pathogenic feature of the fungus Histoplasma capsulatum is its ability to evade innate and adaptive immune defenses. Once ingested by macrophages the organism is faced with several hostile environmental conditions including iron limitation. H. capsulatum can establish a persistent state within the macrophage. A gap in knowledge exists because the identities and number of proteins regulated by the organism under host conditions have yet to be defined. Lack of such knowledge is an important problem because until these proteins are identified it is unlikely that they can be targeted as new and innovative treatment for histoplasmosis. Results To investigate the proteomic response by H. capsulatum to decreasing iron availability we have created H. capsulatum protein/genomic databases compatible with current mass spectrometric (MS) search engines. Databases were assembled from the H. capsulatum G217B strain genome using gene prediction programs and expressed sequence tag (EST) libraries. Searching these databases with MS data generated from two dimensional (2D) in-gel digestions of proteins resulted in over 50% more proteins identified compared to searching the publicly available fungal databases alone. Using 2D gel electrophoresis combined with statistical analysis we discovered 42 H. capsulatum proteins whose abundance was significantly modulated when iron concentrations were lowered. Altered proteins were identified by mass spectrometry and database searching to be involved in glycolysis, the tricarboxylic acid cycle, lysine metabolism, protein synthesis, and one protein sequence whose function was unknown. Conclusion We have created a bioinformatics platform for H. capsulatum and demonstrated the utility of a proteomic approach by identifying a shift in metabolism the organism utilizes to cope with the hostile conditions provided by the host. We have shown that enzyme transcripts regulated by other fungal pathogens in response to lowering iron availability are also regulated in H. capsulatum at the protein level. We also identified H. capsulatum proteins sensitive to iron level reductions which have yet to be connected to iron availability in other pathogens. These data also indicate the complexity of the response by H. capsulatum to nutritional deprivation. Finally, we demonstrate the importance of a strain specific gene/protein database for H. capsulatum proteomic analysis. PMID:19108728

  13. Targeted proteomics identifies liquid-biopsy signatures for extracapsular prostate cancer

    PubMed Central

    Kim, Yunee; Jeon, Jouhyun; Mejia, Salvador; Yao, Cindy Q; Ignatchenko, Vladimir; Nyalwidhe, Julius O; Gramolini, Anthony O; Lance, Raymond S; Troyer, Dean A; Drake, Richard R; Boutros, Paul C; Semmes, O. John; Kislinger, Thomas

    2016-01-01

    Biomarkers are rapidly gaining importance in personalized medicine. Although numerous molecular signatures have been developed over the past decade, there is a lack of overlap and many biomarkers fail to validate in independent patient cohorts and hence are not useful for clinical application. For these reasons, identification of novel and robust biomarkers remains a formidable challenge. We combine targeted proteomics with computational biology to discover robust proteomic signatures for prostate cancer. Quantitative proteomics conducted in expressed prostatic secretions from men with extraprostatic and organ-confined prostate cancers identified 133 differentially expressed proteins. Using synthetic peptides, we evaluate them by targeted proteomics in a 74-patient cohort of expressed prostatic secretions in urine. We quantify a panel of 34 candidates in an independent 207-patient cohort. We apply machine-learning approaches to develop clinical predictive models for prostate cancer diagnosis and prognosis. Our results demonstrate that computationally guided proteomics can discover highly accurate non-invasive biomarkers. PMID:27350604

  14. Mass spectrometry based proteomics: existing capabilities and future directions

    PubMed Central

    Angel, Thomas E.; Aryal, Uma K.; Hengel, Shawna M.; Baker, Erin S.; Kelly, Ryan T.; Robinson, Errol W.; Smith, Richard D.

    2012-01-01

    Mass spectrometry (MS)-based proteomics is emerging as a broadly effective means for identification, characterization, and quantification of proteins that are integral components of the processes essential for life. Characterization of proteins at the proteome and sub-proteome (e.g., the phosphoproteome, proteoglycome, or degradome/peptidome) levels provides a foundation for understanding fundamental aspects of biology. Emerging technologies such as ion mobility separations coupled with MS and microchip-based-proteome measurements combined with MS instrumentation and chromatographic separation techniques, such as nanoscale reversed phase liquid chromatography and capillary electrophoresis, show great promise for both broad undirected and targeted highly sensitive measurements. MS-based proteomics is increasingly contributing to our understanding of the dynamics, interactions, and roles that proteins and peptides play, advancing our understanding of biology on a systems wide level for a wide range of applications including investigations of microbial communities, bioremediation, and human health. PMID:22498958

  15. Clinical proteomics-driven precision medicine for targeted cancer therapy: current overview and future perspectives.

    PubMed

    Zhou, Li; Wang, Kui; Li, Qifu; Nice, Edouard C; Zhang, Haiyuan; Huang, Canhua

    2016-01-01

    Cancer is a common disease that is a leading cause of death worldwide. Currently, early detection and novel therapeutic strategies are urgently needed for more effective management of cancer. Importantly, protein profiling using clinical proteomic strategies, with spectacular sensitivity and precision, offers excellent promise for the identification of potential biomarkers that would direct the development of targeted therapeutic anticancer drugs for precision medicine. In particular, clinical sample sources, including tumor tissues and body fluids (blood, feces, urine and saliva), have been widely investigated using modern high-throughput mass spectrometry-based proteomic approaches combined with bioinformatic analysis, to pursue the possibilities of precision medicine for targeted cancer therapy. Discussed in this review are the current advantages and limitations of clinical proteomics, the available strategies of clinical proteomics for the management of precision medicine, as well as the challenges and future perspectives of clinical proteomics-driven precision medicine for targeted cancer therapy.

  16. Subnanogram proteomics: Impact of LC column selection, MS instrumentation and data analysis strategy on proteome coverage for trace samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Ying; Zhao, Rui; Piehowski, Paul D.

    One of the greatest challenges for mass spectrometry (MS)-based proteomics is the limited ability to analyze small samples. Here we investigate the relative contributions of liquid chromatography (LC), MS instrumentation and data analysis methods with the aim of improving proteome coverage for sample sizes ranging from 0.5 ng to 50 ng. We show that the LC separations utilizing 30-µm-i.d. columns increase signal intensity by >3-fold relative to those using 75-µm-i.d. columns, leading to 32% increase in peptide identifications. The Orbitrap Fusion Lumos mass spectrometer significantly boosted both sensitivity and sequencing speed relative to earlier generation Orbitraps (e.g., LTQ-Orbitrap), leading to a ~3× increase in peptide identifications and 1.7× increase in identified protein groups for 2 ng tryptic digests of bacterial lysate. The Match Between Runs algorithm of open-source MaxQuant software further increased proteome coverage by ~95% for 0.5 ng samples and by ~42% for 2 ng samples. The present platform is capable of identifying >3000 protein groups from tryptic digestion of cell lysates equivalent to 50 HeLa cells and 100 THP-1 cells (~10 ng total proteins), respectively, and >950 proteins from subnanogram bacterial and archaeal cell lysates. The present ultrasensitive LC-MS platform is expected to enable deep proteome coverage for subnanogram samples, including single mammalian cells.

  17. Identification of new intrinsic proteins in Arabidopsis plasma membrane proteome.

    PubMed

    Marmagne, Anne; Rouet, Marie-Aude; Ferro, Myriam; Rolland, Norbert; Alcon, Carine; Joyard, Jacques; Garin, Jérome; Barbier-Brygoo, Hélène; Ephritikhine, Geneviève

    2004-07-01

    Identification and characterization of anion channel genes in plants represent a goal for a better understanding of their central role in cell signaling, osmoregulation, nutrition, and metabolism. Though channel activities have been well characterized in plasma membrane by electrophysiology, the corresponding molecular entities are little documented. Indeed, the hydrophobic protein equipment of plant plasma membrane still remains largely unknown, though several proteomic approaches have been reported. To identify new putative transport systems, we developed a new proteomic strategy based on mass spectrometry analyses of a plasma membrane fraction enriched in hydrophobic proteins. We produced from Arabidopsis cell suspensions a highly purified plasma membrane fraction and characterized it in detail by immunological and enzymatic tests. Using complementary methods for the extraction of hydrophobic proteins and mass spectrometry analyses on mono-dimensional gels, about 100 proteins have been identified, 95% of which had never been found in previous proteomic studies. The inventory of the plasma membrane proteome generated by this approach contains numerous plasma membrane integral proteins, one-third displaying at least four transmembrane segments. The plasma membrane localization was confirmed for several proteins, therefore validating such proteomic strategy. An in silico analysis shows a correlation between the putative functions of the identified proteins and the expected roles for plasma membrane in transport, signaling, cellular traffic, and metabolism. This analysis also reveals 10 proteins that display structural properties compatible with transport functions and will constitute interesting targets for further functional studies.

  18. The 2018 Nucleic Acids Research database issue and the online molecular biology database collection.

    PubMed

    Rigden, Daniel J; Fernández, Xosé M

    2018-01-04

    The 2018 Nucleic Acids Research Database Issue contains 181 papers spanning molecular biology. Among them, 82 are new and 84 are updates describing resources that appeared in the Issue previously. The remaining 15 cover databases most recently published elsewhere. Databases in the area of nucleic acids include 3DIV for visualisation of data on genome 3D structure and RNArchitecture, a hierarchical classification of RNA families. Protein databases include the established SMART, ELM and MEROPS while GPCRdb and the newcomer STCRDab cover families of biomedical interest. In the area of metabolism, HMDB and Reactome both report new features while PULDB appears in NAR for the first time. This issue also contains reports on genomics resources including Ensembl, the UCSC Genome Browser and ENCODE. Update papers from the IUPHAR/BPS Guide to Pharmacology and DrugBank are highlights of the drug and drug target section while a number of proteomics databases including proteomicsDB are also covered. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, reviewing 138 entries, adding 88 new resources and eliminating 47 discontinued URLs, bringing the current total to 1737 databases. It is available at http://www.oxfordjournals.org/nar/database/c/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. A proteomics study of barley powdery mildew haustoria.

    PubMed

    Godfrey, Dale; Zhang, Ziguo; Saalbach, Gerhard; Thordal-Christensen, Hans

    2009-06-01

    A number of fungal and oomycete plant pathogens of major economic importance feed on their hosts by means of haustoria, which they place inside living plant cells. The underlying mechanisms are poorly understood, partly due to difficulty in preparing haustoria. We have therefore developed a procedure for isolating haustoria from the barley powdery mildew fungus (Blumeria graminis f.sp. hordei, Bgh). We subsequently aimed to understand the molecular mechanisms of haustoria through a study of their proteome. Extracted proteins were digested using trypsin, separated by LC, and analysed by MS/MS. Searches of a custom Bgh EST sequence database and the NCBI-NR fungal protein database, using the MS/MS data, identified 204 haustoria proteins. The majority of the proteins appear to have roles in protein metabolic pathways and biological energy production. Surprisingly, pyruvate decarboxylase (PDC), involved in alcoholic fermentation and commonly abundant in fungi and plants, was absent in our Bgh proteome data set. A sequence encoding this enzyme was also absent in our EST sequence database. Significantly, BLAST searches of the recently available Bgh genome sequence data also failed to identify a sequence encoding this enzyme, strongly indicating that Bgh does not have a gene for PDC.

  20. Dentistry proteomics: from laboratory development to clinical practice.

    PubMed

    Rezende, Taia M B; Lima, Stella M F; Petriz, Bernardo A; Silva, Osmar N; Freire, Mirna S; Franco, Octávio L

    2013-12-01

    Despite all the dental information acquired over centuries and the importance of proteome research, the cross-link between these two areas emerged only around the mid-nineties. Proteomic tools can help dentistry in the identification of risk factors, early diagnosis, prevention, and systematic control that will promote the evolution of treatment in all dentistry specialties. This review mainly focuses on the evolution of dentistry in different specialties based on proteomic research and how these tools can improve knowledge in dentistry. The subjects covered are an overview of proteomics in dentistry, specific information on different fields in dentistry (dental structure, restorative dentistry, endodontics, periodontics, oral pathology, oral surgery, and orthodontics) and future directions. There are many new proteomic technologies that have never been used in dentistry studies and some dentistry areas that have never been explored by proteomic tools. It is expected that a greater integration of these areas will help to understand what is still unknown in oral health and disease. Copyright © 2013 Wiley Periodicals, Inc.

  1. Affordable proteomics: the two-hybrid systems.

    PubMed

    Gillespie, Marc

    2003-06-01

    Numerous proteomic methodologies exist, but most require a heavy investment in expertise and technology. This puts these approaches out of reach for many laboratories and small companies, rarely allowing proteomics to be used as a pilot approach for biomarker or target identification. Two proteomic approaches, 2D gel electrophoresis and the two-hybrid systems, are currently available to most researchers. The two-hybrid systems, though accommodating to large-scale experiments, were originally designed as practical screens, that by comparison to current proteomics tools were small-scale, affordable and technically feasible. The screens rapidly generated data, identifying protein interactions that were previously uncharacterized. The foundation for a two-hybrid proteomic investigation can be purchased as separate kits from a number of companies. The true power of the technique lies not in its affordability, but rather in its portability. The two-hybrid system puts proteomics back into laboratories where the output of the screens can be evaluated by researchers with experience in the particular fields of basic research, cancer biology, toxicology or drug development.

  2. Nano-LC FTICR tandem mass spectrometry for top-down proteomics: routine baseline unit mass resolution of whole cell lysate proteins up to 72 kDa.

    PubMed

    Tipton, Jeremiah D; Tran, John C; Catherman, Adam D; Ahlf, Dorothy R; Durbin, Kenneth R; Lee, Ji Eun; Kellie, John F; Kelleher, Neil L; Hendrickson, Christopher L; Marshall, Alan G

    2012-03-06

    Current high-throughput top-down proteomic platforms provide routine identification of proteins less than 25 kDa with 4-D separations. This short communication reports the application of technological developments over the past few years that improve protein identification and characterization for masses greater than 25 kDa. Advances in separation science have allowed increased numbers of proteins to be identified, especially by nanoliquid chromatography (nLC) prior to mass spectrometry (MS) analysis. Further, a goal of high-throughput top-down proteomics is to extend the mass range for routine nLC MS analysis up to 80 kDa because gene sequence analysis predicts that ~70% of the human proteome is transcribed to be less than 80 kDa. Normally, large proteins greater than 50 kDa are identified and characterized by top-down proteomics through fraction collection and direct infusion at relatively low throughput. Further, other MS-based techniques provide top-down protein characterization, however at low resolution for intact mass measurement. Here, we present analysis of standard (up to 78 kDa) and whole cell lysate proteins by Fourier transform ion cyclotron resonance mass spectrometry (nLC electrospray ionization (ESI) FTICR MS). The separation platform reduced the complexity of the protein matrix so that, at 14.5 T, proteins from whole cell lysate up to 72 kDa are baseline mass resolved on a nano-LC chromatographic time scale. Further, the results document routine identification of proteins at improved throughput based on accurate mass measurement (less than 10 ppm mass error) of precursor and fragment ions for proteins up to 50 kDa.
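
    The "less than 10 ppm mass error" criterion mentioned in this abstract can be illustrated with a minimal sketch. The function names and the example masses below are hypothetical and are not taken from the paper; the sketch only shows the standard parts-per-million formula, (observed - theoretical) / theoretical × 10^6.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tol_ppm: float = 10.0) -> bool:
    """True if the absolute mass error is below tol_ppm (10 ppm assumed here)."""
    return abs(ppm_error(observed_mass, theoretical_mass)) < tol_ppm


if __name__ == "__main__":
    # Hypothetical masses in Da, chosen only for illustration.
    theoretical = 50000.0
    observed = 50000.25
    print(f"{ppm_error(observed, theoretical):.2f} ppm")
    print(within_tolerance(observed, theoretical))
```

    For a 50 kDa protein, a 10 ppm window corresponds to an absolute deviation of 0.5 Da, which is why accurate intact-mass measurement becomes more demanding as protein size grows.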

  3. IgRepertoireConstructor: a novel algorithm for antibody repertoire construction and immunoproteogenomics analysis

    PubMed Central

    Safonova, Yana; Bonissone, Stefano; Kurpilyansky, Eugene; Starostina, Ekaterina; Lapidus, Alla; Stinson, Jeremy; DePalatis, Laura; Sandoval, Wendy; Lill, Jennie; Pevzner, Pavel A.

    2015-01-01

    The analysis of concentrations of circulating antibodies in serum (antibody repertoire) is a fundamental, yet poorly studied, problem in immunoinformatics. The two current approaches to the analysis of antibody repertoires [next generation sequencing (NGS) and mass spectrometry (MS)] present difficult computational challenges since antibodies are not directly encoded in the germline but are extensively diversified by somatic recombination and hypermutations. Therefore, the protein database required for the interpretation of spectra from circulating antibodies is custom for each individual. Although such a database can be constructed via NGS, the reads generated by NGS are error-prone and even a single nucleotide error precludes identification of a peptide by the standard proteomics tools. Here, we present the IgRepertoireConstructor algorithm that performs error-correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires. Availability and implementation: IgRepertoireConstructor is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from http://bioinf.spbau.ru/igtools. Contact: ppevzner@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26072509

  4. Multi-mode acquisition (MMA): An MS/MS acquisition strategy for maximizing selectivity, specificity and sensitivity of DIA product ion spectra.

    PubMed

    Williams, Brad J; Ciavarini, Steve J; Devlin, Curt; Cohn, Steven M; Xie, Rong; Vissers, Johannes P C; Martin, LeRoy B; Caswell, Allen; Langridge, James I; Geromanos, Scott J

    2016-08-01

    In proteomics studies, it is generally accepted that depth of coverage and dynamic range is limited in data-directed acquisitions. The serial nature of the method limits both sensitivity and the number of precursor ions that can be sampled. To that end, a number of data-independent acquisition (DIA) strategies have been introduced with these methods, for the most part, immune to the sampling issue; nevertheless, some do have other limitations with respect to sensitivity. The major limitation with DIA approaches is interference, i.e., MS/MS spectra are highly chimeric and often incapable of being identified using conventional database search engines. Utilizing each available dimension of separation prior to ion detection, we present a new multi-mode acquisition (MMA) strategy multiplexing both narrowband and wideband DIA acquisitions in a single analytical workflow. The iterative nature of the MMA workflow limits the adverse effects of interference with minimal loss in sensitivity. Qualitative identification can be performed by selected ion chromatograms or conventional database search strategies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Identification and proteomic analysis of osteoblast-derived exosomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ge, Min; Ke, Ronghu; Cai, Tianyi

    Exosomes are nanometer-sized vesicles with the function of intercellular communication, and they are released by various cell types. To characterize exosomes from osteoblasts and explore their potential functions in osteogenesis, we isolated microvesicles from supernatants of mouse Mc3t3 by ultracentrifugation, characterized exosomes by electron microscopy and immunoblotting and presented the protein profile by proteomic analysis. The results demonstrated that the microvesicles were between 30 and 100 nm in diameter, round with a cup-like concavity, and expressed the exosomal markers tumor susceptibility gene (TSG) 101 and flotillin (Flot) 1. We identified a total of 1069 proteins, of which 786 overlap with the ExoCarta database. Gene Ontology analysis indicated that the exosomes were mostly derived from the plasma membrane and were mainly involved in protein localization and intracellular signaling. Ingenuity Pathway Analysis showed that the pathways are mostly involved in exosome biogenesis, formation, uptake and osteogenesis. Among the pathways, eukaryotic initiation factor 2 pathways played an important role in osteogenesis. Our study identified osteoblast-derived exosomes, unveiled the content of them, presented potential osteogenesis-related proteins and pathways and provided a rich proteomics data resource that will be valuable for further studies of the functions of individual proteins in bone diseases. - Highlights: • We for the first time identified exosomes from mouse osteoblast. • Osteoblasts-derived exosomes contain osteoblast peculiar proteins. • Proteins from osteoblasts-derived exosomes are intently involved in EIF2 pathway. • EIF2α from the EIF2 pathway plays an important role in osteogenesis.

  6. Proteomic characterization of the hemolymph of Octopus vulgaris infected by the protozoan parasite Aggregata octopiana.

    PubMed

    Castellanos-Martínez, Sheila; Diz, Angel P; Álvarez-Chaver, Paula; Gestal, Camino

    2014-06-13

    The immune system of cephalopods is poorly known to date. The lack of genomic information makes it difficult to understand vital processes like immune defense mechanisms and their interaction with pathogens at molecular level. The common octopus Octopus vulgaris has a high economic relevance and potential for aquaculture. However, disease outbreaks provoke serious reductions in production with potentially severe economic losses. In this study, a proteomic approach is used to analyze the immune response of O. vulgaris against the coccidia Aggregata octopiana, a gastrointestinal parasite which impairs the cephalopod nutritional status. The hemocytes and plasma proteomes were compared by 2-DE between sick and healthy octopus. The identities of 12 differentially expressed spots and other 27 spots without significant alteration from hemocytes, and 5 spots from plasma, were determined by mass spectrometry analysis aided by a six reading-frame translation of an octopus hemocyte RNA-seq database and also public databases. Principal component analysis pointed to 7 proteins from hemocytes as the major contributors to the overall difference between levels of infection and so could be considered as potential biomarkers. Particularly, filamin, fascin and peroxiredoxin are highlighted because of their implication in octopus immune defense activity. From the octopus plasma, hemocyanin was identified. This work represents a first step forward in order to characterize the protein profile of O. vulgaris hemolymph, providing important information for subsequent studies of the octopus immune system at molecular level and also to the understanding of the basis of octopus tolerance-resistance to A. octopiana.
The study presented here focuses on understanding the octopus immune defense against a parasite infection. In particular, it is centered on the host-parasite relationship between the octopus and the protozoan A. octopiana, which induces severe gastrointestinal injuries in octopus that produce a malabsorption syndrome. The common octopus is a commercially important species with a high potential for aquaculture in semi-open systems, and this pathology reduces the condition of the octopus populations on-growing in open-water systems, resulting in important economic losses. This is the first proteomic approach developed on this host-parasite relationship, and therefore the contribution of this work is i) ecological, since this particular relationship is tending to be established as a model of host-parasite interaction in natural populations; ii) evolutionary, due to the characterization of immune molecules that could contribute to understanding the functioning of the immune defense in these highly evolved mollusks; and iii) economic. The results of this study provide an overview of the octopus hemolymph proteome. Furthermore, proteins influenced by the level of infection and implicated in the octopus cellular response are also shown. Consequently, a set of biomarkers for disease resistance is suggested for further research that could be valuable for the improvement of octopus culture, taking into account their high economic value, the decline in landings and the need for the diversification of reared species in order to ensure the growth of the aquaculture activity. Although cephalopods are model species for biomedical studies and possess potential in aquaculture, their genomes have not been sequenced yet, which limits the application of genomic data to research important biological processes. Similarly, the octopus proteome, like other non-model organisms, is poorly represented in public databases. 
Most of the proteins were identified from an octopus hemocyte RNA-seq database that we generated, which will be the subject of another manuscript in preparation. Therefore, the need to increase molecular data from non-model organisms is highlighted here. In particular, we encourage expanding genomic knowledge of cephalopods in order to increase successful protein identifications. This article is part of a Special Issue entitled: Proteomics of non-model organisms. Copyright © 2013 Elsevier B.V. All rights reserved.
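
    The "six reading-frame translation" used in this record to turn RNA-seq sequences into a searchable protein database can be sketched as follows. This is a minimal illustration that assumes the standard genetic code; the function names and the test sequence are hypothetical, and the authors' actual pipeline is not described in the abstract.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated
# with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence chosen only for illustration.
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

    In a database-search setting, each frame would typically be split at stop codons ('*') and the resulting open reading frames written out as protein entries; that step is omitted here for brevity.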

  7. Proteomics in medical microbiology.

    PubMed

    Cash, P

    2000-04-01

    The techniques of proteomics (high resolution two-dimensional electrophoresis and protein characterisation) are widely used for microbiological research to analyse global protein synthesis as an indicator of gene expression. The rapid progress in microbial proteomics has been achieved through the wide availability of whole genome sequences for a number of bacterial groups. Beyond providing a basic understanding of microbial gene expression, proteomics has also played a role in medical areas of microbiology. Progress has been made in the use of the techniques for investigating the epidemiology and taxonomy of human microbial pathogens, the identification of novel pathogenic mechanisms and the analysis of drug resistance. In each of these areas, proteomics has provided new insights that complement genomic-based investigations. This review describes the current progress in these research fields and highlights some of the technical challenges existing for the application of proteomics in medical microbiology. The latter concern the analysis of genetically heterogeneous bacterial populations and the integration of the proteomic and genomic data for these bacteria. The characterisation of the proteomes of bacterial pathogens growing in their natural hosts remains a future challenge.

  8. A new calibrant for MALDI-TOF-TOF-PSD-MS/MS of non-digested proteins for top-down proteomic analysis

    USDA-ARS's Scientific Manuscript database

    RATIONALE: Matrix-assisted laser desorption/ionization (MALDI) time-of-flight-time-of-flight (TOF-TOF) tandem mass spectrometry (MS/MS) has seen increasing use for post-source decay (PSD)-MS/MS analysis of non-digested protein ions for top-down proteomic identification. However, there is no commonl...

  9. Membrane protease degradomics: proteomic identification and quantification of cell surface protease substrates.

    PubMed

    Butler, Georgina S; Dean, Richard A; Smith, Derek; Overall, Christopher M

    2009-01-01

    The modification of cell surface proteins by plasma membrane and soluble proteases is important for physiological and pathological processes. Methods to identify shed and soluble substrates are crucial to further define the substrate repertoire, termed the substrate degradome, of individual proteases. Identifying protease substrates is essential to elucidate protease function and involvement in different homeostatic and disease pathways. This characterisation is also crucial for drug target identification and validation, which would then allow the rational design of specific targeted inhibitors for therapeutic intervention. We describe two methods for identifying and quantifying shed cell surface protease targets in cultured cells utilising Isotope-Coded Affinity Tags (ICAT) and Isobaric Tags for Relative and Absolute Quantification (iTRAQ). As a model system to develop these techniques, we chose a cell-membrane expressed matrix metalloproteinase, MMP-14, but the concepts can be applied to proteases of other classes. By over-expression, or conversely inhibition, of a particular protease with careful selection of control conditions (e.g. vector or inactive protease) and differential labelling, shed proteins can be identified and quantified by mass spectrometry (MS), MS/MS fragmentation and database searching.

  10. Multi-species Identification of Polymorphic Peptide Variants via Propagation in Spectral Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Na, Seungjin; Payne, Samuel H.; Bandeira, Nuno

    The spectral networks approach enables the detection of pairs of spectra from related peptides and thus allows for the propagation of annotations from identified peptides to unidentified spectra. Beyond allowing for unbiased discovery of unexpected post-translational modifications, spectral networks are also applicable to multi-species comparative proteomics or metaproteomics to identify numerous orthologous versions of a protein. We present algorithmic and statistical advances in spectral networks that have made it possible to rigorously assess the statistical significance of spectral pairs and accurately estimate the error rate of identifications via propagation. In the analysis of three related Cyanothece species, a model organism for biohydrogen production, spectral networks identified peptides with highly divergent sequences with up to dozens of variants per peptide, including many novel peptides in species that lack a sequenced genome. Furthermore, spectral networks strongly suggested the presence of novel peptides even in genomically characterized species (i.e. missing from databases) in that a significant portion of unidentified multi-species networks included at least two polymorphic peptide variants.

  11. Generation of comprehensive thoracic oncology database--tool for translational research.

    PubMed

    Surati, Mosmi; Robinson, Matthew; Nandi, Suvobroto; Faoro, Leonardo; Demchuk, Carley; Kanteti, Rajani; Ferguson, Benjamin; Gangadhar, Tara; Hensing, Thomas; Hasina, Rifat; Husain, Aliya; Ferguson, Mark; Karrison, Theodore; Salgia, Ravi

    2011-01-22

    The Thoracic Oncology Program Database Project was created to serve as a comprehensive, verified, and accessible repository for well-annotated cancer specimens and clinical data to be available to researchers within the Thoracic Oncology Research Program. This database also captures a large volume of genomic and proteomic data obtained from various tumor tissue studies. A team of clinical and basic science researchers, a biostatistician, and a bioinformatics expert was convened to design the database. Variables of interest were clearly defined and their descriptions were written within a standard operating manual to ensure consistency of data annotation. Using a protocol for prospective tissue banking and another protocol for retrospective banking, tumor and normal tissue samples from patients consented to these protocols were collected. Clinical information such as demographics, cancer characterization, and treatment plans for these patients were abstracted and entered into an Access database. Proteomic and genomic data have been included in the database and have been linked to clinical information for patients described within the database. The data from each table were linked using the relationships function in Microsoft Access to allow the database manager to connect clinical and laboratory information during a query. The queried data can then be exported for statistical analysis and hypothesis generation.

  12. HMMerThread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition.

    PubMed

    Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine

    2011-03-10

    Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. 
The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de.

  13. Use of Proteomic and Hematology Biomarkers for Prediction of Hematopoietic Acute Radiation Syndrome Severity in Baboon Radiation Models.

    PubMed

    Blakely, William F; Bolduc, David L; Debad, Jeff; Sigal, George; Port, Matthias; Abend, Michael; Valente, Marco; Drouet, Michel; Hérodin, Francis

    2018-07-01

    Use of plasma proteomic and hematological biomarkers represents a promising approach to provide useful diagnostic information for assessment of the severity of hematopoietic acute radiation syndrome. Eighteen baboons were evaluated in a radiation model that underwent total-body and partial-body irradiations at doses of ⁶⁰Co gamma rays from 2.5 to 15 Gy at dose rates of 6.25 cGy min⁻¹ and 32 cGy min⁻¹. Hematopoietic acute radiation syndrome severity levels determined by an analysis of blood count changes measured up to 60 d after irradiation were used to gauge overall hematopoietic acute radiation syndrome severity classifications. A panel of protein biomarkers was measured on plasma samples collected at 0 to 28 d after exposure using electrochemiluminescence-detection technology. The database was split into two distinct groups (i.e., "calibration," n = 11; "validation," n = 7). The calibration database was used in an initial stepwise regression multivariate model-fitting approach followed by down selection of biomarkers for identification of subpanels of hematopoietic acute radiation syndrome-responsive biomarkers for three time windows (i.e., 0-2 d, 2-7 d, 7-28 d). Model 1 (0-2 d) includes log C-reactive protein (p < 0.0001), log interleukin-13 (p < 0.0054), and procalcitonin (p < 0.0316) biomarkers; model 2 (2-7 d) includes log CD27 (p < 0.0001), log FMS-related tyrosine kinase 3 ligand (p < 0.0001), log serum amyloid A (p < 0.0007), and log interleukin-6 (p < 0.0002); and model 3 (7-28 d) includes log CD27 (p < 0.0012), log serum amyloid A (p < 0.0002), log erythropoietin (p < 0.0001), and log CD177 (p < 0.0001). The predicted risk of radiation injury categorization values, representing the hematopoietic acute radiation syndrome severity outcome for the three models, produced least squares multiple regression fit confidences of R = 0.73, 0.82, and 0.75, respectively. 
The resultant algorithms support the proof of concept that plasma proteomic biomarkers can supplement clinical signs and symptoms to assess hematopoietic acute radiation syndrome risk severity.

  14. HMMerThread: Detecting Remote, Functional Conserved Domains in Entire Genomes by Combining Relaxed Sequence-Database Searches with Fold Recognition

    PubMed Central

    Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine

    2011-01-01

    Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. 
The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de. PMID:21423752

  15. Micro-proteomics with iterative data analysis: Proteome analysis in C. elegans at the single worm level.

    PubMed

    Bensaddek, Dalila; Narayan, Vikram; Nicolas, Armel; Murillo, Alejandro Brenes; Gartner, Anton; Kenyon, Cynthia J; Lamond, Angus I

    2016-02-01

    Proteomics studies typically analyze proteins at a population level, using extracts prepared from tens of thousands to millions of cells. The resulting measurements correspond to average values across the cell population and can mask considerable variation in protein expression and function between individual cells or organisms. Here, we report the development of micro-proteomics for the analysis of Caenorhabditis elegans, a eukaryote composed of 959 somatic cells and ∼1500 germ cells, measuring the worm proteome at a single organism level to a depth of ∼3000 proteins. This includes detection of proteins across a wide dynamic range of expression levels (>6 orders of magnitude), including many chromatin-associated factors involved in chromosome structure and gene regulation. We apply the micro-proteomics workflow to measure the global proteome response to heat-shock in individual nematodes. This shows variation between individual animals in the magnitude of proteome response following heat-shock, including variable induction of heat-shock proteins. The micro-proteomics pipeline thus facilitates the investigation of stochastic variation in protein expression between individuals within an isogenic population of C. elegans. All data described in this study are available online via the Encyclopedia of Proteome Dynamics (http://www.peptracker.com/epd), an open access, searchable database resource. © 2015 The Authors. PROTEOMICS Published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. Role for protein–protein interaction databases in human genetics

    PubMed Central

    Pattin, Kristine A; Moore, Jason H

    2010-01-01

    Proteomics and the study of protein–protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein–protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein–protein interactions in human genetics and genetic epidemiology. Since protein–protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies. PMID:19929610

  17. dbHiMo: a web-based epigenomics platform for histone-modifying enzymes

    PubMed Central

    Choi, Jaeyoung; Kim, Ki-Tae; Huh, Aram; Kwon, Seomun; Hong, Changyoung; Asiegbu, Fred O.; Jeon, Junhyun; Lee, Yong-Hwan

    2015-01-01

    Over the past two decades, epigenetics has evolved into a key concept for understanding regulation of gene expression. Among many epigenetic mechanisms, covalent modifications such as acetylation and methylation of lysine residues on core histones emerged as a major mechanism in epigenetic regulation. Here, we present the database for histone-modifying enzymes (dbHiMo; http://hme.riceblast.snu.ac.kr/) aimed at facilitating functional and comparative analysis of histone-modifying enzymes (HMEs). HMEs were identified by applying a search pipeline built upon profile hidden Markov model (HMM) to proteomes. The database incorporates 11 576 HMEs identified from 603 proteomes including 483 fungal, 32 plants and 51 metazoan species. The dbHiMo provides users with web-based personalized data browsing and analysis tools, supporting comparative and evolutionary genomics. With comprehensive data entries and associated web-based tools, our database will be a valuable resource for future epigenetics/epigenomics studies. Database URL: http://hme.riceblast.snu.ac.kr/ PMID:26055100

  18. Molecular and MALDI-TOF identification of ticks and tick-associated bacteria in Mali

    PubMed Central

    Diarra, Adama Zan; Almeras, Lionel; Berenger, Jean-Michel; Koné, Abdoulaye K.; Bocoum, Zakaria; Dabo, Abdoulaye; Doumbo, Ogobara; Raoult, Didier; Parola, Philippe

    2017-01-01

    Ticks are considered the second vector of human and animal diseases after mosquitoes. Therefore, identification of ticks and associated pathogens is an important step in the management of these vectors. In recent years, Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has been reported as a promising method for the identification of arthropods including ticks. The objective of this study was to improve the conditions for the preparation of tick samples for their identification by MALDI-TOF MS from field-collected ethanol-stored Malian samples and to evaluate the capacity of this technology to distinguish infected and uninfected ticks. A total of 1,333 ticks were collected from mammals in three distinct sites from Mali. Morphological identification allowed classification of ticks into 6 species including Amblyomma variegatum, Hyalomma truncatum, Hyalomma marginatum rufipes, Rhipicephalus (Boophilus) microplus, Rhipicephalus evertsi evertsi and Rhipicephalus sanguineus sl. Among those, 471 ticks were randomly selected for molecular and proteomic analyses. Tick legs submitted to MALDI-TOF MS revealed a concordant morpho/molecular identification of 99.6%. The inclusion in our MALDI-TOF MS arthropod database of MS reference spectra from ethanol-preserved tick leg specimens was required to obtain reliable identification. When tested by molecular tools, 76.6%, 37.6%, 20.8% and 1.1% of the specimens tested were positive for Rickettsia spp., Coxiella burnetii, Anaplasmataceae and Borrelia spp., respectively. These results support the fact that MALDI-TOF is a reliable tool for the identification of ticks conserved in alcohol and enhances knowledge about the diversity of tick species and pathogens transmitted by ticks circulating in Mali. PMID:28742123

  19. Molecular and MALDI-TOF identification of ticks and tick-associated bacteria in Mali.

    PubMed

    Diarra, Adama Zan; Almeras, Lionel; Laroche, Maureen; Berenger, Jean-Michel; Koné, Abdoulaye K; Bocoum, Zakaria; Dabo, Abdoulaye; Doumbo, Ogobara; Raoult, Didier; Parola, Philippe

    2017-07-01

    Ticks are considered the second vector of human and animal diseases after mosquitoes. Therefore, identification of ticks and associated pathogens is an important step in the management of these vectors. In recent years, Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has been reported as a promising method for the identification of arthropods including ticks. The objective of this study was to improve the conditions for the preparation of tick samples for their identification by MALDI-TOF MS from field-collected ethanol-stored Malian samples and to evaluate the capacity of this technology to distinguish infected and uninfected ticks. A total of 1,333 ticks were collected from mammals in three distinct sites from Mali. Morphological identification allowed classification of ticks into 6 species including Amblyomma variegatum, Hyalomma truncatum, Hyalomma marginatum rufipes, Rhipicephalus (Boophilus) microplus, Rhipicephalus evertsi evertsi and Rhipicephalus sanguineus sl. Among those, 471 ticks were randomly selected for molecular and proteomic analyses. Tick legs submitted to MALDI-TOF MS revealed a concordant morpho/molecular identification of 99.6%. The inclusion in our MALDI-TOF MS arthropod database of MS reference spectra from ethanol-preserved tick leg specimens was required to obtain reliable identification. When tested by molecular tools, 76.6%, 37.6%, 20.8% and 1.1% of the specimens tested were positive for Rickettsia spp., Coxiella burnetii, Anaplasmataceae and Borrelia spp., respectively. These results support the fact that MALDI-TOF is a reliable tool for the identification of ticks conserved in alcohol and enhances knowledge about the diversity of tick species and pathogens transmitted by ticks circulating in Mali.

  20. High Dynamic Range Characterization of the Trauma Patient Plasma Proteome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Tao; Qian, Weijun; Gritsenko, Marina A.

    2006-06-08

    While human plasma represents an attractive sample for disease biomarker discovery, the extreme complexity and large dynamic range in protein concentrations present significant challenges for characterization, candidate biomarker discovery, and validation. Herein, we describe a strategy that combines immunoaffinity subtraction and chemical fractionation based on cysteinyl peptide and N-glycopeptide captures with 2D-LC-MS/MS to increase the dynamic range of analysis for plasma. Application of this "divide-and-conquer" strategy to trauma patient plasma significantly improved the overall dynamic range of detection and resulted in confident identification of 22,267 unique peptides from four different peptide populations (cysteinyl peptides, non-cysteinyl peptides, N-glycopeptides, and non-glycopeptides) that covered 3654 nonredundant proteins. Numerous low-abundance proteins were identified, exemplified by 78 "classic" cytokines and cytokine receptors and by 136 human cell differentiation molecules. Additionally, a total of 2910 different N-glycopeptides that correspond to 662 N-glycoproteins and 1553 N-glycosylation sites were identified. A panel of the proteins identified in this study is known to be involved in inflammation and immune responses. This study established an extensive reference protein database for trauma patients, which provides a foundation for future high-throughput quantitative plasma proteomic studies designed to elucidate the mechanisms that underlie systemic inflammatory responses.

  1. Detection of Biomarkers of Pathogenic Naegleria fowleri Through Mass Spectrometry and Proteomics

    PubMed Central

    Moura, Hercules; Izquierdo, Fernando; Woolfitt, Adrian R.; Wagner, Glauber; Pinto, Tatiana; del Aguila, Carmen; Barr, John R.

    2017-01-01

    Emerging methods based on mass spectrometry (MS) can be used in the rapid identification of microorganisms. Thus far, these practical and rapidly evolving methods have mainly been applied to characterize prokaryotes. We applied matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) in the analysis of whole cells of 18 N. fowleri isolates belonging to three genotypes. Fourteen originated from the cerebrospinal fluid or brain tissue of primary amoebic meningoencephalitis patients and four originated from water samples of hot springs, rivers, lakes or municipal water supplies. Whole Naegleria trophozoites grown in axenic cultures were washed and mixed with MALDI matrix. Mass spectra were acquired with a 4700 TOF-TOF instrument. MALDI-TOF MS yielded consistent patterns for all isolates examined. Using a combination of novel data processing methods for visual peak comparison, statistical analysis and proteomics database searching, we were able to detect several biomarkers that can differentiate all species and isolates studied, along with common biomarkers for all N. fowleri isolates. Naegleria fowleri could be easily separated from other species within the genus Naegleria. A number of peaks detected were tentatively identified. MALDI-TOF MS fingerprinting is a rapid, reproducible, high-throughput alternative method for identifying Naegleria isolates. This method has potential for studying eukaryotic agents. PMID:25231600

  2. In-Culture Cross-Linking of Bacterial Cells Reveals Large-Scale Dynamic Protein-Protein Interactions at the Peptide Level.

    PubMed

    de Jong, Luitzen; de Koning, Edward A; Roseboom, Winfried; Buncherd, Hansuk; Wanner, Martin J; Dapic, Irena; Jansen, Petra J; van Maarseveen, Jan H; Corthals, Garry L; Lewis, Peter J; Hamoen, Leendert W; de Koster, Chris G

    2017-07-07

    Identification of dynamic protein-protein interactions at the peptide level on a proteomic scale is a challenging approach that is still in its infancy. We have developed a system to cross-link cells directly in culture with the special lysine cross-linker bis(succinimidyl)-3-azidomethyl-glutarate (BAMG). We used the Gram-positive model bacterium Bacillus subtilis as an exemplar system. Within 5 min extensive intracellular cross-linking was detected, while intracellular cross-linking in a Gram-negative species, Escherichia coli, was still undetectable after 30 min, in agreement with the low permeability in this organism for lipophilic compounds like BAMG. We were able to identify 82 unique interprotein cross-linked peptides with <1% false discovery rate by mass spectrometry and genome-wide database searching. Nearly 60% of the interprotein cross-links occur in assemblies involved in transcription and translation. Several of these interactions are new, and we identified a binding site between the δ and β' subunit of RNA polymerase close to the downstream DNA channel, providing a clue into how δ might regulate promoter selectivity and promote RNA polymerase recycling. Our methodology opens new avenues to investigate the functional dynamic organization of complex protein assemblies involved in bacterial growth. Data are available via ProteomeXchange with identifier PXD006287.
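
    The "<1% false discovery rate" threshold quoted above is often enforced with a target-decoy strategy. The record does not say which estimator the authors used, so the following is only a minimal sketch of that common approach. The function name, the (score, is_decoy) input layout and the toy scores are all hypothetical.

```python
def filter_at_fdr(matches, max_fdr=0.01):
    """Keep target matches from the longest score-ranked list whose
    estimated FDR (decoys / targets) stays at or below max_fdr.

    matches: iterable of (score, is_decoy) pairs; higher score = better.
    This is an illustrative target-decoy estimate, not the authors' method.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff_index = 0  # number of top-ranked matches to accept
    for i, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # The estimate is only defined once at least one target was seen.
        if targets and decoys / targets <= max_fdr:
            cutoff_index = i
    return [score for score, is_decoy in ranked[:cutoff_index] if not is_decoy]


# Toy data (hypothetical scores): four strong targets, one decoy, one weak target.
toy_matches = [(10, False), (9, False), (8, False), (7, False), (6, True), (5, False)]
strict = filter_at_fdr(toy_matches, max_fdr=0.10)
loose = filter_at_fdr(toy_matches, max_fdr=0.25)
print(strict, loose)
```

    With the stricter threshold the list is cut just above the first decoy. The looser threshold admits the weaker target as well, because one decoy among five targets still satisfies the limit.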

  3. Rapid detection of proteins in transgenic crops without protein reference standards by targeted proteomic mass spectrometry.

    PubMed

    Schacherer, Lindsey J; Xie, Weiping; Owens, Michaela A; Alarcon, Clara; Hu, Tiger X

    2016-09-01

    Liquid chromatography coupled with tandem mass spectrometry is increasingly used for protein detection in transgenic crop research. Currently this is achieved with protein reference standards, which may take significant time or effort to obtain, and there is a need for rapid protein detection without protein reference standards. A sensitive and specific method was developed to detect target proteins in transgenic maize leaf crude extract at concentrations as low as ∼30 ng mg(-1) dry leaf without the need for reference standards or any sample enrichment. A hybrid Q-TRAP mass spectrometer was used to monitor all potential tryptic peptides of the target proteins in both transgenic and non-transgenic samples. The multiple reaction monitoring-initiated detection and sequencing (MIDAS) approach was used for initial peptide/protein identification via Mascot database search. Further confirmation was achieved by direct comparison between transgenic and non-transgenic samples. Definitive confirmation was provided by running the same experiments on synthetic peptides or protein standards, if available. A targeted proteomic mass spectrometry method using the MIDAS approach is an ideal methodology for detection of new proteins in early stages of transgenic crop research and development when neither protein reference standards nor antibodies are available. © 2016 Society of Chemical Industry.

  4. Towards a Proteomic Catalogue and Differential Annotation of Salivary Gland Proteins in Blood Fed Malaria Vector Anopheles culicifacies by Mass Spectrometry.

    PubMed

    Rawal, Ritu; Vijay, Sonam; Kadian, Kavita; Singh, Jagbir; Pande, Veena; Sharma, Arun

    2016-01-01

    To understand the importance of functional proteins in mosquito behavior following a blood meal, a baseline proteomic dataset is essential for providing insights into the physiology of blood feeding. Therefore, as a first step in this study, an in-solution and 1-D electrophoresis digestion approach combined with tandem mass spectrometry (nano LC-MS/MS) and computational bioinformatics for data mining was used to prepare a baseline proteomic catalogue of salivary gland proteins of sugar fed An. culicifacies mosquitoes. A total of 106 proteins were identified and analyzed by the SEQUEST algorithm against a mosquito protein database from Uniprot/NCBI. Importantly, D7r1, D7r2, D7r4, salivary apyrase, anti-platelet protein, calreticulin and antigen 5 family proteins were identified and grouped on the basis of biological and functional roles. Secondly, differential protein expression and annotations between salivary glands of sugar fed vs blood fed mosquitoes were analyzed using 2-D electrophoresis combined with MALDI-TOF mass spectrometry. Altered expression of a total of 38 proteins was observed, of which 29 proteins, such as beclin-1, phosphorylating proteins, heme oxygenase 1, ferritin, apoptotic proteins, coagulation and immunity-like proteins, serine proteases, serpins, c-type lectin and proteins involved in regulation of blood feeding behavior, were found to be upregulated, while 9 proteins related to blood feeding, juvenile hormone epoxide hydrolase ii, odorant binding proteins and energy metabolic enzymes were found to be downregulated. To our knowledge, this study provides a first baseline proteomic dataset and functional annotations of An. culicifacies salivary gland proteins that may be involved during blood feeding. Identification of differential salivary proteins between sugar fed and blood fed mosquitoes and their plausible role may provide insights into the physiological processes associated with feeding behavior and sporozoite transmission during the process of blood feeding.

  5. Towards a Proteomic Catalogue and Differential Annotation of Salivary Gland Proteins in Blood Fed Malaria Vector Anopheles culicifacies by Mass Spectrometry

    PubMed Central

    Rawal, Ritu; Vijay, Sonam; Kadian, Kavita; Singh, Jagbir; Pande, Veena; Sharma, Arun

    2016-01-01

    To understand the importance of functional proteins in mosquito behavior following a blood meal, a baseline proteomic dataset is essential for providing insights into the physiology of blood feeding. Therefore, as a first step in this study, an in-solution and 1-D electrophoresis digestion approach combined with tandem mass spectrometry (nano LC-MS/MS) and computational bioinformatics for data mining was used to prepare a baseline proteomic catalogue of salivary gland proteins of sugar fed An. culicifacies mosquitoes. A total of 106 proteins were identified and analyzed by the SEQUEST algorithm against a mosquito protein database from Uniprot/NCBI. Importantly, D7r1, D7r2, D7r4, salivary apyrase, anti-platelet protein, calreticulin and antigen 5 family proteins were identified and grouped on the basis of biological and functional roles. Secondly, differential protein expression and annotations between salivary glands of sugar fed vs blood fed mosquitoes were analyzed using 2-D electrophoresis combined with MALDI-TOF mass spectrometry. Altered expression of a total of 38 proteins was observed, of which 29 proteins, such as beclin-1, phosphorylating proteins, heme oxygenase 1, ferritin, apoptotic proteins, coagulation and immunity-like proteins, serine proteases, serpins, c-type lectin and proteins involved in regulation of blood feeding behavior, were found to be upregulated, while 9 proteins related to blood feeding, juvenile hormone epoxide hydrolase ii, odorant binding proteins and energy metabolic enzymes were found to be downregulated. To our knowledge, this study provides a first baseline proteomic dataset and functional annotations of An. culicifacies salivary gland proteins that may be involved during blood feeding. Identification of differential salivary proteins between sugar fed and blood fed mosquitoes and their plausible role may provide insights into the physiological processes associated with feeding behavior and sporozoite transmission during the process of blood feeding. PMID:27602567

  6. Wheat proteomics: proteome modulation and abiotic stress acclimation

    PubMed Central

    Komatsu, Setsuko; Kamal, Abu H. M.; Hossain, Zahed

    2014-01-01

    Cellular mechanisms of stress sensing and signaling represent the initial plant responses to adverse conditions. The development of high-throughput “Omics” techniques has initiated a new era of the study of plant molecular strategies for adapting to environmental changes. However, the elucidation of stress adaptation mechanisms in plants requires the accurate isolation and characterization of stress-responsive proteins. Because the functional part of the genome, namely the proteins and their post-translational modifications, is critical for plant stress responses, proteomic studies provide comprehensive information about the fine-tuning of cellular pathways that are primarily involved in stress mitigation. This review summarizes the major proteomic findings related to alterations in the wheat proteomic profile in response to abiotic stresses. Moreover, the strengths and weaknesses of different sample preparation techniques, including subcellular protein extraction protocols, are discussed in detail. The continued development of proteomic approaches in combination with rapidly evolving bioinformatics tools and interactive databases will facilitate understanding of the plant mechanisms underlying stress tolerance. PMID:25538718

  7. Identification of a putative protein profile associated with tamoxifen therapy resistance in breast cancer.

    PubMed

    Umar, Arzu; Kang, Hyuk; Timmermans, Annemieke M; Look, Maxime P; Meijer-van Gelder, Marion E; den Bakker, Michael A; Jaitly, Navdeep; Martens, John W M; Luider, Theo M; Foekens, John A; Pasa-Tolić, Ljiljana

    2009-06-01

    Tamoxifen resistance is a major cause of death in patients with recurrent breast cancer. Current clinical factors can correctly predict therapy response in only half of the treated patients. Identification of proteins that are associated with tamoxifen resistance is a first step toward better response prediction and tailored treatment of patients. In the present study we intended to identify putative protein biomarkers indicative of tamoxifen therapy resistance in breast cancer using nano-LC coupled with FTICR MS. Comparative proteome analysis was performed on approximately 5,500 pooled tumor cells (corresponding to approximately 550 ng of protein lysate/analysis) obtained through laser capture microdissection (LCM) from two independently processed data sets (n = 24 and n = 27) containing both tamoxifen therapy-sensitive and therapy-resistant tumors. Peptides and proteins were identified by matching mass and elution time of newly acquired LC-MS features to information in previously generated accurate mass and time tag reference databases. A total of 17,263 unique peptides were identified that corresponded to 2,556 non-redundant proteins identified with ≥2 peptides. 1,713 overlapping proteins between the two data sets were used for further analysis. Comparative proteome analysis revealed 100 putatively differentially abundant proteins between tamoxifen-sensitive and tamoxifen-resistant tumors. The presence and relative abundance for 47 differentially abundant proteins were verified by targeted nano-LC-MS/MS in a selection of unpooled, non-microdissected discovery set tumor tissue extracts. ENPP1, EIF3E, and GNB4 were significantly associated with progression-free survival upon tamoxifen treatment for recurrent disease. Differential abundance of our top discriminating protein, extracellular matrix metalloproteinase inducer, was validated by tissue microarray in an independent patient cohort (n = 156). Extracellular matrix metalloproteinase inducer levels were higher in therapy-resistant tumors and significantly associated with an earlier tumor progression following first line tamoxifen treatment (hazard ratio, 1.87; 95% confidence interval, 1.25-2.80; p = 0.002). In summary, comparative proteomics performed on laser capture microdissection-derived breast tumor cells using nano-LC-FTICR MS technology revealed a set of putative biomarkers associated with tamoxifen therapy resistance in recurrent breast cancer.
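
    The accurate mass and time (AMT) tag matching mentioned above can be pictured as a two-tolerance lookup: an LC-MS feature is assigned to a reference tag only when both its mass and its normalized elution time agree. The sketch below is illustrative only. The class and function names, the tolerance values and the toy tags are assumptions and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmtTag:
    label: str    # hypothetical identifier of the reference peptide
    mass: float   # monoisotopic mass in Da
    net: float    # normalized elution time on a 0..1 scale


def match_feature(mass, net, tags, ppm_tol=5.0, net_tol=0.02):
    """Return reference tags whose mass (in ppm) and elution time
    both lie within tolerance of the observed LC-MS feature."""
    hits = []
    for tag in tags:
        ppm_error = abs(mass - tag.mass) / tag.mass * 1e6
        if ppm_error <= ppm_tol and abs(net - tag.net) <= net_tol:
            hits.append(tag)
    return hits


# Toy reference database with made-up values.
toy_tags = [
    AmtTag("PEPTIDE_A", 1000.5000, 0.30),
    AmtTag("PEPTIDE_B", 1000.5100, 0.31),  # close in time, about 9 ppm away in mass
    AmtTag("PEPTIDE_C", 1500.7500, 0.30),
]
hits = match_feature(1000.5010, 0.305, toy_tags)
print([tag.label for tag in hits])
```

    Requiring agreement in both dimensions is what lets a feature be assigned without a new MS/MS spectrum. A near-isobaric tag such as the hypothetical PEPTIDE_B is rejected on mass alone even though it elutes at almost the same time.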

  8. Sialome of a Generalist Lepidopteran Herbivore: Identification of Transcripts and Proteins from Helicoverpa armigera Labial Salivary Glands

    PubMed Central

    Celorio-Mancera, Maria de la Paz; Courtiade, Juliette; Muck, Alexander; Heckel, David G.; Musser, Richard O.; Vogel, Heiko

    2011-01-01

    Although the importance of insect saliva in insect-host plant interactions has been acknowledged, there is very limited information on the nature and complexity of the salivary proteome in lepidopteran herbivores. We inspected the labial salivary transcriptome and proteome of Helicoverpa armigera, an important polyphagous pest species. To identify the majority of the salivary proteins we have randomly sequenced 19,389 expressed sequence tags (ESTs) from a normalized cDNA library of salivary glands. In parallel, a non-cytosolic enriched protein fraction was obtained from labial salivary glands and subjected to two-dimensional gel electrophoresis (2-DE) and de novo peptide sequencing. This procedure allowed comparison of peptides and EST sequences and enabled us to identify 65 protein spots from the secreted labial saliva 2DE proteome. The mass spectrometry analysis revealed ecdysone, glucose oxidase, fructosidase, carboxyl/cholinesterase and an uncharacterized protein previously detected in H. armigera midgut proteome. Consistently, their corresponding transcripts are among the most abundant in our cDNA library. We did find redundancy of sequence identification of saliva-secreted proteins suggesting multiple isoforms. As expected, we found several enzymes responsible for digestion and plant offense. In addition, we identified non-digestive proteins such as an arginine kinase and abundant proteins of unknown function. This identification of secreted salivary gland proteins allows a more comprehensive understanding of insect feeding and poses new challenges for the elucidation of protein function. PMID:22046331

  9. Method optimization for proteomic analysis of soybean leaf: Improvements in identification of new and low-abundance proteins

    PubMed Central

    Mesquita, Rosilene Oliveira; de Almeida Soares, Eduardo; de Barros, Everaldo Gonçalves; Loureiro, Marcelo Ehlers

    2012-01-01

    The most critical step in any proteomic study is protein extraction and sample preparation. Better solubilization increases the separation and resolution of gels, allowing identification of a higher number of proteins and more accurate quantitation of differences in gene expression. Despite the existence of published results for the optimization of proteomic analyses of soybean seeds, no comparable data are available for proteomic studies of soybean leaf tissue. In this work we have tested the effects of modification of a TCA-acetone method on the resolution of 2-DE gels of leaves and roots of soybean. Better focusing was obtained when both mercaptoethanol and dithiothreitol were used in the extraction buffer simultaneously. Increasing the number of washes of TCA-precipitated protein with acetone, using a final wash with 80% ethanol and using sonication to resuspend the pellet increased the number of detected proteins as well as the resolution of the 2-DE gels. Using this approach we have constructed a soybean protein map. The major group of identified proteins corresponded to genes of unknown function. The second and third most abundant groups of proteins were composed of photosynthesis and metabolism related genes. The resulting protocol improved protein solubility and gel resolution allowing the identification of 122 soybean leaf proteins, 72 of which were not detected in other published soybean leaf 2-DE gel datasets, including a transcription factor and several signaling proteins. PMID:22802721

  10. Using PSEA-Quant for Protein Set Enrichment Analysis of Quantitative Mass Spectrometry-Based Proteomics

    PubMed Central

    Lavallée-Adam, Mathieu

    2017-01-01

    PSEA-Quant analyzes quantitative mass spectrometry-based proteomics datasets to identify enrichments of annotations contained in repositories such as the Gene Ontology and Molecular Signature databases. It allows users to identify the annotations that are significantly enriched for reproducibly quantified high abundance proteins. PSEA-Quant is available on the web and as a command-line tool. It is compatible with all label-free and isotopic labeling-based quantitative proteomics methods. This protocol describes how to use PSEA-Quant and interpret its output. The importance of each parameter as well as troubleshooting approaches are also discussed. PMID:27010334

  11. LFQProfiler and RNP(xl): Open-Source Tools for Label-Free Quantification and Protein-RNA Cross-Linking Integrated into Proteome Discoverer.

    PubMed

    Veit, Johannes; Sachsenberg, Timo; Chernev, Aleksandar; Aicheler, Fabian; Urlaub, Henning; Kohlbacher, Oliver

    2016-09-02

    Modern mass spectrometry setups used in today's proteomics studies generate vast amounts of raw data, calling for highly efficient data processing and analysis tools. Software for analyzing these data is either monolithic (easy to use, but sometimes too rigid) or workflow-driven (easy to customize, but sometimes complex). Thermo Proteome Discoverer (PD) is a powerful software for workflow-driven data analysis in proteomics which, in our eyes, achieves a good trade-off between flexibility and usability. Here, we present two open-source plugins for PD providing additional functionality: LFQProfiler for label-free quantification of peptides and proteins, and RNP(xl) for UV-induced peptide-RNA cross-linking data analysis. LFQProfiler interacts with existing PD nodes for peptide identification and validation and takes care of the entire quantitative part of the workflow. We show that it performs at least on par with other state-of-the-art software solutions for label-free quantification in a recently published benchmark (Ramus, C.; J. Proteomics 2016, 132, 51-62). The second workflow, RNP(xl), represents the first software solution to date for identification of peptide-RNA cross-links including automatic localization of the cross-links at amino acid resolution and localization scoring. It comes with a customized integrated cross-link fragment spectrum viewer for convenient manual inspection and validation of the results.

  12. Proteome Speciation by Mass Spectrometry: Characterization of Composite Protein Mixtures in Milk Replacers.

    PubMed

    Gaspari, Marco; Chiesa, Luca; Nicastri, Annalisa; Gabriele, Caterina; Harper, Valeria; Britti, Domenico; Cuda, Giovanni; Procopio, Antonio

    2016-12-06

    The ability of tandem mass spectrometry to determine the primary structure of proteolytic peptides can be exploited to trace back the organisms from which the corresponding proteins were extracted. This information can be important when food products, such as protein powders, can be supplemented with lower-quality starting materials. In order to dissect the origin of proteinaceous material composing a given unknown mixture, a two-step database search strategy for bottom-up nanoscale liquid chromatography-tandem mass spectrometry (nanoLC-MS/MS) data was implemented. A single nanoLC-MS/MS analysis was sufficient not only to determine the qualitative composition of the mixtures under examination, but also to assess the relative percent composition of the various proteomes, if dedicated calibration curves were previously generated. The approach of two-step database search for qualitative analysis and proteome total ion current (pTIC) calculation for quantitative analysis was applied to several binary and ternary mixtures which mimic the composition of milk replacers typically used in calf feeding.
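
    One way to read the proteome total ion current (pTIC) idea is as a per-proteome sum of the ion currents of the peptides assigned to each source organism, expressed as a share of the overall signal. The sketch below follows that reading. The exact pTIC definition and the calibration curves used in the study are not given in this record, so the function name, the input layout and the numbers are hypothetical.

```python
from collections import defaultdict


def ptic_percentages(assignments):
    """Sum ion current per source proteome and return each proteome's
    share of the total as a percentage (uncalibrated).

    assignments: iterable of (proteome_label, ion_current) pairs, one per
    identified peptide. Labels and intensities here are illustrative only.
    """
    totals = defaultdict(float)
    for proteome, ion_current in assignments:
        totals[proteome] += ion_current
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {proteome: 100.0 * total / grand_total
            for proteome, total in totals.items()}


# Toy ternary mixture with made-up peptide ion currents.
toy_assignments = [
    ("species_A", 6.0e6),
    ("species_B", 1.5e6),
    ("species_A", 2.0e6),
    ("species_C", 0.5e6),
]
shares = ptic_percentages(toy_assignments)
print(shares)
```

    These raw shares are only relative signal fractions. As the abstract notes, converting them to percent composition by mass requires calibration curves generated beforehand.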

  13. Proteome analysis of bell pepper (Capsicum annuum L.) chromoplasts.

    PubMed

    Siddique, Muhammad Asim; Grossmann, Jonas; Gruissem, Wilhelm; Baginsky, Sacha

    2006-12-01

    We report a comprehensive proteome analysis of chromoplasts from bell pepper (Capsicum annuum L.). The combination of a novel strategy for database-independent detection of proteins from tandem mass spectrometry (MS/MS) data with standard database searches allowed us to identify 151 proteins with a high level of confidence. These include several well-known plastid proteins but also novel proteins that were not previously reported from other plastid proteome studies. The majority of the identified proteins are active in plastid carbohydrate and amino acid metabolism. Among the most abundant individual proteins are capsanthin/capsorubin synthase and fibrillin, which are involved in the synthesis and storage of carotenoids that accumulate to high levels in chromoplasts. The relative abundances of the identified chromoplast proteins differ remarkably compared with their abundances in other plastid types, suggesting a chromoplast-specific metabolic network. Our results provide an overview of the major metabolic pathways active in chromoplasts and extend existing knowledge about prevalent metabolic activities of different plastid types.

  14. Data Portal | Office of Cancer Clinical Proteomics Research

    Cancer.gov

    The CPTAC Data Portal is a centralized repository for the public dissemination of proteomic sequence datasets collected by CPTAC, along with corresponding genomic sequence datasets. Also available are analyses of CPTAC's raw mass spectrometry-based data files (mapping of spectra to peptide sequences and protein identification) by individual investigators from CPTAC and by a Common Data Analysis Pipeline.

  15. NOVEL CONTINUOUS PH/SALT GRADIENT AND PEPTIDE SCORE FOR STRONG CATION EXCHANGE CHROMATOGRAPHY IN 2D-NANO-LC/MSMS PEPTIDE IDENTIFICATION FOR PROTEOMICS

    EPA Science Inventory

    Tryptic digests of human serum albumin (HSA) and human lung epithelial cell lysates were used as test samples in a novel proteomics study. Peptides were separated and analyzed using 2D-nano-LC/MSMS with strong cation exchange (SCX) and reverse phase (RP) chromatography and contin...

  16. Database Search Engines: Paradigms, Challenges and Solutions.

    PubMed

    Verheggen, Kenneth; Martens, Lennart; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    The first step in identifying proteins from mass spectrometry based shotgun proteomics data is to infer peptides from tandem mass spectra, a task generally achieved using database search engines. In this chapter, the basic principles of database search engines are introduced with a focus on open source software, and the use of database search engines is demonstrated using the freely available SearchGUI interface. This chapter also discusses how to tackle general issues related to sequence database searching and shows how to minimize their impact.
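
    The candidate-selection step of a sequence database search can be sketched as an in-silico tryptic digest followed by a precursor mass lookup. This is a simplified illustration and not the behavior of any particular search engine or of SearchGUI. Real engines also handle missed cleavages and modifications and score fragment ions. The function names and the toy protein are hypothetical, while the residue masses are standard monoisotopic values.

```python
# Monoisotopic amino acid residue masses in Da; one water is added per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidate_peptides(precursor_mass, database, ppm_tol=10.0):
    """List (protein, peptide) pairs whose theoretical mass matches the
    observed neutral precursor mass within ppm_tol."""
    hits = []
    for protein, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            theoretical = peptide_mass(peptide)
            if abs(precursor_mass - theoretical) / theoretical * 1e6 <= ppm_tol:
                hits.append((protein, peptide))
    return hits


toy_database = {"toy_protein": "AAKGGRPSSKEE"}  # made-up sequence
digest = tryptic_digest(toy_database["toy_protein"])
candidates = candidate_peptides(288.1797, toy_database)
print(digest, candidates)
```

    In a full search each candidate would then be scored by comparing its predicted fragment ions with the observed tandem mass spectrum. The sketch stops before that step.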

  17. SenseLab

    PubMed Central

    Crasto, Chiquito J.; Marenco, Luis N.; Liu, Nian; Morse, Thomas M.; Cheung, Kei-Hoi; Lai, Peter C.; Bahl, Gautam; Masiar, Peter; Lam, Hugo Y.K.; Lim, Ernest; Chen, Huajin; Nadkarni, Prakash; Migliore, Michele; Miller, Perry L.; Shepherd, Gordon M.

    2009-01-01

    This article presents the latest developments in neuroscience information dissemination through the SenseLab suite of databases: NeuronDB, CellPropDB, ORDB, OdorDB, OdorMapDB, ModelDB and BrainPharm. These databases include information related to: (i) neuronal membrane properties and neuronal models, and (ii) genetics, genomics, proteomics and imaging studies of the olfactory system. We describe here: the new features for each database, the evolution of SenseLab’s unifying database architecture and instances of SenseLab database interoperation with other neuroscience online resources. PMID:17510162

  18. Toward the Standardization of Mitochondrial Proteomics: The Italian Mitochondrial Human Proteome Project Initiative.

    PubMed

    Alberio, Tiziana; Pieroni, Luisa; Ronci, Maurizio; Banfi, Cristina; Bongarzone, Italia; Bottoni, Patrizia; Brioschi, Maura; Caterino, Marianna; Chinello, Clizia; Cormio, Antonella; Cozzolino, Flora; Cunsolo, Vincenzo; Fontana, Simona; Garavaglia, Barbara; Giusti, Laura; Greco, Viviana; Lucacchini, Antonio; Maffioli, Elisa; Magni, Fulvio; Monteleone, Francesca; Monti, Maria; Monti, Valentina; Musicco, Clara; Petrosillo, Giuseppe; Porcelli, Vito; Saletti, Rosaria; Scatena, Roberto; Soggiu, Alessio; Tedeschi, Gabriella; Zilocchi, Mara; Roncada, Paola; Urbani, Andrea; Fasano, Mauro

    2017-12-01

    The Mitochondrial Human Proteome Project aims at understanding the function of the mitochondrial proteome and its crosstalk with the proteome of other organelles. Being able to choose a suitable and validated enrichment protocol of functional mitochondria, based on the specific needs of the downstream proteomics analysis, would greatly help the researchers in the field. Mitochondrial fractions from ten model cell lines were prepared using three enrichment protocols and analyzed on seven different LC-MS/MS platforms. All data were processed using neXtProt as reference database. The data are available for the Human Proteome Project purposes through the ProteomeXchange Consortium with the identifier PXD007053. The processed data sets were analyzed using a suite of R routines to perform a statistical analysis and to retrieve subcellular and submitochondrial localizations. Although the overall number of identified total and mitochondrial proteins was not significantly dependent on the enrichment protocol, specific line to line differences were observed. Moreover, the protein lists were mapped to a network representing the functional mitochondrial proteome, encompassing mitochondrial proteins and their first interactors. More than 80% of the identified proteins resulted in nodes of this network but with a different ability in coisolating mitochondria-associated structures for each enrichment protocol/cell line pair.

  19. Curated protein information in the Saccharomyces genome database.

    PubMed

    Hellerstedt, Sage T; Nash, Robert S; Weng, Shuai; Paskov, Kelley M; Wong, Edith D; Karra, Kalpana; Engel, Stacia R; Cherry, J Michael

    2017-01-01

    Due to recent advancements in the production of experimental proteomic data, the Saccharomyces genome database (SGD; www.yeastgenome.org) has been expanding our protein curation activities to make new data types available to our users. Because of broad interest in post-translational modifications (PTM) and their importance to protein function and regulation, we have recently started incorporating expertly curated PTM information on individual protein pages. Here we also present the inclusion of new abundance and protein half-life data obtained from high-throughput proteome studies. These new data types have been included with the aim to facilitate cellular biology research. Database URL: www.yeastgenome.org. © The Author(s) 2017. Published by Oxford University Press.

  20. A Comparative Quantitative Proteomic Study Identifies New Proteins Relevant for Sulfur Oxidation in the Purple Sulfur Bacterium Allochromatium vinosum

    PubMed Central

    Weissgerber, Thomas; Sylvester, Marc; Kröninger, Lena

    2014-01-01

    In the present study, we compared the proteome response of Allochromatium vinosum when growing photoautotrophically in the presence of sulfide, thiosulfate, and elemental sulfur with the proteome response when the organism was growing photoheterotrophically on malate. Applying tandem mass tag analysis as well as two-dimensional (2D) PAGE, we detected 1,955 of the 3,302 predicted proteins by identification of at least two peptides (59.2%) and quantified 1,848 of the identified proteins. Altered relative protein amounts (≥1.5-fold) were observed for 385 proteins, corresponding to 20.8% of the quantified A. vinosum proteome. A significant number of the proteins exhibiting strongly enhanced relative protein levels in the presence of reduced sulfur compounds are well documented essential players during oxidative sulfur metabolism, e.g., the dissimilatory sulfite reductase DsrAB. Changes in protein levels generally matched those observed for the respective relative mRNA levels in a previous study and allowed identification of new genes/proteins participating in oxidative sulfur metabolism. One gene cluster (hyd; Alvin_2036-Alvin_2040) and one hypothetical protein (Alvin_2107) exhibiting strong responses on both the transcriptome and proteome levels were chosen for gene inactivation and phenotypic analyses of the respective mutant strains, which verified the importance of the so-called Isp hydrogenase supercomplex for efficient oxidation of sulfide and a crucial role of Alvin_2107 for the oxidation of sulfur stored in sulfur globules to sulfite. In addition, we analyzed the sulfur globule proteome and identified a new sulfur globule protein (SgpD; Alvin_2515). PMID:24487535
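
    The percentages quoted in this abstract can be checked directly from the stated counts. The short sketch below does only that arithmetic; the helper name `percent` is a hypothetical choice, and the numbers are taken from the abstract itself.

```python
# Arithmetic check of the coverage figures reported in the abstract above.

def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


predicted_proteins = 3302   # predicted A. vinosum proteins
identified_proteins = 1955  # detected with at least two peptides
quantified_proteins = 1848  # identified proteins that were quantified
altered_proteins = 385      # relative amount changed >= 1.5-fold

if __name__ == "__main__":
    print(percent(identified_proteins, predicted_proteins))  # 59.2
    print(percent(altered_proteins, quantified_proteins))    # 20.8
```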

  1. Subnanogram proteomics: impact of LC column selection, MS instrumentation and data analysis strategy on proteome coverage for trace samples

    DOE PAGES

    Zhu, Ying; Zhao, Rui; Piehowski, Paul D.; ...

    2017-09-01

    One of the greatest challenges for mass spectrometry (MS)-based proteomics is the limited ability to analyze small samples. In this study, we investigate the relative contributions of liquid chromatography (LC), MS instrumentation and data analysis methods with the aim of improving proteome coverage for sample sizes ranging from 0.5 ng to 50 ng. We show that the LC separations utilizing 30-μm-i.d. columns increase signal intensity by >3-fold relative to those using 75-μm-i.d. columns, leading to a 32% increase in peptide identifications. The Orbitrap Fusion Lumos MS significantly boosted both sensitivity and sequencing speed relative to earlier generation Orbitraps (e.g., LTQ-Orbitrap), leading to a ~3-fold increase in peptide identifications and 1.7-fold increase in identified protein groups for 2 ng tryptic digests of the bacterium S. oneidensis. The Match Between Runs algorithm of open-source MaxQuant software further increased proteome coverage by ~95% for 0.5 ng samples and by ~42% for 2 ng samples. Using the best combination of the above variables, we were able to identify >3,000 proteins from 10 ng tryptic digests from both HeLa and THP-1 mammalian cell lines. We also identified >950 proteins from subnanogram archaeal/bacterial cocultures. Finally, the present ultrasensitive LC-MS platform achieves a level of proteome coverage not previously realized for ultra-small sample loadings, and is expected to facilitate the analysis of subnanogram samples, including single mammalian cells.

  2. Nuclear proteome analysis of undifferentiated mouse embryonic stem and germ cells.

    PubMed

    Buhr, Nicolas; Carapito, Christine; Schaeffer, Christine; Kieffer, Emmanuelle; Van Dorsselaer, Alain; Viville, Stéphane

    2008-06-01

    Embryonic stem cells (ESCs) and embryonic germ cells (EGCs) provide exciting models for understanding the underlying mechanisms that make a cell pluripotent. Indeed, such understanding would enable dedifferentiation and reprogrammation of any cell type from a patient needing a cell therapy treatment. Proteome analysis has emerged as an important technology for deciphering these biological processes and thereby ESC and EGC proteomes are increasingly studied. Nevertheless, their nuclear proteomes have only been poorly investigated up to now. In order to investigate signaling pathways potentially involved in pluripotency, proteomic analyses have been performed on mouse ESC and EGC nuclear proteins. Nuclei from ESCs and EGCs at undifferentiated stage were purified by subcellular fractionation. After 2-D separation, a subtractive strategy (subtracting culture environment contaminating spots) was applied and a comparison of ESC, (8.5 day post coïtum (dpc))-EGC and (11.5 dpc)-EGC specific nuclear proteomes was performed. A total of 33 ESC, 53 (8.5 dpc)-EGC, and 36 (11.5 dpc)-EGC spots were identified by MALDI-TOF-MS and/or nano-LC-MS/MS. This approach led to the identification of two isoforms (with and without N-terminal acetylation) of a known pluripotency marker, namely developmental pluripotency associated 5 (DPPA5), which has never been identified before in 2-D gel-MS studies of ESCs and EGCs. Furthermore, we demonstrated the efficiency of our subtracting strategy, in association with a nuclear subfractionation by the identification of a new protein (protein arginine N-methyltransferase 7; PRMT7) behaving as proteins involved in pluripotency.

  3. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize

    PubMed Central

    2010-01-01

    Background: Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. Results: In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. Conclusions: CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publicly available at http://agbase.msstate.edu. PMID:20946609

  4. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize.

    PubMed

    Kelley, Rowena Y; Gresham, Cathy; Harper, Jonathan; Bridges, Susan M; Warburton, Marilyn L; Hawkins, Leigh K; Pechanova, Olga; Peethambaran, Bela; Pechan, Tibor; Luthe, Dawn S; Mylroie, J E; Ankala, Arunkanth; Ozkan, Seval; Henry, W B; Williams, W P

    2010-10-07

    Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publicly available at http://agbase.msstate.edu.

  5. From a 2DE-gel spot to protein function: lesson learned from HS1 in chronic lymphocytic leukemia.

    PubMed

    Apollonio, Benedetta; Bertilaccio, Maria Teresa Sabrina; Restuccia, Umberto; Ranghetti, Pamela; Barbaglio, Federica; Ghia, Paolo; Caligaris-Cappio, Federico; Scielzo, Cristina

    2014-10-19

    The identification of molecules involved in tumor initiation and progression is fundamental for understanding disease's biology and, as a consequence, for the clinical management of patients. In the present work we will describe an optimized proteomic approach for the identification of molecules involved in the progression of Chronic Lymphocytic Leukemia (CLL). In detail, leukemic cell lysates are resolved by 2-dimensional Electrophoresis (2DE) and visualized as "spots" on the 2DE gels. Comparative analysis of proteomic maps allows the identification of differentially expressed proteins (in terms of abundance and post-translational modifications) that are picked, isolated and identified by Mass Spectrometry (MS). The biological function of the identified candidates can be tested by different assays (i.e. migration, adhesion and F-actin polymerization), that we have optimized for primary leukemic cells.

  6. Combining results of multiple search engines in proteomics.

    PubMed

    Shteynberg, David; Nesvizhskii, Alexey I; Moritz, Robert L; Deutsch, Eric W

    2013-09-01

    A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each fragment ion spectrum in the dataset. There are many different search engines, both commercial and open source, each employing a somewhat different technique for spectrum identification. The set of high-scoring peptide-spectrum matches for a defined set of input spectra differs markedly among the various search engine results; individual engines each provide unique correct identifications among a core set of correlative identifications. This has led to the approach of combining the results from multiple search engines to achieve improved analysis of each dataset. Here we review the techniques and available software for combining the results of multiple search engines and briefly compare the relative performance of these techniques.
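
    To make the idea of combining search engine results concrete, here is a minimal sketch that keeps a peptide-spectrum match only when enough engines agree on it. Simple vote counting is an assumption chosen for illustration and is not presented as any of the techniques or software covered by the review; the engine names, spectrum identifiers and peptides are hypothetical.

```python
# Minimal sketch: consensus of peptide-spectrum matches (PSMs) by voting.
# This is a simplified stand-in, not a method taken from the review above.
from collections import Counter


def combine_psms(engine_results, min_votes=2):
    """Keep, per spectrum, the most-voted peptide if it has >= min_votes."""
    votes = {}
    for psms in engine_results.values():
        for spectrum_id, peptide in psms.items():
            votes.setdefault(spectrum_id, Counter())[peptide] += 1

    consensus = {}
    for spectrum_id, counter in votes.items():
        peptide, count = counter.most_common(1)[0]
        if count >= min_votes:
            consensus[spectrum_id] = peptide
    return consensus


# Hypothetical top-ranked PSMs reported by three engines.
RESULTS = {
    "engine_a": {"s1": "PEPTIDEK", "s2": "SAMPLER"},
    "engine_b": {"s1": "PEPTIDEK", "s2": "SAMPLEK", "s3": "ELVISK"},
    "engine_c": {"s2": "SAMPLER", "s3": "ELVISK"},
}

if __name__ == "__main__":
    print(combine_psms(RESULTS))
```

    Raising `min_votes` trades sensitivity for agreement; a vote-based filter like this ignores the per-engine scores, which is one reason more careful statistical combination is of interest.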

  7. Combining Results of Multiple Search Engines in Proteomics*

    PubMed Central

    Shteynberg, David; Nesvizhskii, Alexey I.; Moritz, Robert L.; Deutsch, Eric W.

    2013-01-01

    A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each fragment ion spectrum in the dataset. There are many different search engines, both commercial and open source, each employing a somewhat different technique for spectrum identification. The set of high-scoring peptide-spectrum matches for a defined set of input spectra differs markedly among the various search engine results; individual engines each provide unique correct identifications among a core set of correlative identifications. This has led to the approach of combining the results from multiple search engines to achieve improved analysis of each dataset. Here we review the techniques and available software for combining the results of multiple search engines and briefly compare the relative performance of these techniques. PMID:23720762

  8. Parasites, proteomes and systems: has Descartes' clock run out of time?

    PubMed

    Wastling, J M; Armstrong, S D; Krishna, R; Xia, D

    2012-08-01

    Systems biology aims to integrate multiple biological data types such as genomics, transcriptomics and proteomics across different levels of structure and scale; it represents an emerging paradigm in the scientific process which challenges the reductionism that has dominated biomedical research for hundreds of years. Systems biology will nevertheless only be successful if the technologies on which it is based are able to deliver the required type and quality of data. In this review we discuss how well positioned is proteomics to deliver the data necessary to support meaningful systems modelling in parasite biology. We summarise the current state of identification proteomics in parasites, but argue that a new generation of quantitative proteomics data is now needed to underpin effective systems modelling. We discuss the challenges faced to acquire more complete knowledge of protein post-translational modifications, protein turnover and protein-protein interactions in parasites. Finally we highlight the central role of proteome-informatics in ensuring that proteomics data is readily accessible to the user-community and can be translated and integrated with other relevant data types.

  9. Parasites, proteomes and systems: has Descartes’ clock run out of time?

    PubMed Central

    WASTLING, J. M.; ARMSTRONG, S. D.; KRISHNA, R.; XIA, D.

    2012-01-01

    SUMMARY Systems biology aims to integrate multiple biological data types such as genomics, transcriptomics and proteomics across different levels of structure and scale; it represents an emerging paradigm in the scientific process which challenges the reductionism that has dominated biomedical research for hundreds of years. Systems biology will nevertheless only be successful if the technologies on which it is based are able to deliver the required type and quality of data. In this review we discuss how well positioned is proteomics to deliver the data necessary to support meaningful systems modelling in parasite biology. We summarise the current state of identification proteomics in parasites, but argue that a new generation of quantitative proteomics data is now needed to underpin effective systems modelling. We discuss the challenges faced to acquire more complete knowledge of protein post-translational modifications, protein turnover and protein-protein interactions in parasites. Finally we highlight the central role of proteome-informatics in ensuring that proteomics data is readily accessible to the user-community and can be translated and integrated with other relevant data types. PMID:22828391

  10. HUPO BPP pilot study: a proteomics analysis of the mouse brain of different developmental stages.

    PubMed

    Wang, Jing; Gu, Yong; Wang, Lihong; Hang, Xingyi; Gao, Yan; Wang, Hangyan; Zhang, Chenggang

    2007-11-01

    This study is a part of the HUPO Brain Proteome Project (BPP) pilot study, which aims at obtaining a reliable database of mouse brain proteome, at the comparison of techniques, laboratories, and approaches as well as at preparing subsequent proteome studies of neurologic diseases. The C57/Bl6 mouse brains of three developmental stages at embryonic day 16 (E16), postnatal day 7 (P7), and 8 wk (P56) (n = 5 in each group) were provided by the HUPO BPP executive committee. The whole brain proteins of each animal were individually prepared using 2-DE coupled with PDQuest software analysis. The protein spots representing developmentally related or stably expressed proteins were then prepared with in-gel digestion followed with MALDI-TOF/TOF MS/MS and analyzed using the MASCOT search engine to search the Swiss-Prot or NCBInr database. The 2-DE gel maps of the mouse brains of all of the developmental stages were obtained and submitted to the Data Collection Centre (DCC). The proteins alpha-enolase, stathmin, actin, C14orf166 homolog, 28 kDa heat- and acid-stable phosphoprotein, 3-mercaptopyruvate sulfurtransferase and 40 S ribosomal protein S3a were successfully identified. A further Western blotting analysis demonstrated that enolase is a protein up-regulated in the mouse brain from embryonic stage to adult stage. These data are helpful for understanding the proteome changes in the development of the mouse brain.

  11. Proteome-wide Subcellular Topologies of E. coli Polypeptides Database (STEPdb)*

    PubMed Central

    Orfanoudaki, Georgia; Economou, Anastassios

    2014-01-01

    Cell compartmentalization serves both the isolation and the specialization of cell functions. After synthesis in the cytoplasm, over a third of all proteins are targeted to other subcellular compartments. Knowing how proteins are distributed within the cell and how they interact is a prerequisite for understanding it as a whole. Surface and secreted proteins are important pathogenicity determinants. Here we present the STEP database (STEPdb) that contains a comprehensive characterization of subcellular localization and topology of the complete proteome of Escherichia coli. Two widely used E. coli proteomes (K-12 and BL21) are presented organized into thirteen subcellular classes. STEPdb exploits the wealth of genetic, proteomic, biochemical, and functional information on protein localization, secretion, and targeting in E. coli, one of the best understood model organisms. Subcellular annotations were derived from a combination of bioinformatics prediction, proteomic, biochemical, functional, topological data and extensive literature re-examination that were refined through manual curation. Strong experimental support for the location of 1553 out of 4303 proteins was based on 426 articles and some experimental indications for another 526. Annotations were provided for another 320 proteins based on firm bioinformatic predictions. STEPdb is the first database that contains an extensive set of peripheral IM proteins (PIM proteins) and includes their graphical visualization into complexes, cellular functions, and interactions. It also summarizes all currently known protein export machineries of E. coli K-12 and pairs them, where available, with the secretory proteins that use them. It catalogs the Sec- and TAT-utilizing secretomes and summarizes their topological features such as signal peptides and transmembrane regions, transmembrane topologies and orientations. It also catalogs physicochemical and structural features that influence topology such as abundance, solubility, disorder, heat resistance, and structural domain families. Finally, STEPdb incorporates prediction tools for topology (TMHMM, SignalP, and Phobius) and disorder (IUPred) and implements the BLAST2STEP that performs protein homology searches against the STEPdb. PMID:25210196

  12. The Gel Electrophoresis Markup Language (GelML) from the Proteomics Standards Initiative

    PubMed Central

    Gibson, Frank; Hoogland, Christine; Martinez-Bartolomé, Salvador; Medina-Aunon, J. Alberto; Albar, Juan Pablo; Babnigg, Gyorgy; Wipat, Anil; Hermjakob, Henning; Almeida, Jonas S; Stanislaus, Romesh; Paton, Norman W; Jones, Andrew R

    2011-01-01

    The Human Proteome Organisation’s Proteomics Standards Initiative (HUPO-PSI) has developed the GelML data exchange format for representing gel electrophoresis experiments performed in proteomics investigations. The format closely follows the reporting guidelines for gel electrophoresis, which are part of the Minimum Information About a Proteomics Experiment (MIAPE) set of modules. GelML supports the capture of metadata (such as experimental protocols) and data (such as gel images) resulting from gel electrophoresis so that laboratories can be compliant with the MIAPE Gel Electrophoresis guidelines, while allowing such data sets to be exchanged or downloaded from public repositories. The format is sufficiently flexible to capture data from a broad range of experimental processes, and complements other PSI formats for mass spectrometry data and the results of protein and peptide identifications to capture entire gel-based proteome workflows. GelML has resulted from the open standardisation process of PSI consisting of both public consultation and anonymous review of the specifications. PMID:20677327

  13. Large-Scale and Deep Quantitative Proteome Profiling Using Isobaric Labeling Coupled with Two-Dimensional LC-MS/MS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gritsenko, Marina A.; Xu, Zhe; Liu, Tao

    Comprehensive, quantitative information on abundances of proteins and their post-translational modifications (PTMs) can potentially provide novel biological insights into diseases pathogenesis and therapeutic intervention. Herein, we introduce a quantitative strategy utilizing isobaric stable isotope-labelling techniques combined with two-dimensional liquid chromatography-tandem mass spectrometry (2D-LC-MS/MS) for large-scale, deep quantitative proteome profiling of biological samples or clinical specimens such as tumor tissues. The workflow includes isobaric labeling of tryptic peptides for multiplexed and accurate quantitative analysis, basic reversed-phase LC fractionation and concatenation for reduced sample complexity, and nano-LC coupled to high resolution and high mass accuracy MS analysis for high confidence identification and quantification of proteins. This proteomic analysis strategy has been successfully applied for in-depth quantitative proteomic analysis of tumor samples, and can also be used for integrated proteome and PTM characterization, as well as comprehensive quantitative proteomic analysis across samples from large clinical cohorts.

  14. A proteomic approach to obesity and type 2 diabetes

    PubMed Central

    López-Villar, Elena; Martos-Moreno, Gabriel Á; Chowen, Julie A; Okada, Shigeru; Kopchick, John J; Argente, Jesús

    2015-01-01

    The incidence of obesity and type 2 diabetes has increased dramatically, resulting in an increased interest in its biomedical relevance. However, the mechanisms that trigger the development of type 2 diabetes in obese patients remain largely unknown. Scientific, clinical and pharmaceutical communities are dedicating vast resources to unravel this issue by applying different omics tools. During the last decade, the advances in proteomic approaches and the Human Proteome Organization have opened and are opening a new door that may be helpful in the identification of patients at risk and to improve current therapies. Here, we briefly review some of the advances in our understanding of type 2 diabetes that have occurred through the application of proteomics. We also review, in detail, the current improvements in proteomic methodologies and new strategies that could be employed to further advance our understanding of this pathology. By applying these new proteomic advances, novel therapeutic and/or diagnostic protein targets will be discovered in the obesity/type 2 diabetes area. PMID:25960181

  15. TRIC: an automated alignment strategy for reproducible protein quantification in targeted proteomics.

    PubMed

    Röst, Hannes L; Liu, Yansheng; D'Agostino, Giuseppe; Zanella, Matteo; Navarro, Pedro; Rosenberger, George; Collins, Ben C; Gillet, Ludovic; Testa, Giuseppe; Malmström, Lars; Aebersold, Ruedi

    2016-09-01

    Next-generation mass spectrometric (MS) techniques such as SWATH-MS have substantially increased the throughput and reproducibility of proteomic analysis, but ensuring consistent quantification of thousands of peptide analytes across multiple liquid chromatography-tandem MS (LC-MS/MS) runs remains a challenging and laborious manual process. To produce highly consistent and quantitatively accurate proteomics data matrices in an automated fashion, we developed TRIC (http://proteomics.ethz.ch/tric/), a software tool that utilizes fragment-ion data to perform cross-run alignment, consistent peak-picking and quantification for high-throughput targeted proteomics. TRIC reduced the identification error compared to a state-of-the-art SWATH-MS analysis without alignment by more than threefold at constant recall while correcting for highly nonlinear chromatographic effects. On a pulsed-SILAC experiment performed on human induced pluripotent stem cells, TRIC was able to automatically align and quantify thousands of light and heavy isotopic peak groups. Thus, TRIC fills a gap in the pipeline for automated analysis of massively parallel targeted proteomics data sets.

  16. Systematic Comparison of Label-Free, Metabolic Labeling, and Isobaric Chemical Labeling for Quantitative Proteomics on LTQ Orbitrap Velos

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Zhou; Adams, Rachel M; Chourey, Karuna

    2012-01-01

    A variety of quantitative proteomics methods have been developed, including label-free, metabolic labeling, and isobaric chemical labeling using iTRAQ or TMT. Here, these methods were compared in terms of the depth of proteome coverage, quantification accuracy, precision, and reproducibility using a high-performance hybrid mass spectrometer, LTQ Orbitrap Velos. Our results show that (1) the spectral counting method provides the deepest proteome coverage for identification, but its quantification performance is worse than labeling-based approaches, especially the quantification reproducibility; (2) metabolic labeling and isobaric chemical labeling are capable of accurate, precise, and reproducible quantification and provide deep proteome coverage for quantification. Isobaric chemical labeling surpasses metabolic labeling in terms of quantification precision and reproducibility; (3) iTRAQ and TMT perform similarly in all aspects compared in the current study using a CID-HCD dual scan configuration. Based on the unique advantages of each method, we provide guidance for selection of the appropriate method for a quantitative proteomics study.

  17. Large-Scale and Deep Quantitative Proteome Profiling Using Isobaric Labeling Coupled with Two-Dimensional LC-MS/MS.

    PubMed

    Gritsenko, Marina A; Xu, Zhe; Liu, Tao; Smith, Richard D

    2016-01-01

    Comprehensive, quantitative information on abundances of proteins and their posttranslational modifications (PTMs) can potentially provide novel biological insights into diseases pathogenesis and therapeutic intervention. Herein, we introduce a quantitative strategy utilizing isobaric stable isotope-labeling techniques combined with two-dimensional liquid chromatography-tandem mass spectrometry (2D-LC-MS/MS) for large-scale, deep quantitative proteome profiling of biological samples or clinical specimens such as tumor tissues. The workflow includes isobaric labeling of tryptic peptides for multiplexed and accurate quantitative analysis, basic reversed-phase LC fractionation and concatenation for reduced sample complexity, and nano-LC coupled to high resolution and high mass accuracy MS analysis for high confidence identification and quantification of proteins. This proteomic analysis strategy has been successfully applied for in-depth quantitative proteomic analysis of tumor samples and can also be used for integrated proteome and PTM characterization, as well as comprehensive quantitative proteomic analysis across samples from large clinical cohorts.

  18. The gel electrophoresis markup language (GelML) from the Proteomics Standards Initiative.

    PubMed

    Gibson, Frank; Hoogland, Christine; Martinez-Bartolomé, Salvador; Medina-Aunon, J Alberto; Albar, Juan Pablo; Babnigg, Gyorgy; Wipat, Anil; Hermjakob, Henning; Almeida, Jonas S; Stanislaus, Romesh; Paton, Norman W; Jones, Andrew R

    2010-09-01

    The Human Proteome Organisation's Proteomics Standards Initiative has developed the GelML (gel electrophoresis markup language) data exchange format for representing gel electrophoresis experiments performed in proteomics investigations. The format closely follows the reporting guidelines for gel electrophoresis, which are part of the Minimum Information About a Proteomics Experiment (MIAPE) set of modules. GelML supports the capture of metadata (such as experimental protocols) and data (such as gel images) resulting from gel electrophoresis so that laboratories can be compliant with the MIAPE Gel Electrophoresis guidelines, while allowing such data sets to be exchanged or downloaded from public repositories. The format is sufficiently flexible to capture data from a broad range of experimental processes, and complements other PSI formats for MS data and the results of protein and peptide identifications to capture entire gel-based proteome workflows. GelML has resulted from the open standardisation process of PSI consisting of both public consultation and anonymous review of the specifications.

  19. The chordate proteome history database.

    PubMed

    Levasseur, Anthony; Paganini, Julien; Dainat, Jacques; Thompson, Julie D; Poch, Olivier; Pontarotti, Pierre; Gouret, Philippe

    2012-01-01

    The chordate proteome history database (http://ioda.univ-provence.fr) comprises some 20,000 evolutionary analyses of proteins from chordate species. Our main objective was to characterize and study the evolutionary histories of the chordate proteome, and in particular to detect genomic events and automatic functional searches. Firstly, phylogenetic analyses based on high quality multiple sequence alignments and a robust phylogenetic pipeline were performed for the whole protein and for each individual domain. Novel approaches were developed to identify orthologs/paralogs, and predict gene duplication/gain/loss events and the occurrence of new protein architectures (domain gains, losses and shuffling). These important genetic events were localized on the phylogenetic trees and on the genomic sequence. Secondly, the phylogenetic trees were enhanced by the creation of phylogroups, whereby groups of orthologous sequences created using OrthoMCL were corrected based on the phylogenetic trees; gene family size and gene gain/loss in a given lineage could be deduced from the phylogroups. For each ortholog group obtained from the phylogenetic or the phylogroup analysis, functional information and expression data can be retrieved. Database searches can be performed easily using biological objects: protein identifier, keyword or domain, but can also be based on events, eg, domain exchange events can be retrieved. To our knowledge, this is the first database that links group clustering, phylogeny and automatic functional searches along with the detection of important events occurring during genome evolution, such as the appearance of a new domain architecture.

  20. Beyond the proteome: Mass Spectrometry Special Interest Group (MS-SIG) at ISMB/ECCB 2013

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ryu, Soyoung; Payne, Samuel H.; Schaab, Christoph

    2014-07-02

    The Mass Spectrometry Special Interest Group (MS-SIG) aims to bring together experts from the global research community to discuss highlights and challenges in the field of mass spectrometry (MS)-based proteomics and computational biology. The rapid technological developments in MS-based proteomics have enabled the generation of a large amount of meaningful information on hundreds to thousands of proteins simultaneously from a biological sample; however, the complexity of the MS data requires sophisticated computational algorithms and software for data analysis and interpretation. This year’s MS-SIG meeting theme was ‘Beyond the Proteome’ with major focuses on improving protein identification/quantification and using proteomics data to solve interesting problems in systems biology and clinical research.
