PAnalyzer: a software tool for protein inference in shotgun proteomics.
Prieto, Gorka; Aloria, Kerman; Osinalde, Nerea; Fullaondo, Asier; Arizmendi, Jesus M; Matthiesen, Rune
2012-11-05
Protein inference from peptide identifications in shotgun proteomics must deal with ambiguities that arise from peptides shared between different proteins, which are common in higher eukaryotes. Recently, data-independent acquisition (DIA) approaches have emerged as an alternative to traditional data-dependent acquisition (DDA) in shotgun proteomics experiments. MSE is the name of one of the DIA approaches used on QTOF instruments. MSE data require specialized software to process the acquired spectra and to perform peptide and protein identification. However, currently available software does not group the identified proteins transparently by taking peptide evidence categories into account. Furthermore, inspecting, comparing and reporting the results requires tedious manual intervention. Here we report a software tool that addresses these limitations for MSE data. In this paper we present PAnalyzer, a software tool focused on the protein inference step of shotgun proteomics. Our approach considers all identified proteins and groups them when necessary, indicating their confidence using different evidence categories. PAnalyzer can read protein identification files in the XML output format of the ProteinLynx Global Server (PLGS) software, provided by Waters Corporation for their MSE data, and also in the mzIdentML format recently standardized by HUPO-PSI. Multiple files can also be read simultaneously, in which case they are treated as technical replicates. Results are saved to CSV, HTML and mzIdentML files (mzIdentML output only when the input is a single mzIdentML file). An MSE analysis of a real sample is presented to compare the results of PAnalyzer and ProteinLynx Global Server. We present a software tool to deal with the ambiguities that arise in the protein inference process. Key contributions are support for analysis of MSE data processed by ProteinLynx Global Server and the integration of technical replicates.
PAnalyzer is an easy-to-use, multiplatform and free software tool. PMID:23126499
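The abstract does not spell out how the evidence categories are assigned. As a rough illustration only, the sketch below groups proteins by their peptide evidence using four hypothetical rules: a protein with at least one peptide matched to no other protein is "conclusive"; a protein whose peptides are all shared with conclusive proteins is "non-conclusive"; proteins with identical peptide sets are "indistinguishable"; anything else is "ambiguous". The function name classify_proteins, the input layout and these simplified rules are assumptions and may differ from PAnalyzer's actual definitions.

```python
from collections import defaultdict


def classify_proteins(protein_peptides):
    """Assign a simplified (hypothetical) evidence category to each protein.

    protein_peptides: dict mapping protein accession -> set of peptide sequences.
    Returns a dict mapping protein accession -> category string.
    """
    # Map each peptide to the set of proteins it was matched to.
    owners = defaultdict(set)
    for prot, peps in protein_peptides.items():
        for pep in peps:
            owners[pep].add(prot)

    categories = {}
    # Rule 1: a protein with at least one unique peptide is "conclusive".
    for prot, peps in protein_peptides.items():
        if any(len(owners[pep]) == 1 for pep in peps):
            categories[prot] = "conclusive"

    # Peptides already accounted for by conclusive proteins.
    explained = set()
    for prot in categories:
        explained |= protein_peptides[prot]

    # Group the remaining proteins by identical peptide sets.
    groups = defaultdict(list)
    for prot, peps in protein_peptides.items():
        if prot not in categories:
            groups[frozenset(peps)].append(prot)

    for peps, prots in groups.items():
        if peps <= explained:
            label = "non-conclusive"      # Rule 2: fully explained by conclusive proteins
        elif len(prots) > 1:
            label = "indistinguishable"   # Rule 3: same evidence as another protein
        else:
            label = "ambiguous"           # Rule 4: shared evidence that cannot be resolved
        for prot in prots:
            categories[prot] = label
    return categories
```

Under this sketch, technical replicates could be combined by taking the union of each protein's peptide set across files before classifying; the abstract does not say how PAnalyzer itself integrates replicates.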
Specific identification of Bacillus anthracis strains
NASA Astrophysics Data System (ADS)
Krishnamurthy, Thaiya; Deshpande, Samir; Hewel, Johannes; Liu, Hongbin; Wick, Charles H.; Yates, John R., III
2007-01-01
Accurate identification of human pathogens is the vital first step in treating civilian victims of terrorism and military personnel affected in biological threat situations. We have applied a powerful multidimensional protein identification technology (MudPIT), along with newly developed software termed Profiler, to identify the sequences of specific proteins observed for a few strains of Bacillus anthracis, a human pathogen. Profiler was created to screen the MudPIT data of B. anthracis strains and to establish which observed proteins are specific to each strain. Profiler was also used to generate a database of marker proteins for B. anthracis and its strains, which can in turn be used to detect the organism and its corresponding strains in samples. Analysis of unknown samples by our methodology, combining MudPIT and Profiler, led to accurate identification of the B. anthracis strains present. Thus, a new approach for identifying B. anthracis strains in unknown samples, based on the molecular masses and sequences of marker proteins, has been established.
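The abstract does not describe Profiler's internals. One simple way to express "proteins specific to a strain" is a set difference: proteins observed in one strain's MudPIT data but in no other strain's. The hypothetical sketch below builds such marker sets and scores an unknown sample by the fraction of each strain's markers it contains; the function names, strain labels and scoring rule are assumptions, not Profiler's method.

```python
def strain_markers(observed):
    """observed: dict of strain name -> set of protein identifiers seen in its MudPIT data.

    Returns a dict of strain name -> proteins seen in that strain and in no other.
    """
    markers = {}
    for strain, proteins in observed.items():
        # Union of every protein seen in the other strains.
        others = set().union(*(p for s, p in observed.items() if s != strain))
        markers[strain] = proteins - others
    return markers


def best_match(sample_proteins, markers):
    """Return (strain, fraction of that strain's markers found in the sample).

    Strains with no markers are skipped; raises ValueError if none have markers.
    """
    scores = {
        strain: len(m & sample_proteins) / len(m)
        for strain, m in markers.items()
        if m
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Proteins shared by all strains (such as "p1" in a toy example) drop out of every marker set but would be natural species-level markers for detecting the organism itself.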
Computer applications making rapid advances in high throughput microbial proteomics (HTMP).
Anandkumar, Balakrishna; Haga, Steve W; Wu, Hui-Fen
2014-02-01
The last few decades have seen the rise of widely available proteomics tools. From new data acquisition devices, such as MALDI-MS and 2DE, to new database search software, these products have paved the way for high throughput microbial proteomics (HTMP). These tools are enabling researchers to gain new insights into microbial metabolism and are opening up new areas of study, such as the discovery of protein-protein interactions (interactomics). Computer software is a key part of these emerging fields. This review considers: 1) software tools for identifying the proteome, such as MASCOT or PDQuest, 2) online databases of proteomes, such as SWISS-PROT, Proteome Web, or the Proteomics Facility of the Pathogen Functional Genomics Resource Center, and 3) software tools for applying proteomic data, such as PSI-BLAST or VESPA. These tools support research in network biology, protein identification, functional annotation, target identification/validation, protein expression, protein structural analysis, metabolic pathway engineering and drug discovery.
Spencer, Jean L; Bhatia, Vivek N; Whelan, Stephen A; Costello, Catherine E; McComb, Mark E
2013-12-01
The identification of protein post-translational modifications (PTMs) is an increasingly important component of proteomics and biomarker discovery, but very few tools exist for performing fast and easy characterization of global PTM changes and differential comparison of PTMs across groups of data obtained from liquid chromatography-tandem mass spectrometry experiments. STRAP PTM (Software Tool for Rapid Annotation of Proteins: Post-Translational Modification edition) is a program that was developed to facilitate the characterization of PTMs using spectral counting and a novel scoring algorithm to accelerate the identification of differential PTMs from complex data sets. The software facilitates multi-sample comparison by collating, scoring, and ranking PTMs and by summarizing data visually. The freely available software (beta release) installs on a PC and processes data in protXML format obtained from files parsed through the Trans-Proteomic Pipeline. The easy-to-use interface allows examination of results at protein, peptide, and PTM levels, and the overall design offers tremendous flexibility that provides proteomics insight beyond simple assignment and counting.
FunRich proteomics software analysis, let the fun begin!
Benito-Martin, Alberto; Peinado, Héctor
2015-08-01
Protein MS analysis is the preferred method for unbiased protein identification and is applied in a large number of both small-scale and high-throughput studies. However, user-friendly computational tools for protein analysis are still needed. In this issue, Mathivanan and colleagues (Proteomics 2015, 15, 2597-2601) report the development of FunRich, open-access software that facilitates the analysis of proteomics data by providing tools for functional enrichment and interaction network analysis of genes and proteins. FunRich is a reinterpretation of proteomic software: a standalone tool combining ease of use with customizable databases, free access, and graphical representations. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A multi-center study benchmarks software tools for label-free proteome quantification
Gillet, Ludovic C; Bernhardt, Oliver M.; MacLean, Brendan; Röst, Hannes L.; Tate, Stephen A.; Tsou, Chih-Chiang; Reiter, Lukas; Distler, Ute; Rosenberger, George; Perez-Riverol, Yasset; Nesvizhskii, Alexey I.; Aebersold, Ruedi; Tenzer, Stefan
2016-01-01
The consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH 2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from SWATH-MS (sequential window acquisition of all theoretical fragment ion spectra), a method that uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test datasets from hybrid proteome samples of defined quantitative composition acquired on two different MS instruments using different SWATH isolation-window setups. For consistent evaluation, we developed LFQbench, an R package, to calculate metrics of precision and accuracy in label-free quantitative MS and report the identification performance, robustness and specificity of each software tool. Our reference datasets enabled developers to improve their software tools. After optimization, all tools provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics. PMID:27701404
A multicenter study benchmarks software tools for label-free proteome quantification.
Navarro, Pedro; Kuharev, Jörg; Gillet, Ludovic C; Bernhardt, Oliver M; MacLean, Brendan; Röst, Hannes L; Tate, Stephen A; Tsou, Chih-Chiang; Reiter, Lukas; Distler, Ute; Rosenberger, George; Perez-Riverol, Yasset; Nesvizhskii, Alexey I; Aebersold, Ruedi; Tenzer, Stefan
2016-11-01
Consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH 2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from sequential window acquisition of all theoretical fragment-ion spectra (SWATH)-MS, which uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test data sets from hybrid proteome samples of defined quantitative composition acquired on two different MS instruments using different SWATH isolation-window setups. For consistent evaluation, we developed LFQbench, an R package, to calculate metrics of precision and accuracy in label-free quantitative MS and report the identification performance, robustness and specificity of each software tool. Our reference data sets enabled developers to improve their software tools. After optimization, all tools provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics.
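LFQbench itself is an R package, and its exact metric definitions are not given here. As an assumed, simplified illustration of what accuracy and precision mean for a hybrid proteome of known composition, the Python sketch below compares the measured log2 ratios for one species with the ratio expected from the mixing design: accuracy as the offset of the median log2 ratio from the expected value, and precision as the standard deviation of the log2 ratios. The function name ratio_metrics and these definitions are hypothetical.

```python
import math
import statistics


def ratio_metrics(intensities_a, intensities_b, expected_log2_ratio):
    """Accuracy and precision of measured log2(A/B) ratios for one species.

    intensities_a, intensities_b: paired protein intensities in samples A and B.
    expected_log2_ratio: log2 ratio implied by the known mixing design.
    Returns (accuracy, precision):
      accuracy  = median measured log2 ratio minus the expected value (0 is ideal)
      precision = sample standard deviation of the measured log2 ratios
    Requires at least two proteins with positive intensities in both samples.
    """
    # Skip proteins missing (zero intensity) in either sample.
    log_ratios = [
        math.log2(a / b)
        for a, b in zip(intensities_a, intensities_b)
        if a > 0 and b > 0
    ]
    accuracy = statistics.median(log_ratios) - expected_log2_ratio
    precision = statistics.stdev(log_ratios)
    return accuracy, precision
```

Computing the metrics separately per species is what makes a hybrid proteome useful: each species has its own expected ratio, so systematic bias shows up as a nonzero accuracy offset for that species.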
Hrdlickova Kuckova, Stepanka; Rambouskova, Gabriela; Hynek, Radovan; Cejnar, Pavel; Oltrogge, Doris; Fuchs, Robert
2015-11-01
Matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) mass spectrometry is commonly used to identify proteinaceous binders and their mixtures in artworks. The determination of protein binders is based on comparing the m/z values of tryptic peptides in the unknown sample with those of a reference (egg, casein, animal glues etc.), but the method has further potential: to study changes due to ageing and the influence of organic/inorganic components on protein identification. This, however, requires statistical evaluation of the obtained data. Until now, it has been complicated to routinely import mass spectrometric data into a statistical programme and to extract and match the appropriate peaks. Only a few 'homemade' computer programmes without user-friendly interfaces are available for these purposes. In this paper, we present our completely new, publicly available, non-commercial software, ms-alone and multiMS-toolbox, for principal component analysis of MALDI-TOF MS data in R, and their application to studying the influence of heterogeneous matrices (organic lakes) on protein identification. Using this new software, we determined the main factors that influence the protein analysis, by MALDI-TOF peptide mass mapping, of artificially aged model mixtures of organic lakes and fish glue prepared according to historical recipes used for book illumination. Copyright © 2015 John Wiley & Sons, Ltd.
Shi, Xu; Barnes, Robert O; Chen, Li; Shajahan-Haq, Ayesha N; Hilakivi-Clarke, Leena; Clarke, Robert; Wang, Yue; Xuan, Jianhua
2015-07-15
Identification of protein interaction subnetworks is an important step in understanding complex molecular mechanisms in cancer. In this paper, we develop BMRF-Net, a package implemented in Java and C++, to identify protein interaction subnetworks based on a bagging Markov random field (BMRF) framework. By integrating gene expression data and protein-protein interaction data, this software tool can identify biologically meaningful subnetworks. A user-friendly graphical user interface is provided as a Cytoscape plugin to handle the input/output of the BMRF-Net software, and the detailed structure of the identified networks can be conveniently visualized in Cytoscape. The BMRF-Net package has been applied to breast cancer data to identify significant subnetworks related to breast cancer recurrence. The BMRF-Net package is available at http://sourceforge.net/projects/bmrfcjava/. The package has been tested under Ubuntu 12.04 (64-bit), Java 7, glibc 2.15 and Cytoscape 3.1.0. © The Author 2015. Published by Oxford University Press. All rights reserved.
A standardized framing for reporting protein identifications in mzIdentML 1.2
Seymour, Sean L.; Farrah, Terry; Binz, Pierre-Alain; Chalkley, Robert J.; Cottrell, John S.; Searle, Brian C.; Tabb, David L.; Vizcaíno, Juan Antonio; Prieto, Gorka; Uszkoreit, Julian; Eisenacher, Martin; Martínez-Bartolomé, Salvador; Ghali, Fawaz; Jones, Andrew R.
2015-01-01
Inferring which protein species have been detected in bottom-up proteomics experiments has been a challenging problem for which solutions have been maturing over the past decade. While many inference approaches now function well in isolation, comparing and reconciling the results generated across different tools remains difficult. It presently stands as one of the greatest barriers in collaborative efforts such as the Human Proteome Project and public repositories like the PRoteomics IDEntifications (PRIDE) database. Here we present a framework for reporting protein identifications that seeks to improve capabilities for comparing results generated by different inference tools. This framework standardizes the terminology for describing protein identification results, associated with the HUPO-Proteomics Standards Initiative (PSI) mzIdentML standard, while still allowing for differing methodologies to reach that final state. It is proposed that developers of software for reporting identification results will adopt this terminology in their outputs. While the new terminology does not require any changes to the core mzIdentML model, it represents a significant change in practice, and, as such, the rules will be released via a new version of the mzIdentML specification (version 1.2) so that consumers of files are able to determine whether the new guidelines have been adopted by export software. PMID:25092112
Structure-sequence based analysis for identification of conserved regions in proteins
Zemla, Adam T; Zhou, Carol E; Lam, Marisa W; Smith, Jason R; Pardes, Elizabeth
2013-05-28
Disclosed are computational methods, and associated hardware and software products, for scoring conservation in a protein structure based on a computationally identified family or cluster of protein structures. A method of computationally identifying a family or cluster of protein structures is also disclosed herein.
Wang, Jun; Chen, Wen Feng; Li, Qing X
2012-02-24
The need for quick diagnostics and the increasing number of bacterial species isolated necessitate the development of a rapid and effective phenotypic identification method. Mass spectrometry (MS) profiling of whole-cell proteins has the potential to satisfy these requirements. The genus Mycobacterium contains more than 154 species that are taxonomically very close and require the use of multiple genes, including 16S rDNA, for phylogenetic identification and classification. Six strains of five Mycobacterium species were selected as model bacteria in the present study because of their 16S rDNA similarity (98.4-99.8%) and the high similarity of their concatenated 16S rDNA, rpoB and hsp65 gene sequences (95.9-99.9%), which demand high identification resolution. The classification of the six strains by MALDI TOF MS protein barcodes was consistent with, but at much higher resolution than, that of multi-locus sequence analysis using 16S rDNA, rpoB and hsp65. The species were well differentiated using MALDI TOF MS and MALDI BioTyper™ software after quick preparation of whole-cell proteins. Several proteins were selected as diagnostic markers for species confirmation. Integrating MALDI TOF MS, MALDI BioTyper™ software and diagnostic protein fragments provides a robust phenotypic approach for bacterial identification and classification. Copyright © 2011 Elsevier B.V. All rights reserved.
MASH Suite Pro: A Comprehensive Software Tool for Top-Down Proteomics
Cai, Wenxuan; Guner, Huseyin; Gregorich, Zachery R.; Chen, Albert J.; Ayaz-Guner, Serife; Peng, Ying; Valeja, Santosh G.; Liu, Xiaowen; Ge, Ying
2016-01-01
Top-down mass spectrometry (MS)-based proteomics is arguably a disruptive technology for the comprehensive analysis of all proteoforms arising from genetic variation, alternative splicing, and posttranslational modifications (PTMs). However, the complexity of top-down high-resolution mass spectra presents a significant challenge for data analysis. In contrast to the well-developed software packages available for data analysis in bottom-up proteomics, the data analysis tools in top-down proteomics remain underdeveloped. Moreover, despite recent efforts to develop algorithms and tools for the deconvolution of top-down high-resolution mass spectra and the identification of proteins from complex mixtures, a multifunctional software platform, which allows for the identification, quantitation, and characterization of proteoforms with visual validation, is still lacking. Herein, we have developed MASH Suite Pro, a comprehensive software tool for top-down proteomics with multifaceted functionality. MASH Suite Pro is capable of processing high-resolution MS and tandem MS (MS/MS) data using two deconvolution algorithms to optimize protein identification results. In addition, MASH Suite Pro allows for the characterization of PTMs and sequence variations, as well as the relative quantitation of multiple proteoforms in different experimental conditions. The program also provides visualization components for validation and correction of the computational outputs. Furthermore, MASH Suite Pro facilitates data reporting and presentation via direct output of the graphics. Thus, MASH Suite Pro significantly simplifies and speeds up the interpretation of high-resolution top-down proteomics data by integrating tools for protein identification, quantitation, characterization, and visual validation into a customizable and user-friendly interface. We envision that MASH Suite Pro will play an integral role in advancing the burgeoning field of top-down proteomics. PMID:26598644
Padliya, Neerav D; Garrett, Wesley M; Campbell, Kimberly B; Tabb, David L; Cooper, Bret
2007-11-01
LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences to related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens.
MoRFchibi SYSTEM: software tools for the identification of MoRFs in protein sequences.
Malhis, Nawar; Jacobson, Matthew; Gsponer, Jörg
2016-07-08
Molecular recognition features (MoRFs) are short segments within longer disordered protein regions that bind to globular protein domains in a process known as disorder-to-order transition. MoRFs have been found to play a significant role in signaling and regulatory processes in cells. High-confidence computational identification of MoRFs remains an important challenge. In this work, we introduce MoRFchibi SYSTEM, which contains three MoRF predictors: MoRFCHiBi, a basic predictor best suited as a component in other applications; MoRFCHiBi_Light, ideal for high-throughput predictions; and MoRFCHiBi_Web, slower than the other two but best for high-accuracy predictions. Results show that MoRFchibi SYSTEM provides more than double the precision of other predictors. MoRFchibi SYSTEM is available in three different forms: as an HTML web server, as a RESTful web server and as downloadable software, at: http://www.chibi.ubc.ca/faculty/joerg-gsponer/gsponer-lab/software/morf_chibi/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
DIGE Analysis Software and Protein Identification Approaches.
Hmmier, Abduladim; Dowling, Paul
2018-01-01
DIGE is a high-resolution two-dimensional gel electrophoresis method with excellent dynamic range, obtained by fluorescent tag labeling of protein samples. Scanned images of DIGE gels show thousands of protein spots, each representing a single protein isoform or a group of isoforms. Using commercially available software, each protein spot is defined by an outline, which is digitized and correlated with the quantity of protein present in the spot. Software packages include DeCyder, SameSpots, and Dymension 3. In addition, proteins of interest can be excised from post-stained gels and identified by conventional mass spectrometry techniques. High-throughput mass spectrometry is performed using sophisticated instrumentation, including matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF), MALDI-TOF/TOF, and liquid chromatography tandem mass spectrometry (LC-MS/MS). Tandem MS (MALDI-TOF/TOF or LC-MS/MS) analyzes fragmented peptides, yielding amino acid sequence information, which is especially useful when protein spots are of low abundance or contain a mixture of proteins.
Identifying technical aliases in SELDI mass spectra of complex mixtures of proteins
2013-01-01
Background Biomarker discovery datasets created using mass spectrum protein profiling of complex mixtures of proteins contain many peaks that represent the same protein with different charge states. Correlated variables such as these can confound the statistical analyses of proteomic data. Previously we developed an algorithm that clustered mass spectrum peaks that were biologically or technically correlated. Here we demonstrate an algorithm that clusters correlated technical aliases only. Results In this paper, we propose a preprocessing algorithm that can be used for grouping technical aliases in mass spectrometry protein profiling data. The stringency of the variance allowed for clustering is customizable, thereby affecting the number of peaks that are clustered. Subsequent analysis of the clusters, instead of individual peaks, helps reduce difficulties associated with technically-correlated data, and can aid more efficient biomarker identification. Conclusions This software can be used to pre-process and thereby decrease the complexity of protein profiling proteomics data, thus simplifying the subsequent analysis of biomarkers by decreasing the number of tests. The software is also a practical tool for identifying which features to investigate further by purification, identification and confirmation. PMID:24010718
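The authors' algorithm clusters peaks by their correlation across spectra; that procedure is not reproduced here. A complementary, purely mass-based check for charge-state aliases follows from the ion mass relation: an [M+zH]z+ ion of neutral mass M appears at m/z = (M + z x 1.007276)/z, so M = z(m/z - 1.007276). The sketch below, with hypothetical function names and a hypothetical relative tolerance rel_tol standing in for the adjustable stringency, treats the highest remaining peak as singly charged and groups any peak whose neutral mass at a higher charge matches it.

```python
PROTON = 1.007276  # proton mass in Da


def neutral_mass(mz, z):
    """Neutral mass M of an [M+zH]z+ ion observed at the given m/z."""
    return z * (mz - PROTON)


def group_charge_aliases(peaks, max_charge=3, rel_tol=0.002):
    """Group peaks that could be different charge states of one species.

    peaks: list of m/z values. The highest remaining peak is assumed to be 1+;
    any other peak whose neutral mass at charge 2..max_charge agrees within
    rel_tol (relative) joins its group. Returns a list of groups of m/z values.
    """
    remaining = sorted(peaks, reverse=True)
    groups = []
    while remaining:
        parent = remaining.pop(0)          # highest m/z left, assumed singly charged
        m1 = neutral_mass(parent, 1)
        group = [parent]
        for mz in remaining[:]:            # iterate over a copy while removing
            if any(abs(neutral_mass(mz, z) - m1) <= rel_tol * m1
                   for z in range(2, max_charge + 1)):
                group.append(mz)
                remaining.remove(mz)
        groups.append(group)
    return groups
```

A greedy scheme like this can wrongly merge a genuine singly charged peak that happens to lie near half the mass of a larger species, which is one reason cross-sample correlation, as used by the authors, is a useful extra filter.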
Levander, Fredrik; James, Peter
2005-01-01
The identification of proteins separated on two-dimensional gels is most commonly performed by trypsin digestion and subsequent matrix-assisted laser desorption ionization (MALDI) with time-of-flight (TOF). Recently, atmospheric pressure (AP) MALDI coupled to an ion trap (IT) has emerged as a convenient method to obtain tandem mass spectra (MS/MS) from samples on MALDI target plates. In the present work, we investigated the feasibility of using the two methodologies in line as a standard method for protein identification. In this setup, the high mass accuracy MALDI-TOF spectra are used to calibrate the peptide precursor masses in the lower mass accuracy AP-MALDI-IT MS/MS spectra. Several software tools were developed to automate the analysis process. Two sets of MALDI samples, consisting of 142 and 421 gel spots, respectively, were analyzed in a highly automated manner. In the first set, the protein identification rate increased from 61% for MALDI-TOF only to 85% for MALDI-TOF combined with AP-MALDI-IT. In the second data set the increase in protein identification rate was from 44% to 58%. AP-MALDI-IT MS/MS spectra were in general less effective than the MALDI-TOF spectra for protein identification, but the combination of the two methods clearly enhanced the confidence in protein identification.
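The abstract states that high-accuracy MALDI-TOF peaks are used to calibrate the precursor masses in the lower-accuracy AP-MALDI-IT MS/MS spectra, but not how. A minimal sketch, assuming a nearest-peak substitution within a fixed m/z window (the function name recalibrate_precursors and the 0.5 default are hypothetical), could look like this:

```python
import bisect


def recalibrate_precursors(it_precursors, tof_peaks, window=0.5):
    """Replace low-accuracy ion-trap precursor m/z values with matching TOF peaks.

    it_precursors: precursor m/z values from AP-MALDI-IT MS/MS spectra of one spot.
    tof_peaks: high-accuracy m/z values from the MALDI-TOF spectrum of the same spot.
    window: maximum allowed m/z difference for a match (hypothetical default).
    Returns (original, calibrated) pairs; calibrated is None if nothing matches.
    """
    tof = sorted(tof_peaks)
    result = []
    for mz in it_precursors:
        i = bisect.bisect_left(tof, mz)
        # The closest TOF peak is either at position i or just before it.
        candidates = [tof[j] for j in (i - 1, i) if 0 <= j < len(tof)]
        best = min(candidates, key=lambda t: abs(t - mz), default=None)
        if best is not None and abs(best - mz) <= window:
            result.append((mz, best))
        else:
            result.append((mz, None))
    return result
```

Precursors left without a TOF match could either be searched with the wider ion-trap tolerance or discarded; which choice suits a given workflow is a judgment call the abstract does not settle.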
Peptide Peak Detection for Low Resolution MALDI-TOF Mass Spectrometry.
Yao, Jingwen; Utsunomiya, Shin-Ichi; Kajihara, Shigeki; Tabata, Tsuyoshi; Aoshima, Ken; Oda, Yoshiya; Tanaka, Koichi
2014-01-01
A new peak detection method has been developed for the rapid selection of peptide and fragment ion peaks for protein identification using tandem mass spectrometry. The algorithm classifies the peak intensities present in a defined mass range to determine the noise level. A threshold is then applied to select ion peaks according to the noise level determined in each mass range. This algorithm was initially designed for peak detection in low-resolution peptide mass spectra, such as matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectra, but it can also be applied to other types of mass spectra. The method has been shown to achieve a good ratio of real ion peaks to noise peaks, even for poorly fragmented peptide spectra. Peak lists generated by this method produce improved protein scores in database search results, and the reliability of protein identifications increases because more peptides are identified. This software tool is freely available at the Mass++ home page (http://www.first-ms3d.jp/english/achievement/software/). PMID:26819872
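The abstract outlines the method (estimate a noise level for each mass range from the intensities in it, then select peaks relative to that level) but does not describe the intensity classification itself. In the sketch below, the noise level of each fixed-width m/z range is taken as the median intensity, a swapped-in estimator, and a point is kept if it is a local maximum exceeding snr times that noise. The function name and default values are assumptions.

```python
import statistics
from collections import defaultdict


def detect_peaks(spectrum, window=100.0, snr=3.0):
    """Select peaks whose intensity exceeds snr x the local noise level.

    spectrum: list of (mz, intensity) points sorted by mz.
    window: width (in m/z units) of each mass range used to estimate noise.
    snr: multiple of the noise level a peak must exceed.
    Noise per range is the median intensity of its points (assumed estimator).
    """
    # Collect intensities per mass range.
    bins = defaultdict(list)
    for mz, inten in spectrum:
        bins[int(mz // window)].append(inten)
    noise = {b: statistics.median(v) for b, v in bins.items()}

    peaks = []
    for i in range(1, len(spectrum) - 1):
        mz, inten = spectrum[i]
        # Local maximum: higher than the left neighbour, not lower than the right.
        is_local_max = inten > spectrum[i - 1][1] and inten >= spectrum[i + 1][1]
        if is_local_max and inten > snr * noise[int(mz // window)]:
            peaks.append((mz, inten))
    return peaks
```

The median is a convenient choice because, in a sparse peptide spectrum, most points in a mass range are noise, so a few intense ion peaks barely shift it.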
Murugaiyan, Jayaseelan; Eravci, Murat; Weise, Christoph; Roesler, Uwe
2017-06-01
Here, we provide the dataset associated with our research article 'label-free quantitative proteomic analysis of harmless and pathogenic strains of infectious microalgae, Prototheca spp.' (Murugaiyan et al., 2017) [1]. This dataset describes liquid chromatography-mass spectrometry (LC-MS)-based protein identification and quantification of a non-infectious strain, Prototheca zopfii genotype 1, and two strains associated with severe and mild infections, P. zopfii genotype 2 and Prototheca blaschkeae, respectively. Protein identification and label-free quantification were carried out by analysing the MS raw data with the MaxQuant-Andromeda software suite. Differences in the expression levels of the identified proteins among the strains were computed using Perseus software, and the results were presented in [1]. This DiB article provides the MaxQuant output file and the raw data deposited in the PRIDE repository with the dataset identifier PXD005305.
Prediction of Protein Configurational Entropy (Popcoen).
Goethe, Martin; Gleixner, Jan; Fita, Ignacio; Rubi, J Miguel
2018-03-13
A knowledge-based method for predicting the configurational entropy of proteins is presented. The method is extremely fast compared to previous approaches because it does not involve any type of configurational sampling. Instead, the configurational entropy of a query fold is estimated by evaluating an artificial neural network trained on molecular-dynamics simulations of ∼1000 proteins. The predicted entropy can be incorporated into a large class of protein software based on cost-function minimization/evaluation, in which configurational entropy is currently neglected for performance reasons. Software of this type is used for all major protein tasks, such as structure prediction, protein design, NMR and X-ray refinement, docking, and mutation effect prediction. Integrating the predicted entropy can yield a significant accuracy increase, as we show for native-state identification with the prominent protein software FoldX. The method has been termed Popcoen, for Prediction of Protein Configurational Entropy. An implementation is freely available at http://fmc.ub.edu/popcoen/.
Kou, Qiang; Wu, Si; Tolic, Nikola; Paša-Tolic, Ljiljana; Liu, Yunlong; Liu, Xiaowen
2017-05-01
Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a 'bird's eye view' of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem. We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry datasets showed that TopMG outperformed existing methods in identifying complex proteoforms. Availability: http://proteomics.informatics.iupui.edu/software/topmg/. Contact: xwliu@iupui.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Identification of MS-Cleavable and Non-Cleavable Chemically Crosslinked Peptides with MetaMorpheus.
Lu, Lei; Millikin, Robert J; Solntsev, Stefan K; Rolfs, Zach; Scalf, Mark; Shortreed, Michael R; Smith, Lloyd M
2018-05-25
Protein chemical crosslinking combined with mass spectrometry has become an important technique for the analysis of protein structure and protein-protein interactions. A variety of crosslinkers are well developed, but reliable, rapid, and user-friendly tools for large-scale analysis of crosslinked proteins are still needed. Here we report MetaMorpheusXL, a new search module within the MetaMorpheus software suite that identifies both MS-cleavable and non-cleavable crosslinked peptides in MS data. MetaMorpheusXL identifies MS-cleavable crosslinked peptides with an ion-indexing algorithm, which enables an efficient large database search. The identification does not require the presence of signature fragment ions, an advantage over similar programs such as XlinkX. One complication of relying on signature ions from cleavable crosslinkers such as DSSO (disuccinimidyl sulfoxide) is the need for multiple fragmentation types and energy combinations, which MetaMorpheusXL does not require. The ability to perform proteome-wide analysis is another advantage of MetaMorpheusXL over programs such as MeroX and DXMSMS. MetaMorpheusXL is also faster than other currently available search programs for MS-cleavable crosslinks. It is embedded in MetaMorpheus, an open-source and freely available software suite with a reliable, fast, and user-friendly graphical user interface that is readily accessible to researchers.
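To make the ion-indexing idea concrete, here is a minimal Python sketch of a fragment-ion index for ordinary linear peptides. It is not MetaMorpheusXL's implementation and ignores crosslinks entirely; the residue table (a subset of monoisotopic masses), the 0.02 m/z bin width, the b-ion-only fragmentation, and the names `build_index` and `score_spectrum` are illustrative assumptions.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da) for the residues used in this toy example.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
                "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
                "E": 129.04259, "R": 156.10111}
PROTON = 1.007276


def b_ions(peptide):
    """Singly charged b-ion m/z values b1 .. b(n-1)."""
    masses, total = [], 0.0
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total + PROTON)
    return masses


def build_index(peptides, bin_width=0.02):
    """Map each fragment m/z bin to the ids of peptides producing a fragment there."""
    index = defaultdict(set)
    for pid, peptide in enumerate(peptides):
        for mz in b_ions(peptide):
            index[round(mz / bin_width)].add(pid)
    return index


def score_spectrum(index, peaks, n_peptides, bin_width=0.02):
    """Count, for each peptide, how many observed peaks hit one of its fragment bins."""
    counts = [0] * n_peptides
    for mz in peaks:
        for pid in index.get(round(mz / bin_width), ()):
            counts[pid] += 1
    return counts
```

Each observed peak costs one hash lookup instead of a comparison against every candidate peptide, which is why indexed searches scale to large databases. A production index would also check neighbouring bins, because a peak close to a bin edge can fall into the adjacent one.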
NMR-based automated protein structure determination.
Würz, Julia M; Kazemi, Sina; Schmidt, Elena; Bagaria, Anurag; Güntert, Peter
2017-08-15
NMR spectra analysis for protein structure determination can now in many cases be performed by automated computational methods. This overview of the computational methods for NMR protein structure analysis presents recent automated methods for signal identification in multidimensional NMR spectra, sequence-specific resonance assignment, collection of conformational restraints, and structure calculation, as implemented in the CYANA software package. These algorithms are sufficiently reliable, and sufficiently well integrated in one software package, to enable fully automated structure determination of proteins starting from NMR spectra, without manual interventions or corrections at intermediate steps, with an accuracy of 1-2 Å backbone RMSD relative to manually solved reference structures. Copyright © 2017 Elsevier Inc. All rights reserved.
Chamrad, Daniel C; Körting, Gerhard; Schäfer, Heike; Stephan, Christian; Thiele, Herbert; Apweiler, Rolf; Meyer, Helmut E; Marcus, Katrin; Blüggel, Martin
2006-09-01
A novel software tool named PTM-Explorer has been applied to LC-MS/MS datasets acquired within the Human Proteome Organisation (HUPO) Brain Proteome Project (BPP). PTM-Explorer enables automatic identification of peptide MS/MS spectra that were not explained in typical sequence database searches. The main focus was the detection of PTMs, but PTM-Explorer also detects unspecific peptide cleavage, mass measurement errors, experimental modifications, amino acid substitutions, transpeptidation products and unknown mass shifts. To avoid a combinatorial explosion, the search is restricted to a set of selected protein sequences taken from previous protein identifications made with a standard sequence database search. Before being applied to the HUPO BPP data, PTM-Explorer was evaluated on thoroughly manually characterized and validated LC-MS/MS data sets from Alpha-A-Crystallin gel spots obtained from mouse eye lens. Besides various PTMs, including phosphorylation, a wealth of experimental modifications and unspecific cleavage products were successfully detected, completing the primary structure information of the measured proteins. Our results indicate that a large number of the MS/MS spectra that currently remain unidentified in standard database searches contain valuable information that can only be elucidated using suitable software tools.
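As a toy illustration of explaining a mass shift, the sketch below compares the difference between an observed and a theoretical peptide mass against a small table of known monoisotopic modification masses. This is not PTM-Explorer's algorithm; the table, the 0.01 Da tolerance, and the function name `annotate_shift` are assumptions, and only single modifications are considered.

```python
# Monoisotopic mass shifts (Da) of a few common modifications (illustrative subset).
KNOWN_SHIFTS = {
    "Phospho": 79.96633,
    "Oxidation": 15.99491,
    "Deamidation": 0.98402,
    "Acetyl": 42.01057,
}


def annotate_shift(observed_mass, theoretical_mass, tol=0.01):
    """Explain observed - theoretical mass (Da) by a single known modification."""
    delta = observed_mass - theoretical_mass
    hits = [name for name, shift in KNOWN_SHIFTS.items() if abs(delta - shift) <= tol]
    # Anything not matched by the table is reported as an unknown mass shift.
    return round(delta, 5), hits or ["unknown mass shift"]
```

Restricting candidates to a few preselected proteins, as the abstract describes, keeps such a lookup cheap; allowing combinations of shifts or amino acid substitutions makes the candidate table grow combinatorially.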
Wilson, Karl A; Tan-Wilson, Anna
2013-01-01
Mass spectrometry (MS) has become an important tool in studying biological systems. One application is the identification of proteins and peptides by matching peptide and peptide fragment masses to the sequences of proteins in protein sequence databases. Complex protein mixtures often require prior separation by 2D-PAGE, which demands more time and expertise than instructors of large laboratory classes can devote. We have developed an experimental module for our Biochemistry Laboratory course that engages students in MS-based protein identification following protein separation by one-dimensional SDS-PAGE, a technique that is usually taught in this type of course. The module is based on soybean seed storage proteins, a relatively simple mixture of proteins present at high levels in the seed, allowing the identification of the main protein bands by MS/MS and in some cases even by peptide mass fingerprinting. Students can identify their protein bands using software available on the Internet, and are challenged to deduce post-translational modifications that have occurred upon germination. A collection of mass spectral data and tutorials that can be used as a stand-alone computer-based laboratory module was also assembled. Copyright © 2013 International Union of Biochemistry and Molecular Biology, Inc.
Protein Identification Using Top-Down Spectra
Liu, Xiaowen; Sirotkin, Yakov; Shen, Yufeng; Anderson, Gordon; Tsai, Yihsuan S.; Ting, Ying S.; Goodlett, David R.; Smith, Richard D.; Bafna, Vineet; Pevzner, Pavel A.
2012-01-01
In the last two years, because of advances in protein separation and mass spectrometry, top-down mass spectrometry moved from analyzing single proteins to analyzing complex samples and identifying hundreds and even thousands of proteins. However, computational tools for database search of top-down spectra against protein databases are still in their infancy. We describe MS-Align+, a fast algorithm for top-down protein identification based on spectral alignment that enables searches for unexpected post-translational modifications. We also propose a method for evaluating statistical significance of top-down protein identifications and further benchmark various software tools on two top-down data sets from Saccharomyces cerevisiae and Salmonella typhimurium. We demonstrate that MS-Align+ significantly increases the number of identified spectra as compared with MASCOT and OMSSA on both data sets. Although MS-Align+ and ProSightPC have similar performance on the Salmonella typhimurium data set, MS-Align+ outperforms ProSightPC on the (more complex) Saccharomyces cerevisiae data set. PMID:22027200
Sheng, Quanhu; Li, Rongxia; Dai, Jie; Li, Qingrun; Su, Zhiduan; Guo, Yan; Li, Chen; Shyr, Yu; Zeng, Rong
2015-01-01
Isobaric labeling techniques coupled with high-resolution mass spectrometry have been widely employed in proteomic workflows requiring relative quantification. For each high-resolution tandem mass spectrum (MS/MS), isobaric labeling techniques can be used not only to quantify the peptide from different samples by reporter ions, but also to identify the peptide it is derived from. Because ions related to isobaric labeling may act as noise in database searching, the MS/MS spectrum should be preprocessed before peptide or protein identification. In this article, we demonstrate that MS/MS spectra contain many high-frequency, high-abundance ions related to isobaric labeling, and that removing these ions, combined with deisotoping and deconvolution in MS/MS preprocessing, significantly improves peptide/protein identification sensitivity. The user-friendly software package TurboRaw2MGF (v2.0) has been implemented for converting raw TIC data files to Mascot generic format files and can be downloaded for free from https://github.com/shengqh/RCPA.Tools/releases as part of the software suite ProteomicsTools. The data have been deposited to the ProteomeXchange with identifier PXD000994. PMID:25435543
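A minimal sketch of one part of this preprocessing, removing reporter-ion peaks from a centroided MS/MS peak list before the search, is shown below. It is not TurboRaw2MGF's code; the approximate iTRAQ 4-plex reporter m/z values, the 0.01 tolerance, and `remove_reporter_ions` are assumptions, and other label-related ions, deisotoping and deconvolution are not covered.

```python
# Approximate iTRAQ 4-plex reporter-ion m/z values (assumed for illustration).
REPORTER_MZ = (114.11, 115.11, 116.11, 117.12)


def remove_reporter_ions(peaks, reporters=REPORTER_MZ, tol=0.01):
    """Drop (m/z, intensity) peaks lying within tol of any reporter-ion m/z."""
    return [(mz, intensity) for mz, intensity in peaks
            if all(abs(mz - r) > tol for r in reporters)]
```

In such a workflow the reporter intensities would be read for quantification first; the filter only cleans the copy of the spectrum that is passed to the search engine.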
ICC-CLASS: isotopically-coded cleavable crosslinking analysis software suite
2010-01-01
Background: Successful application of crosslinking combined with mass spectrometry for studying proteins and protein complexes requires specifically designed crosslinking reagents, experimental techniques, and data analysis software. Using isotopically-coded ("heavy and light") versions of the crosslinker and cleavable crosslinking reagents is analytically advantageous for mass spectrometric applications and provides a "handle" that can be used to distinguish crosslinked peptides of different types and to increase the confidence of crosslink identification. Results: Here, we describe a program suite designed for the analysis of mass spectrometric data obtained with isotopically-coded cleavable crosslinkers. The suite contains three programs: DX, DXDX, and DXMSMS. DX searches the mass spectra for ion signal doublets resulting from the light and heavy isotopic forms of the isotopically-coded crosslinking reagent used. DXDX searches for possible mass matches between cleaved and uncleaved isotopically-coded crosslinks based on the established chemistry of the cleavage reaction for a given crosslinking reagent. DXMSMS assigns the crosslinks to the known protein sequences, based on the isotopically-coded and un-coded MS/MS fragmentation data of uncleaved and cleaved peptide crosslinks. Conclusion: The combination of these three programs, which are tailored to the analytical features of the specific isotopically-coded cleavable crosslinking reagents used, represents a powerful software tool for automated high-accuracy peptide crosslink identification. See: http://www.creativemolecules.com/CM_Software.htm PMID:20109223
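The doublet search performed by DX can be sketched as pairing peaks whose m/z spacing equals the light/heavy mass difference divided by the charge state. This is a hypothetical illustration, not the DX code: `find_doublets`, the charge range, the tolerance, and the 8.0 Da difference used in the check are assumptions, and intensity ratios are ignored.

```python
def find_doublets(peaks, delta_mass, charges=(1, 2, 3, 4), tol=0.01):
    """Pair peaks whose m/z spacing equals delta_mass / z for some charge z.

    peaks: list of m/z values.
    delta_mass: heavy-minus-light mass difference of the crosslinker (Da).
    Returns (light_mz, heavy_mz, z) tuples.
    """
    mzs = sorted(peaks)
    pairs = []
    for i, light in enumerate(mzs):
        for heavy in mzs[i + 1:]:
            for z in charges:
                if abs((heavy - light) - delta_mass / z) <= tol:
                    pairs.append((light, heavy, z))
    return pairs
```

Because heavy and light forms co-elute, a real implementation would also require matching retention times and a plausible intensity ratio before accepting a pair.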
Moulder, Robert; Filén, Jan-Jonas; Salmi, Jussi; Katajamaa, Mikko; Nevalainen, Olli S; Oresic, Matej; Aittokallio, Tero; Lahesmaa, Riitta; Nyman, Tuula A
2005-07-01
The options available for processing quantitative data from isotope coded affinity tag (ICAT) experiments have mostly been confined to software specific to the instrument of acquisition. However, recent developments in data format conversion have increased such processing opportunities. In the present study, data sets from ICAT experiments, analysed with liquid chromatography/tandem mass spectrometry (MS/MS) using an Applied Biosystems QSTAR Pulsar quadrupole-TOF mass spectrometer, were processed in triplicate using separate mass spectrometry software packages. The programs Pro ICAT, Spectrum Mill and SEQUEST with XPRESS were employed. Attention was paid to the extent of common identifications and the agreement of quantitative results, with additional interest in the flexibility and productivity of these programs. The comparisons were made with data from the analysis of a specifically prepared test mixture of nine proteins at relative concentration ratios ranging from 0.1 to 10 (light to heavy labelled forms), as a known control, and with data selected from an ICAT study measuring cytokine-induced protein expression in human lymphoblasts, as an applied example. Differences in peptide identification were detected that reflected how the associated scoring parameters weighted information from the MS/MS data sets. Accordingly, the numbers of peptide and protein identifications differed, although both confirmatory and complementary information was apparent in them. No statistically significant differences were observed among the quantitative results from the three programs.
A rapid identification system for metallothionein proteins using expert system
Praveen, Bhoopathi; Vincent, Savariar; Murty, Upadhyayula Suryanarayana; Krishna, Amirapu Radha; Jamil, Kaiser
2005-01-01
Metallothioneins (MT) are low molecular weight proteins, mostly rich in cysteine residues, with high metal content. Generally, MT proteins are responsible for regulating the intracellular supply of biologically essential metal ions, and they protect cells from the deleterious effects of non-essential polarizable transition and post-transition metal ions. Because of their biological importance, proper characterization of MT is necessary. Here we describe a computer program, based on the ID3 decision-tree algorithm from artificial intelligence, developed using available data for the rapid identification of MT. Tissue samples contain several low molecular weight proteins with different physical, chemical and biological characteristics. The described software categorizes MT proteins on the basis of their lack of aromatic amino acids and their high metal content. The proposed solution can be extended to other types of proteins with specific known characteristics. PMID:17597844
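ID3 grows a decision tree by repeatedly splitting on the attribute with the highest information gain (the largest reduction in label entropy). The minimal sketch below shows the technique on categorical attributes; the attribute names and the five training rows are hypothetical toy data, not the authors' training set or program.

```python
import math
from collections import Counter

# Hypothetical toy training data (not the authors' data set).
TRAINING_ROWS = [
    {"aromatic": "no", "metal": "high", "cys": "high"},
    {"aromatic": "yes", "metal": "high", "cys": "high"},
    {"aromatic": "no", "metal": "low", "cys": "high"},
    {"aromatic": "no", "metal": "high", "cys": "low"},
    {"aromatic": "yes", "metal": "low", "cys": "low"},
]
TRAINING_LABELS = ["MT", "other", "other", "MT", "other"]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on a categorical feature."""
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, features):
    """Return a leaf label or a nested dict {feature: {value: subtree}}."""
    if len(set(labels)) == 1:
        return labels[0]
    if not features:
        return Counter(labels).most_common(1)[0][0]  # majority vote
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                                [f for f in features if f != best])
    return tree


def classify(tree, row):
    """Follow the tree from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature][row[feature]]
    return tree
```

On this toy set, "aromatic" and "metal" tie for the first split and either choice yields the same predictions. `classify` raises `KeyError` for an attribute value never seen in training; a fuller version would fall back to the majority label.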
von Haller, Priska D; Yi, Eugene; Donohoe, Samuel; Vaughn, Kelly; Keller, Andrew; Nesvizhskii, Alexey I; Eng, Jimmy; Li, Xiao-jun; Goodlett, David R; Aebersold, Ruedi; Watts, Julian D
2003-07-01
Lipid rafts were prepared according to standard protocols from Jurkat T cells stimulated via T cell receptor/CD28 cross-linking and from control (unstimulated) cells. Co-isolating proteins from the control and stimulated cell preparations were labeled with isotopically normal (d0) and heavy (d8) versions of the same isotope-coded affinity tag (ICAT) reagent, respectively. Samples were combined and proteolyzed, and the resulting peptides were fractionated via cation exchange chromatography. Cysteine-containing (ICAT-labeled) peptides were recovered via the biotin tag component of the ICAT reagents by avidin-affinity chromatography. On-line micro-capillary liquid chromatography tandem mass spectrometry was performed on both avidin-affinity (ICAT-labeled) and flow-through (unlabeled) fractions. Initial peptide sequence identification was performed by searching the recorded tandem mass spectra against a human sequence database using SEQUEST software. New statistical data modeling algorithms were then applied to the SEQUEST search results. These allowed discrimination between likely "correct" and "incorrect" peptide assignments, and hence between the proteins that those assignments collectively represented, by calculating the estimated probability that each peptide assignment and subsequent protein identification was a member of the "correct" population. For convenience, the resulting lists of assigned peptide sequences and their corresponding proteins were filtered at an arbitrarily set probability cut-off of 0.5 or above (i.e. at least 50% likely to be "correct") and compiled into two separate datasets. In total, these datasets contained 7667 individual peptide identifications, which represented 2669 unique peptide sequences, corresponding to 685 proteins and related protein groups.
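The bookkeeping behind these totals (keep assignments with probability of at least 0.5, then count identifications, distinct sequences and proteins) can be sketched as follows. The tuple layout, `summarize`, and the example records are hypothetical; the probabilities themselves would come from the statistical models described above, and real protein grouping is more involved than counting distinct accessions.

```python
def summarize(assignments, cutoff=0.5):
    """Count what survives a probability filter.

    assignments: iterable of (peptide_sequence, protein, probability) tuples.
    """
    kept = [a for a in assignments if a[2] >= cutoff]
    return {
        "identifications": len(kept),                       # every accepted assignment
        "unique_peptides": len({seq for seq, _, _ in kept}),  # distinct sequences
        "proteins": len({prot for _, prot, _ in kept}),       # distinct accessions
    }
```

The same peptide sequence identified from several spectra counts once as a unique sequence but several times as an individual identification, which is why the first total exceeds the second.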
Martín-Campos, Trinidad; Mylonas, Roman; Masselot, Alexandre; Waridel, Patrice; Petricevic, Tanja; Xenarios, Ioannis; Quadroni, Manfredo
2017-08-04
Mass spectrometry (MS) has become the tool of choice for the large scale identification and quantitation of proteins and their post-translational modifications (PTMs). This development has been enabled by powerful software packages for the automated analysis of MS data. While data on PTMs of thousands of proteins can nowadays be readily obtained, fully deciphering the complexity and combinatorics of modification patterns even on a single protein often remains challenging. Moreover, functional investigation of PTMs on a protein of interest requires validation of the localization and the accurate quantitation of its changes across several conditions, tasks that often still require human evaluation. Software tools for large scale analyses are highly efficient but are rarely conceived for interactive, in-depth exploration of data on individual proteins. We here describe MsViz, a web-based and interactive software tool that supports manual validation of PTMs and their relative quantitation in small- and medium-size experiments. The tool displays sequence coverage information, peptide-spectrum matches, tandem MS spectra and extracted ion chromatograms through a single, highly intuitive interface. We found that MsViz greatly facilitates manual data inspection to validate PTM location and quantitate modified species across multiple samples.
A new scoring function for top-down spectral deconvolution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kou, Qiang; Wu, Si; Liu, Xiaowen
2014-12-18
Background: Top-down mass spectrometry plays an important role in intact protein identification and characterization. Top-down mass spectra are more complex than bottom-up mass spectra because they often contain many isotopomer envelopes from highly charged ions, which may overlap with one another. As a result, spectral deconvolution, which converts a complex top-down mass spectrum into a monoisotopic mass list, is a key step in top-down spectral interpretation. Results: In this paper, we propose a new scoring function, L-score, for evaluating isotopomer envelopes. By combining L-score with MS-Deconv, a new software tool, MS-Deconv+, was developed for top-down spectral deconvolution. Experimental results showed that MS-Deconv+ outperformed existing software tools in top-down spectral deconvolution. Conclusions: L-score shows high discriminative ability in identification of isotopomer envelopes. Using L-score, MS-Deconv+ reports many correct monoisotopic masses missed by other software tools, which are valuable for proteoform identification and characterization.
Vellaichamy, Adaikkalam; Tran, John C.; Catherman, Adam D.; Lee, Ji Eun; Kellie, John F.; Sweet, Steve M.M.; Zamdborg, Leonid; Thomas, Paul M.; Ahlf, Dorothy R.; Durbin, Kenneth R.; Valaskovic, Gary A.; Kelleher, Neil L.
2010-01-01
Despite the availability of ultra-high resolution mass spectrometers, methods for separation and detection of intact proteins for proteome-scale analyses are still in a developmental phase. Here we report robust protocols for on-line LC-MS to drive high-throughput top-down proteomics in a fashion similar to bottom-up. Comparative work on protein standards showed that a polymeric stationary phase led to superior sensitivity over a silica-based medium in reversed-phase nanocapillary-LC, with detection of proteins >50 kDa routinely accomplished in the linear ion trap of a hybrid Fourier-Transform mass spectrometer. Protein identification was enabled by nozzle-skimmer dissociation (NSD) and detection of fragment ions with <5 ppm mass accuracy for highly-specific database searching using custom software. This overall approach led to identification of proteins up to 80 kDa, with 10-60 proteins identified in single LC-MS runs of samples from yeast and human cell lines pre-fractionated by their molecular weight using a gel-based sieving system. PMID:20073486
Bogdán, István A.; Rivers, Jenny; Beynon, Robert J.; Coca, Daniel
2008-01-01
Motivation: Peptide mass fingerprinting (PMF) is a method for protein identification in which a protein is fragmented by a defined cleavage protocol (usually proteolysis with trypsin), and the masses of these products constitute a ‘fingerprint’ that can be searched against theoretical fingerprints of all known proteins. In the first stage of PMF, the raw mass spectrometric data are processed to generate a peptide mass list. In the second stage this protein fingerprint is used to search a database of known proteins for the best protein match. Although current software solutions can typically deliver a match in a relatively short time, a system that can find a match in real time could change the way in which PMF is deployed and presented. In a paper published earlier we presented a hardware design of a raw mass spectra processor that, when implemented in Field Programmable Gate Array (FPGA) hardware, achieves almost 170-fold speed gain relative to a conventional software implementation running on a dual processor server. In this article we present a complementary hardware realization of a parallel database search engine that, when running on a Xilinx Virtex 2 FPGA at 100 MHz, delivers 1800-fold speed-up compared with an equivalent C software routine, running on a 3.06 GHz Xeon workstation. The inherent scalability of the design means that processing speed can be multiplied by deploying the design on multiple FPGAs. The database search processor and the mass spectra processor, running on a reconfigurable computing platform, provide a complete real-time PMF protein identification solution. Contact: d.coca@sheffield.ac.uk PMID:18453553
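The search stage of PMF can be illustrated in software with a minimal sketch: digest each database sequence in silico with trypsin (cleaving after K or R unless the next residue is P), compute singly protonated monoisotopic peptide masses, and score each protein by how many observed masses it explains. This is not the authors' FPGA design or their C reference routine; the simple match-count score, the 0.1 Da tolerance, and all names are assumptions, and missed cleavages and modifications are ignored.

```python
import re

# Monoisotopic residue masses (Da); cysteine is unmodified here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def tryptic_digest(sequence):
    """Cleave after K/R unless followed by P (no missed cleavages)."""
    return re.sub(r"(?<=[KR])(?!P)", " ", sequence).split()


def mh_plus(peptide):
    """Singly protonated monoisotopic peptide mass [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_score(observed, protein_sequence, tol=0.1):
    """Number of observed masses matching any theoretical tryptic peptide."""
    theoretical = [mh_plus(p) for p in tryptic_digest(protein_sequence)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def best_match(observed, database, tol=0.1):
    """Return the name of the database entry (name -> sequence) with the top score."""
    return max(database, key=lambda name: pmf_score(observed, database[name], tol))
```

Each protein is scored independently of the others, so the loop over the database is naturally parallel, which is the kind of parallelism a hardware search engine can exploit.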
IMPACT_S: integrated multiprogram platform to analyze and combine tests of selection.
Maldonado, Emanuel; Sunagar, Kartik; Almeida, Daniela; Vasconcelos, Vitor; Antunes, Agostinho
2014-01-01
Among the major goals of research in evolutionary biology are the identification of genes targeted by natural selection and understanding how various regimes of evolution affect the fitness of an organism. In particular, adaptive evolution enables organisms to adapt to changing ecological factors such as diet, temperature, habitat, predatory pressures and prey abundance. An integrative approach is crucial for the identification of non-synonymous mutations that introduce radical changes in protein biochemistry and thus in turn influence the structure and function of proteins. Performing such analyses manually is often a time-consuming process, due to the large number of statistical files generated from multiple approaches, especially when assessing numerous taxa and/or large datasets. We present IMPACT_S, an easy-to-use Graphical User Interface (GUI) software, which rapidly and effectively integrates, filters and combines results from three widely used programs for assessing the influence of selection: Codeml (PAML package), Datamonkey and TreeSAAP. It enables the identification and tabulation of sites detected by these programs as evolving under the influence of positive, neutral and/or negative selection in protein-coding genes. IMPACT_S further facilitates the automatic mapping of these sites onto the three-dimensional structures of proteins. Other useful tools incorporated in IMPACT_S include Jmol, Archaeopteryx, Gnuplot, PhyML, a built-in Swiss-Model interface and a PDB downloader. The relevance and functionality of IMPACT_S are shown through a case study on the toxicoferan-reptilian Cysteine-rich Secretory Proteins (CRiSPs). IMPACT_S is a platform-independent software released under GPLv3 license, freely available online from http://impact-s.sourceforge.net.
Khachane, Amit; Kumar, Ranjit; Jain, Sanyam; Jain, Samta; Banumathy, Gowrishankar; Singh, Varsha; Nagpal, Saurabh; Tatu, Utpal
2005-01-01
Bioinformatics tools to aid gene and protein sequence analysis have become an integral part of biology in the post-genomic era. Release of the Plasmodium falciparum genome sequence has allowed biologists to define the gene and predicted protein content of the parasite, as well as their sequences. Using pI and molecular weight as characteristics that distinguish individual proteins, we have developed a bioinformatics tool to aid the identification of proteins from Plasmodium falciparum. The tool makes use of a virtual 2-DE generated by plotting all of the proteins from the Plasmodium database on a pI versus molecular weight scale. Proteins are identified by comparing the migration position of a protein spot of interest on an experimental 2-DE with positions on the virtual 2-DE. The procedure has been automated in user-friendly software called "Plasmo2D". The tool can be downloaded from http://144.16.89.25/Plasmo2D.zip.
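Matching an experimental spot against a virtual 2-DE amounts to a nearest-neighbour lookup in (pI, molecular weight) space. The sketch below assumes the virtual gel is already available as precomputed (name, pI, MW in kDa) tuples; the tolerances, the distance measure, `match_spot`, and the example proteins are hypothetical and not taken from Plasmo2D.

```python
import math


def match_spot(spot_pi, spot_mw_kda, virtual_gel, pi_tol=0.3, mw_rel_tol=0.1):
    """Return names of virtual-gel proteins near an experimental spot, closest first.

    virtual_gel: iterable of (name, pI, molecular weight in kDa) tuples.
    """
    hits = []
    for name, pi, mw in virtual_gel:
        d_pi = pi - spot_pi
        d_mw = (mw - spot_mw_kda) / spot_mw_kda  # relative MW error
        if abs(d_pi) <= pi_tol and abs(d_mw) <= mw_rel_tol:
            # Combine pI units and relative MW so both axes contribute comparably.
            hits.append((math.hypot(d_pi, d_mw), name))
    return [name for _, name in sorted(hits)]
```

A relative MW tolerance is used because migration in SDS gels is roughly proportional to the logarithm of molecular weight, so a fixed absolute window would be too strict for large proteins and too loose for small ones.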
Algorithms for database-dependent search of MS/MS data.
Matthiesen, Rune
2013-01-01
The frequently used bottom-up strategy for identification of proteins and their associated modifications nowadays typically generates thousands of MS/MS spectra, which are normally matched automatically against a protein sequence database. Search engines that take MS/MS spectra and a protein sequence database as input are referred to as database-dependent search engines. Many programs, both commercial and freely available, exist for database-dependent search of MS/MS spectra, and most of them have excellent user documentation. The aim here is therefore to outline the algorithmic strategies behind different search engines rather than to provide software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have been put into comparing results from different software rather than discussing the underlying algorithms. Such practical comparisons can be confounded by suboptimal implementations, and the observed differences are frequently caused by software parameter settings that have not been set properly to allow a fair comparison. In other words, an algorithmic idea can still be worth considering even if its software implementation has been shown to be suboptimal. The aim of this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps, whereas the final step, protein inference, is much less developed in most search engines and is in many cases performed by external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses SIR, a stand-alone program for protein inference that can import a Mascot search result.
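One common way to implement protein inference is greedy parsimony: merge proteins with identical peptide sets into indistinguishable groups, then repeatedly select the group that explains the most still-unexplained peptides. The sketch below illustrates that generic approach only; it is not the VEMS or SIR algorithm, and `infer_proteins` and its input format are assumptions.

```python
def infer_proteins(peptide_to_proteins):
    """Greedy parsimonious protein inference.

    peptide_to_proteins: dict mapping a peptide sequence to the set of protein
    accessions containing it. Returns a list of protein groups (sorted lists).
    """
    # Invert the mapping: protein -> peptides it could explain.
    protein_to_peptides = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)

    # Proteins with identical peptide sets cannot be distinguished: group them.
    groups = {}
    for protein, peptides in protein_to_peptides.items():
        groups.setdefault(frozenset(peptides), []).append(protein)
    ordered = sorted(groups.items(), key=lambda kv: sorted(kv[1]))

    unexplained = set(peptide_to_proteins)
    selected = []
    while unexplained:
        # Pick the group covering the most unexplained peptides (first wins ties).
        best_gain, best_peptides, best_members = 0, None, None
        for peptides, members in ordered:
            gain = len(peptides & unexplained)
            if gain > best_gain:
                best_gain, best_peptides, best_members = gain, peptides, members
        if best_gain == 0:  # peptides mapped to no protein cannot be explained
            break
        selected.append(sorted(best_members))
        unexplained -= best_peptides
    return selected
```

In the check below, protein E is never reported because its only peptide is already explained by A. Plain greedy parsimony silently discards such subset proteins; approaches that report evidence categories keep them visible instead.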
Veit, Johannes; Sachsenberg, Timo; Chernev, Aleksandar; Aicheler, Fabian; Urlaub, Henning; Kohlbacher, Oliver
2016-09-02
Modern mass spectrometry setups used in today's proteomics studies generate vast amounts of raw data, calling for highly efficient data processing and analysis tools. Software for analyzing these data is either monolithic (easy to use, but sometimes too rigid) or workflow-driven (easy to customize, but sometimes complex). Thermo Proteome Discoverer (PD) is a powerful software for workflow-driven data analysis in proteomics which, in our eyes, achieves a good trade-off between flexibility and usability. Here, we present two open-source plugins for PD providing additional functionality: LFQProfiler for label-free quantification of peptides and proteins, and RNP(xl) for UV-induced peptide-RNA cross-linking data analysis. LFQProfiler interacts with existing PD nodes for peptide identification and validation and takes care of the entire quantitative part of the workflow. We show that it performs at least on par with other state-of-the-art software solutions for label-free quantification in a recently published benchmark (Ramus, C.; J. Proteomics 2016, 132, 51–62). The second workflow, RNP(xl), represents the first software solution to date for identification of peptide-RNA cross-links including automatic localization of the cross-links at amino acid resolution and localization scoring. It comes with a customized integrated cross-link fragment spectrum viewer for convenient manual inspection and validation of the results.
Courcelles, Mathieu; Coulombe-Huntington, Jasmin; Cossette, Émilie; Gingras, Anne-Claude; Thibault, Pierre; Tyers, Mike
2017-07-07
Protein cross-linking mass spectrometry (CL-MS) enables the sensitive detection of protein interactions and the inference of protein complex topology. The detection of chemical cross-links between protein residues can identify intra- and interprotein contact sites or provide physical constraints for molecular modeling of protein structure. Recent innovations in cross-linker design, sample preparation, mass spectrometry, and software tools have significantly improved CL-MS approaches. Although a number of algorithms now exist for the identification of cross-linked peptides from mass spectral data, a dearth of user-friendly analysis tools represents a practical bottleneck to the broad adoption of the approach. To facilitate the analysis of CL-MS data, we developed CLMSVault, a software suite designed to leverage existing CL-MS algorithms and provide intuitive and flexible tools for cross-platform data interpretation. CLMSVault stores and combines complementary information obtained from different cross-linkers and search algorithms. CLMSVault provides filtering, comparison, and visualization tools to support CL-MS analyses and includes a workflow for label-free quantification of cross-linked peptides. An embedded 3D viewer enables the visualization of quantitative data and the mapping of cross-linked sites onto PDB structural models. We demonstrate the application of CLMSVault for the analysis of a noncovalent Cdc34-ubiquitin protein complex cross-linked under different conditions. CLMSVault is open-source software (available at https://gitlab.com/courcelm/clmsvault.git), and a live demo is available at http://democlmsvault.tyerslab.com/.
MPA Portable: A Stand-Alone Software Package for Analyzing Metaproteome Samples on the Go.
Muth, Thilo; Kohrs, Fabian; Heyer, Robert; Benndorf, Dirk; Rapp, Erdmann; Reichl, Udo; Martens, Lennart; Renard, Bernhard Y
2018-01-02
Metaproteomics, the mass spectrometry-based analysis of proteins from multispecies samples, faces severe challenges concerning data analysis and the interpretation of results. To overcome these shortcomings, we here introduce the MetaProteomeAnalyzer (MPA) Portable software. In contrast to the original server-based MPA application, this newly developed tool no longer requires computational expertise for installation and is now independent of any relational database system. In addition, MPA Portable now supports state-of-the-art database search engines and a convenient command line interface for high-performance data processing tasks. While search engine results can easily be combined to increase the protein identification yield, an additional two-step workflow is implemented to provide sufficient analysis resolution for further postprocessing steps, such as protein grouping as well as taxonomic and functional annotation. Our new application has been developed with a focus on intuitive usability, adherence to data standards, and adaptation to Web-based workflow platforms. The open source software package can be found at https://github.com/compomics/meta-proteome-analyzer.
COMPASS: a suite of pre- and post-search proteomics software tools for OMSSA
Wenger, Craig D.; Phanstiel, Douglas H.; Lee, M. Violet; Bailey, Derek J.; Coon, Joshua J.
2011-01-01
Here we present the Coon OMSSA Proteomic Analysis Software Suite (COMPASS): a free and open-source software pipeline for high-throughput analysis of proteomics data, designed around the Open Mass Spectrometry Search Algorithm. We detail a synergistic set of tools for protein database generation, spectral reduction, peptide false discovery rate analysis, peptide quantitation via isobaric labeling, protein parsimony and protein false discovery rate analysis, and protein quantitation. We strive for maximum ease of use, utilizing graphical user interfaces and working with data files in the original instrument vendor format. Results are stored in plain text comma-separated values files, which are easy to view and manipulate with a text editor or spreadsheet program. We illustrate the operation and efficacy of COMPASS through the use of two LC–MS/MS datasets. The first is a dataset of a highly annotated mixture of standard proteins and manually validated contaminants that exhibits the identification workflow. The second is a dataset of yeast peptides, labeled with isobaric stable isotope tags and mixed in known ratios, to demonstrate the quantitative workflow. For these two datasets, COMPASS performs equivalently or better than the current de facto standard, the Trans-Proteomic Pipeline. PMID:21298793
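COMPASS lists protein parsimony among its steps, but the abstract does not say how parsimony is computed. One widely used approach is a greedy set cover that keeps the fewest proteins needed to explain every observed peptide. The Python sketch below illustrates that idea under this assumption. The function name `greedy_parsimony` and the toy input are hypothetical and are not part of COMPASS.

```python
def greedy_parsimony(protein_peptides: dict[str, set[str]]) -> list[str]:
    """Pick a small set of proteins that together explain every observed peptide.

    Greedy set cover: repeatedly keep the protein that explains the most
    still-unexplained peptides; ties go to the alphabetically first protein.
    """
    unexplained = set().union(*protein_peptides.values())
    selected: list[str] = []
    while unexplained:
        # max() returns the first maximum it meets, so iterating over sorted
        # names makes the tie-break deterministic.
        best = max(sorted(protein_peptides),
                   key=lambda p: len(protein_peptides[p] & unexplained))
        selected.append(best)
        unexplained -= protein_peptides[best]
    return selected


# Toy example: P2's peptides are all in P1, and P4's are covered by P1 and P3.
evidence = {
    "P1": {"a", "b", "c"},
    "P2": {"b", "c"},
    "P3": {"d"},
    "P4": {"c", "d"},
}
print(greedy_parsimony(evidence))  # ['P1', 'P3']
```

Greedy set cover does not guarantee the smallest possible protein list, but it is simple and deterministic, which makes it a common baseline.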
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kou, Qiang; Wu, Si; Tolić, Nikola
Motivation: Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a “bird’s eye view” of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem. Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry data sets showed that TopMG outperformed existing methods in identifying complex proteoforms.
Proteomic identification of erythrocyte membrane protein deficiency in hereditary spherocytosis.
Peker, Selen; Akar, Nejat; Demiralp, Duygu Ozel
2012-03-01
Hereditary spherocytosis (HS) is the most common congenital hemolytic anemia in Caucasians, with an estimated prevalence ranging from 1:2000 to 1:5000. In HS, a molecular defect in one of the erythrocyte (RBC) membrane proteins, such as spectrin-α, spectrin-β, ankyrin, band 3 or protein 4.2, leads to membrane destabilization and vesiculation and may change the RBCs into denser, more rigid cells (spherocytes), which are removed by the spleen, leading to the development of hemolytic anemia. HS is classified as mild, moderate or severe according to the degree of hemolytic anemia and the associated symptoms. Two-dimensional gel electrophoresis (2-DE) is a potentially valuable method for studying heritable disorders, such as HS, that involve membrane proteins. Because it separates proteins by two biophysically unrelated parameters, molecular weight and charge, 2-DE is a good option in clinical proteomics: it can separate complex mixtures and display post-translational modifications, including changes after phosphorylation. In this study, we used contemporary methods, with some modifications, for the solubilisation, separation and identification of erythrocyte membrane proteins in normal and HS RBCs. Expression differences in spectrin alpha and beta chains, ankyrin and band 3 were found with PDQuest software 8.0.1, and peptide mass fingerprinting (PMF) analysis was performed to identify the proteins.
2013-01-01
Chemical cross-linking of proteins combined with mass spectrometry provides an attractive and novel method for the analysis of native protein structures and protein complexes. Analysis of the data, however, is complex. Only a small number of cross-linked peptides are produced during sample preparation, and they must be identified against a background of more abundant native peptides. To facilitate the search and identification of cross-linked peptides, we have developed a novel software suite, named Hekate. Hekate is a suite of tools that addresses the challenges involved in analyzing protein cross-linking experiments combined with mass spectrometry. The software is an integrated pipeline for the automation of the data analysis workflow and provides a novel scoring system based on principles of linear peptide analysis. In addition, it provides a tool for the visualization of identified cross-links using three-dimensional models, which is particularly useful when combining chemical cross-linking with other structural techniques. Hekate was validated by the comparative analysis of cytochrome c (bovine heart) against previously reported data [1]. Further validation was carried out on known structural elements of DNA polymerase III, the catalytic α-subunit of the Escherichia coli DNA replisome, along with new insight into the previously uncharacterized C-terminal domain of the protein. PMID:24010795
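One reason cross-linked peptides are hard to find is that each precursor mass corresponds to a pair of peptides plus the cross-linker, so a naive search must consider every peptide pair. The sketch below shows only that mass arithmetic, not Hekate's scoring system. It makes several assumptions: it uses standard monoisotopic residue masses, it assumes a DSS/BS3-type linker that adds 138.06808 Da (the abstract names no reagent), and it uses an arbitrary absolute tolerance. All function names are illustrative.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once per linear peptide (termini)
LINKER = 138.06808  # assumption: DSS/BS3-type cross-linker mass shift


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def crosslink_mass(pep_a: str, pep_b: str, linker: float = LINKER) -> float:
    """Neutral mass of two peptides joined by one cross-linker molecule."""
    return peptide_mass(pep_a) + peptide_mass(pep_b) + linker


def candidate_pairs(peptides: list[str], precursor: float, tol_da: float = 0.01):
    """Brute-force all peptide pairs (including self-pairs) matching a precursor.

    Residue specificity of the linker is not checked. The number of pairs
    grows quadratically with the peptide list, which is why this is costly.
    """
    hits = []
    for i, a in enumerate(peptides):
        for b in peptides[i:]:
            if abs(crosslink_mass(a, b) - precursor) <= tol_da:
                hits.append((a, b))
    return hits


print(candidate_pairs(["GK", "AK", "PEPK"], precursor=544.32204))
```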
Hufnagel, P.; Glandorf, J.; Körting, G.; Jabs, W.; Schweiger-Hufnagel, U.; Hahner, S.; Lubeck, M.; Suckau, D.
2007-01-01
Analysis of complex proteomes often results in long protein lists, but falls short in measuring the validity of identification and quantification results on a greater number of proteins. Biological and technical replicates are mandatory, as is the combination of the MS data from various workflows (gels, 1D-LC, 2D-LC), instruments (TOF/TOF, trap, qTOF or FTMS), and search engines. We describe a database-driven study that combines two workflows, two mass spectrometers, and four search engines with protein identification following a decoy database strategy. The sample was a tryptically digested lysate (10,000 cells) of a human colorectal cancer cell line. Data from two LC-MALDI-TOF/TOF runs and a 2D-LC-ESI-trap run using capillary and nano-LC columns were submitted to the proteomics software platform ProteinScape. The combined MALDI data and the ESI data were searched using Mascot (Matrix Science), Phenyx (GeneBio), ProteinSolver (Bruker and Protagen), and Sequest (Thermo) against a decoy database generated from IPI-human in order to obtain one protein list across all workflows and search engines at a defined maximum false-positive rate of 5%. ProteinScape combined the data into one LC-MALDI and one LC-ESI dataset. The initial separate searches of the two combined datasets generated eight independent peptide lists. These were compiled into an integrated protein list using the ProteinExtractor algorithm. An initial evaluation of the generated data led to the identification of approximately 1200 proteins. Result integration on a peptide level allowed discrimination of protein isoforms that would not have been possible with a mere combination of protein lists.
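The study reports its protein list at a maximum false-positive rate of 5% using a decoy database. A common way to estimate that rate, assumed here rather than taken from ProteinScape, is to count decoy hits above a score cutoff and divide by the number of target hits above the same cutoff. The following Python sketch picks the most permissive cutoff that keeps this estimate at or below 5%. The input format and the function name are hypothetical.

```python
def decoy_score_cutoff(psms, max_fdr=0.05):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, where a higher score is better.
    Estimated FDR at a cutoff = (#decoy hits) / (#target hits) at or above it.
    Tied scores are not treated specially. Returns None if no cutoff passes.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda x: x[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score  # deepest position so far that passes the limit
    return cutoff


# Hypothetical scored hits: (score, is_decoy).
hits = [(100, False)] * 20 + [(50, True)] + [(40, False)] * 5 + [(30, True)] * 3
print(decoy_score_cutoff(hits))  # 40
```

Because the estimate is not monotone in the cutoff, the loop keeps the deepest passing position rather than stopping at the first failure.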
A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics
Tang, Haixu; Li, Sujun; Ye, Yuzhen
2016-01-01
Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmentary protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein-coding genes predicted from metagenomes are incomplete and fragmentary. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs are utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways, such as those involved in the degradation of chemical hazards. Our tools are released as open-source software on GitHub at https://github.com/COL-IU/Graph2Pro. PMID:27918579
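The method builds on the de Bruijn graph produced by metagenome assemblers. The Python sketch below is a generic illustration, not Graph2Pro's code. It builds such a graph from short reads, with (k-1)-mers as nodes and k-mers as edges, and then spells its maximal non-branching paths. In the example, the two reads diverge after a shared prefix, so the graph branches. Sequences that run across such a branch end up split between separate paths. The reads, the value of k and the function names are hypothetical, and the translation and peptide search steps are not shown.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer adds an edge."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix
    return graph


def unitigs(graph: dict[str, set[str]]) -> list[str]:
    """Spell every maximal non-branching path of the graph.

    Isolated cycles (every node with one in- and one out-edge) are skipped
    to keep the sketch short.
    """
    indegree: dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for succ in successors:
            indegree[succ] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node: str) -> bool:
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    paths = []
    for start in sorted(nodes):
        if one_in_one_out(start):
            continue  # interior of a path, reached from its start node
        for succ in sorted(graph.get(start, ())):
            path, current = start + succ[-1], succ
            while one_in_one_out(current):
                current = next(iter(graph[current]))
                path += current[-1]
            paths.append(path)
    return paths


reads = ["ACGTA", "ACGTC"]  # hypothetical reads that diverge after "ACGT"
print(unitigs(de_bruijn(reads, k=3)))  # ['ACGT', 'GTA', 'GTC']
```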
MP3: A Software Tool for the Prediction of Pathogenic Proteins in Genomic and Metagenomic Data
Gupta, Ankit; Kapil, Rohan; Dhakan, Darshan B.; Sharma, Vineet K.
2014-01-01
The identification of virulent proteins in any de-novo sequenced genome is useful in estimating its pathogenic ability and understanding the mechanism of pathogenesis. Similarly, the identification of such proteins could be valuable in comparing the metagenomes of healthy and diseased individuals and estimating the proportion of pathogenic species. However, the common challenge in both tasks is the identification of virulent proteins, since a significant proportion of genomic and metagenomic proteins are novel and not yet annotated. The currently available tools for identifying virulent proteins provide limited accuracy and cannot be used on large datasets. Therefore, we have developed MP3, a standalone tool and web server for the prediction of pathogenic proteins in both genomic and metagenomic datasets. MP3 uses an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out fast, sensitive and accurate prediction of pathogenic proteins. On a blind dataset constructed from complete proteins, it displayed sensitivity, specificity, MCC and accuracy values of 92%, 100%, 0.92 and 96%, respectively. On the two metagenomic blind datasets (Blind A: 51–100 amino acids and Blind B: 30–50 amino acids), it displayed sensitivity, specificity, MCC and accuracy values of 82.39%, 97.86%, 0.80 and 89.32% for Blind A and 71.60%, 94.48%, 0.67 and 81.86% for Blind B, respectively. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets. To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100–150 bp) metagenomic reads, and it also performs exceptionally well on complete protein sequences. MP3 is publicly available at http://metagenomics.iiserb.ac.in/mp3/index.php. PMID:24736651
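The four figures reported for MP3 are standard confusion-matrix metrics. For reference, the Python sketch below computes them from counts of true and false positives and negatives. The counts in the example are hypothetical and are not MP3's blind datasets.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)  # fraction of actual positives recovered
    specificity = tn / (tn + fp)  # fraction of actual negatives recovered
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts: 100 positives (92 found) and 100 negatives (none misclassified).
metrics = classification_metrics(tp=92, tn=100, fp=0, fn=8)
print({name: round(value, 3) for name, value in metrics.items()})
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when positives and negatives are imbalanced.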
Thiele, H.; Glandorf, J.; Koerting, G.; Reidegeld, K.; Blüggel, M.; Meyer, H.; Stephan, C.
2007-01-01
In today’s proteomics research, with its variety of techniques and instruments, bioinformatics tools are necessary to manage the large amount of heterogeneous data and to apply automatic quality control so that results are reliable and comparable. A data-processing pipeline is therefore mandatory for data validation and comparison in a data-warehousing system. The proteome bioinformatics platform ProteinScape has been shown to cover these needs. The reprocessing of HUPO BPP participants’ MS data was done within ProteinScape, and the reprocessed information was transferred into the global data repository PRIDE. As a data-warehousing system, ProteinScape covers two main aspects: archiving relevant data of the proteomics workflow and information extraction functionality (protein identification, quantification and generation of biological knowledge). As a strategy for automatic data validation, different protein search engines are integrated. Result analysis is performed using a decoy database search strategy, which allows the measurement of the false-positive identification rate. Peptide identifications across different workflows, different MS techniques, and different search engines are merged to obtain a quality-controlled protein list. The proteomics identifications database (PRIDE), a public data repository, is an archiving system where data are finally stored and no longer changed by further processing steps. Data submission to PRIDE is open to proteomics laboratories generating protein and peptide identifications. An export tool has been developed for transferring all relevant HUPO BPP data from ProteinScape into PRIDE using the PRIDE.xml format. The EU-funded ProDac project will coordinate the development of software tools covering international standards for the representation of proteomics data. 
The implementation of data submission pipelines and systematic data collection in public standards-compliant repositories will cover all aspects, from the generation of MS data in each laboratory to the conversion of all the annotating information and identifications to a standardized format. Such datasets can be used in the course of publishing in scientific journals.
Dyrlund, Thomas F; Poulsen, Ebbe T; Scavenius, Carsten; Sanggaard, Kristian W; Enghild, Jan J
2012-09-01
Data processing and analysis of proteomics data are challenging and time-consuming. In this paper, we present MS Data Miner (MDM) (http://sourceforge.net/p/msdataminer), a freely available web-based software solution aimed at minimizing the time required for the analysis, validation, data comparison, and presentation of data files generated in MS software, including Mascot (Matrix Science), Mascot Distiller (Matrix Science), and ProteinPilot (AB Sciex). The program was developed to significantly decrease the time required to process large proteomic data sets for publication. This open-source system includes a spectra validation system and an automatic screenshot generation tool for Mascot-assigned spectra. In addition, a Gene Ontology term analysis function and a tool for generating comparative Excel data reports are included. We illustrate the benefits of MDM during a proteomics study comprising more than 200 LC-MS/MS analyses recorded on an AB Sciex TripleTOF 5600, identifying more than 3000 unique proteins and 3.5 million peptides. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Ojima-Kato, Teruyo; Yamamoto, Naomi; Nagai, Satomi; Shima, Keisuke; Akiyama, Yumi; Ota, Junji; Tamura, Hiroto
2017-12-01
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS)-based microbial identification is a popular analytical method. Strain Solution proteotyping software for MALDI-TOF MS has great potential for the precise and detailed discrimination of microorganisms at the serotype or strain level, beyond conventional mass fingerprinting approaches. Here, we constructed a theoretically calculated mass database of Salmonella enterica subspecies enterica consisting of 12 biomarker proteins that can allow serotyping of Salmonella: ribosomal proteins S8, L15, L17, L21, L25, and S7, Mn-cofactor-containing superoxide dismutase (SodA), peptidyl-prolyl cis-trans isomerase C (PPIase C), protein Gns, and the uncharacterized proteins YibT, YaiA, and YciF. Colony-directed MALDI-TOF MS of 116 strains belonging to 23 typed and untyped serotypes of S. enterica from culture collections, patients, and foods showed that Strain Solution ver. 2 software with the novel database correctly distinguished 109 strains (94%), including the major outbreak-associated serotypes Enteritidis, Typhimurium, and Infantis, from the others. We conclude that Strain Solution ver. 2 software integrated with the accurate mass database will be useful for bacterial proteotyping in MALDI-TOF MS-based microbial classification in the clinical and food safety fields.
Challenges and perspectives of metaproteomic data analysis.
Heyer, Robert; Schallert, Kay; Zoun, Roman; Becher, Beatrice; Saake, Gunter; Benndorf, Dirk
2017-11-10
In nature, microorganisms live in complex microbial communities. Comprehensive taxonomic and functional knowledge about microbial communities supports medical and technical applications such as fecal diagnostics and the operation of biogas plants or wastewater treatment plants. Furthermore, microbial communities are crucial for the global carbon and nitrogen cycles in soil and in the ocean. Among the methods available for investigating microbial communities, metaproteomics can approximate the activity of microorganisms by investigating the protein content of a sample. Although metaproteomics is a very powerful method, issues within the bioinformatic evaluation impede its success. In particular, the construction of databases for protein identification, the grouping of redundant proteins, and taxonomic and functional annotation pose major challenges. Furthermore, the growing amounts of data within a metaproteomics study require dedicated algorithms and software. This review summarizes recent metaproteomics software and addresses these issues in detail. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
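One basic step in grouping redundant proteins is to merge proteins that are supported by exactly the same set of identified peptides, because the peptide evidence cannot tell them apart. The sketch below shows only this step, under that assumption. Real grouping tools usually add further rules, for example for proteins whose peptides are a subset of another protein's. The protein and peptide names are hypothetical.

```python
from collections import defaultdict


def group_indistinguishable(protein_peptides: dict[str, set[str]]) -> list[list[str]]:
    """Merge proteins whose identified peptide sets are identical."""
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)  # same evidence -> same key
    return sorted(sorted(members) for members in groups.values())


evidence = {
    "P1": {"pepA", "pepB"},
    "P2": {"pepB", "pepA"},  # same peptides as P1, so indistinguishable
    "P3": {"pepC"},
}
print(group_indistinguishable(evidence))  # [['P1', 'P2'], ['P3']]
```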
Synchronous versus asynchronous modeling of gene regulatory networks.
Garg, Abhishek; Di Cara, Alessandro; Xenarios, Ioannis; Mendoza, Luis; De Micheli, Giovanni
2008-09-01
In silico modeling of gene regulatory networks has recently gained momentum due to increased interest in analyzing the dynamics of biological systems. This has been further facilitated by the increasing availability of experimental data on gene-gene, protein-protein and gene-protein interactions. The two dynamical properties that are often experimentally testable are perturbations and stable steady states. Although a lot of work has been done on the identification of steady states, not much work has been reported on in silico modeling of cellular differentiation processes. In this manuscript, we provide algorithms based on reduced ordered binary decision diagrams (ROBDDs) for Boolean modeling of gene regulatory networks. Algorithms for synchronous and asynchronous transition models are proposed and their computational properties are analyzed. These algorithms allow users to compute cyclic attractors of large networks, which is currently not feasible with existing software. We thereby provide a framework to analyze the effects of multiple gene perturbation protocols on cell differentiation processes. The algorithms were validated on the T-helper model, showing correct steady-state identification and the Th1-Th2 cellular differentiation process. The software binaries for Windows and Linux platforms can be downloaded from http://si2.epfl.ch/~garg/genysis.html.
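In a synchronous Boolean network, every gene is updated at once. Each state therefore has exactly one successor, and every trajectory ends in either a fixed point or a cycle. The paper uses ROBDDs so that this analysis scales to large networks. The Python sketch below replaces ROBDDs with plain exhaustive enumeration of all 2^n states, which is feasible only for tiny networks. The two-gene mutual-inhibition network in the example is hypothetical.

```python
from itertools import product
from typing import Callable

State = tuple[int, ...]
Rule = Callable[[State], int]


def sync_step(state: State, rules: list[Rule]) -> State:
    """Synchronous update: every gene reads the same current state."""
    return tuple(rule(state) for rule in rules)


def synchronous_attractors(rules: list[Rule], n: int) -> list[tuple[State, ...]]:
    """Enumerate all 2**n states and collect the cycles they fall into."""
    attractors = set()
    for start in product((0, 1), repeat=n):
        seen = set()
        state = start
        while state not in seen:  # deterministic map: must eventually repeat
            seen.add(state)
            state = sync_step(state, rules)
        cycle = [state]  # the first repeated state lies on the attractor
        nxt = sync_step(state, rules)
        while nxt != state:
            cycle.append(nxt)
            nxt = sync_step(nxt, rules)
        k = cycle.index(min(cycle))  # rotate so each cycle has one canonical form
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return sorted(attractors)


# Hypothetical two-gene mutual-inhibition switch: A' = not B, B' = not A.
toggle = [lambda s: int(not s[1]), lambda s: int(not s[0])]
print(synchronous_attractors(toggle, n=2))
# [((0, 0), (1, 1)), ((0, 1),), ((1, 0),)]
```

Under asynchronous updating, where one gene changes per step, the oscillation between (0, 0) and (1, 1) is no longer an attractor. From (0, 0), updating either gene alone leads to one of the two fixed points. This is why the choice of transition model matters.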
Distant plant homologues: don't throw out the baby.
Gardiner, John; Overall, Robyn; Marc, Jan
2012-03-01
Plants and metazoans share many similarities in terms of conserved proteins. Antibodies have been used extensively to detect remote homologues, many of which are yet to be identified conclusively. Genome sequencing and the creation of novel sequence or structure comparison programs have greatly assisted the identification of distant protein homologues. The continuing development of new software algorithms and the combination of bioinformatics with proteomics offer hope that the remaining homologues will soon be identified. Copyright © 2011 Elsevier Ltd. All rights reserved.
Ghosh, Soma; Prava, Jyoti; Samal, Himanshu Bhusan; Suar, Mrutyunjay; Mahapatra, Rajani Kanta
2014-06-01
Nowadays, the increasing emergence of antibiotic-resistant pathogenic microorganisms is one of the biggest challenges in disease management. In the present study, comparative genomics, metabolic pathway analysis and additional parameters were used to identify 94 non-homologous essential proteins in the Staphylococcus aureus genome. Further study prioritized 19 proteins as vaccine candidates, whereas a druggability study reported 34 proteins suitable as drug targets. Enzymes from peptidoglycan biosynthesis and folate biosynthesis were identified as candidates for drug development. Furthermore, bacterial secretory proteins and a few hypothetical proteins identified in our analysis fulfill the criteria of vaccine candidates. As a case study, we built a homology model of one of the potential drug targets, MurA ligase, using MODELLER (9v12) software. The model was then used in an in silico docking study with inhibitors from the DrugBank database. Results from this study could facilitate the selection of proteins for entry into drug design and vaccine production pipelines. Copyright © 2014 Elsevier B.V. All rights reserved.
Characterization of Proteoforms with Unknown Post-translational Modifications Using the MIScore
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kou, Qiang; Zhu, Binhai; Wu, Si
Various proteoforms may be generated from a single gene due to primary structure alterations (PSAs) such as genetic variations, alternative splicing, and post-translational modifications (PTMs). Top-down mass spectrometry is capable of analyzing intact proteins and identifying patterns of multiple PSAs, making it the method of choice for studying complex proteoforms. In top-down proteomics, proteoform identification is often performed by searching tandem mass spectra against a protein sequence database that contains only one reference protein sequence for each gene or transcript variant in a proteome. Because of the incompleteness of the protein database, an identified proteoform may contain unknown PSAs compared with the reference sequence. Proteoform characterization aims to identify and localize PSAs in a proteoform. Although many software tools have been proposed for proteoform identification by top-down mass spectrometry, the characterization of proteoforms in identified proteoform-spectrum matches still relies mainly on manual annotation. We propose to use the Modification Identification Score (MIScore), which is based on Bayesian models, to automatically identify and localize PTMs in proteoforms. Experiments showed that the MIScore is accurate in identifying and localizing one or two modifications.
Hufnagel, Peter; Rabus, Ralf
2006-01-01
The rapidly developing proteomics technologies help to advance the global understanding of physiological and cellular processes. The lifestyle of a study organism determines the type and complexity of a given proteomic project. The complexity of this study is characterized by a broad collection of pathway-specific subproteomes, reflecting the metabolic versatility as well as the regulatory potential of the aromatic-degrading, denitrifying bacterium 'Aromatoleum' sp. strain EbN1. Differences in protein profiles were determined using a gel-based approach. Protein identification was based on a progressive application of MALDI-TOF-MS, MALDI-TOF-MS/MS and LC-ESI-MS/MS. This progression was result-driven and automated by software control. The identification rate was increased by assembling a project-specific list of background signals that was used for internal calibration of the MS spectra, and by combining two search engines using a dedicated MetaScoring algorithm. In total, intelligent bioinformatics increased the identification yield from 53% to 70% of the 5,050 analyzed gel spots; a total of 556 different proteins were identified. MS identification was highly reproducible: most proteins were identified more than twice from parallel 2DE gels, with an average sequence coverage of >50% and rather restrictive score thresholds (Mascot ≥95, ProFound ≥2.2, MetaScore ≥97). The MS technologies and bioinformatics tools that were implemented and integrated to handle this complex proteomic project are presented. In addition, we describe the basic principles and current developments of the applied technologies and provide an overview of the current state of microbial proteome research. Copyright (c) 2006 S. Karger AG, Basel.
Tonal Interface to MacroMolecules (TIMMol): A Textual and Tonal Tool for Molecular Visualization
ERIC Educational Resources Information Center
Cordes, Timothy J.; Carlson, C. Britt; Forest, Katrina T.
2008-01-01
We developed the three-dimensional visualization software, Tonal Interface to MacroMolecules or TIMMol, for studying atomic coordinates of protein structures. Key features include audio tones indicating x, y, z location, identification of the cursor location in one-dimensional and three-dimensional space, textual output that can be easily linked…
Unassigned MS/MS Spectra: Who Am I?
Pathan, Mohashin; Samuel, Monisha; Keerthikumar, Shivakumar; Mathivanan, Suresh
2017-01-01
Recent advances in high-resolution tandem mass spectrometry (MS) have resulted in the accumulation of high-quality data. In parallel with these advances in instrumentation, bioinformatics software has been developed to analyze such high-quality datasets. In spite of these advances, data analysis in mass spectrometry still remains a critical step for protein identification. In addition, the complexity of the generated MS/MS spectra, the unpredictable nature of peptide fragmentation, sequence annotation errors, and posttranslational modifications have impeded the protein identification process. In a typical MS data analysis, about 60% of the MS/MS spectra remain unassigned. While some of these can be attributed to the low quality of the MS/MS spectra, a proportion can be classified as high quality. Further analysis may reveal how many of the unassigned MS spectra can be attributed to search space, sequence annotation errors, mutations, and/or posttranslational modifications. In this chapter, the tools used to identify proteins and ways to assign unassigned tandem MS spectra are discussed.
Falkner, Jayson; Andrews, Philip
2005-05-15
Comparing tandem mass spectra (MSMS) against a known dataset of protein sequences is a common method for identifying unknown proteins; however, the processing of MSMS by current software often limits certain applications, including comprehensive coverage of post-translational modifications, non-specific searches and real-time searches to allow result-dependent instrument control. This problem deserves attention as new mass spectrometers provide the ability for higher throughput and as known protein datasets rapidly grow in size. New software algorithms need to be devised in order to address the performance issues of conventional MSMS protein dataset-based protein identification. This paper describes a novel algorithm based on converting a collection of monoisotopic, centroided spectra to a new data structure, named 'peptide finite state machine' (PFSM), which may be used to rapidly search a known dataset of protein sequences, regardless of the number of spectra searched or the number of potential modifications examined. The algorithm is verified using a set of commercially available tryptic digest protein standards analyzed using an ABI 4700 MALDI TOFTOF mass spectrometer, and a free, open source PFSM implementation. It is illustrated that a PFSM can accurately search large collections of spectra against large datasets of protein sequences (e.g. NCBI nr) using a regular desktop PC; however, this paper only details the method for identifying peptide and subsequently protein candidates from a dataset of known protein sequences. The concept of using a PFSM as a peptide pre-screening technique for MSMS-based search engines is validated by using PFSM with Mascot and XTandem. Complete source code, documentation and examples for the reference PFSM implementation are freely available at the Proteome Commons, http://www.proteomecommons.org and source code may be used both commercially and non-commercially as long as the original authors are credited for their work.
iview: an interactive WebGL visualizer for protein-ligand complex.
Li, Hongjian; Leung, Kwong-Sak; Nakane, Takanori; Wong, Man-Hon
2014-02-25
Visualization of protein-ligand complexes plays an important role in elaborating protein-ligand interactions and aiding novel drug design. Most existing web visualizers either rely on slow software rendering or lack virtual reality support. The vital feature of macromolecular surface construction is also unavailable. We have developed iview, an easy-to-use interactive WebGL visualizer for protein-ligand complexes. It exploits hardware acceleration rather than software rendering. It features three special effects in virtual reality settings, namely anaglyph, parallax barrier and Oculus Rift, resulting in visually appealing identification of intermolecular interactions. It supports four surface representations: van der Waals surface, solvent excluded surface, solvent accessible surface and molecular surface. Moreover, based on the feature-rich version of iview, we have also developed a neat, tailor-made version specifically for our istar web platform for protein-ligand docking. This demonstrates the excellent portability of iview. Using innovative 3D techniques, we provide a user-friendly visualizer that is not intended to compete with professional visualizers, but to enable easy accessibility and platform independence.
Allmer, Jens; Kuhlgert, Sebastian; Hippler, Michael
2008-07-07
The amount of information stemming from proteomics experiments involving (multidimensional) separation techniques, mass spectrometric analysis, and computational analysis is ever-increasing. Data from such an experimental workflow needs to be captured, related and analyzed. Biological experiments within this scope produce heterogeneous data ranging from pictures of one- or two-dimensional protein maps and spectra recorded by tandem mass spectrometry to text-based identifications made by algorithms that analyze these spectra. Additionally, peptide and corresponding protein information needs to be displayed. To handle the large amount of data from computational processing of mass spectrometric experiments, automatic import scripts are available and the need for manual input to the database has been minimized. Information is stored in a generic format that abstracts away from the specific software tools typically used in such an experimental workflow. The software is therefore capable of storing and cross-analysing results from many algorithms. A novel feature and a focus of this database is to facilitate protein identification by using peptides identified by mass spectrometry and linking this information directly to the respective protein maps. Additionally, our application employs spectral counting for quantitative presentation of the data. All information can be linked to hot spots on images to place the results into an experimental context. A summary of identified proteins, containing all relevant information per hot spot, is generated automatically, usually upon either a change in the underlying protein models or the import of new identifications. The supporting information for this report can be accessed in multiple ways using the user interface provided by the application. We present a proteomics database which aims to greatly reduce evaluation time of results from mass spectrometric experiments and enhance result quality by allowing consistent data handling. 
Import functionality, automatic protein detection, and summary creation act together to facilitate data analysis. In addition, supporting information for these findings is readily accessible via the graphical user interface provided. The database schema and the implementation, which can easily be installed on virtually any server, can be downloaded in the form of a compressed file from our project webpage.
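Spectral counting, as mentioned above, is the simplest form of label-free quantification: the number of identified spectra per protein is used as a proxy for abundance. The sketch below counts spectra per protein. It also computes the normalized spectral abundance factor (NSAF), a common length-normalized variant that the abstract does not mention and that is included here only as an assumption. The input shape of `(spectrum_id, protein)` pairs and the function names are hypothetical. A spectrum whose peptide is shared between proteins is counted once for each of them.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def spectral_counts(psms: Iterable[Tuple[str, str]]) -> Counter:
    """Count identified spectra per protein from (spectrum_id, protein) pairs."""
    counts: Counter = Counter()
    seen = set()
    for spectrum_id, protein in psms:
        if (spectrum_id, protein) not in seen:  # ignore duplicate PSM rows
            seen.add((spectrum_id, protein))
            counts[protein] += 1
    return counts


def nsaf(counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor: (SpC / length), scaled to sum to 1."""
    saf = {protein: c / lengths[protein] for protein, c in counts.items()}
    total = sum(saf.values())
    return {protein: value / total for protein, value in saf.items()}
```

Normalizing by protein length corrects for the tendency of longer proteins to produce more peptides, and therefore more spectra, at equal abundance.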
A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets.
Savitski, Mikhail M; Wilhelm, Mathias; Hahne, Hannes; Kuster, Bernhard; Bantscheff, Marcus
2015-09-01
Calculating the number of confidently identified proteins and estimating the false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further adds to the challenge, and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target-decoy strategy that become particularly apparent when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target-decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprising ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The "picked" protein FDR approach treats the target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence, depending on which receives the higher score. We investigated the performance of this approach in combination with q-value-based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The "picked" target-decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein, yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used "classic" protein FDR approach that causes overprediction of false-positive protein identifications in large data sets. 
The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc. PMID:25987413
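The picked strategy described above can be sketched in a few lines. For each protein, the target and its decoy are scored, and only the better-scoring member of the pair survives. Surviving entries are ranked, and the FDR at each rank is taken as decoys divided by targets. In the classic approach, by contrast, every target and every decoy would enter the ranking independently. This is a minimal sketch under stated assumptions:

- Each decoy is keyed by the same accession as its target.
- The protein score is -log10 of the best peptide q-value. The abstract says scoring used the best peptide q-value; the log transform is our choice.
- Ties go to the decoy, as a conservative choice.
- All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def protein_score(peptide_qvalues: List[float]) -> float:
    """Score a protein by its best (lowest) peptide q-value, as -log10(q)."""
    return -math.log10(max(min(peptide_qvalues), 1e-300))  # floor avoids log10(0)


def picked_protein_qvalues(
    target: Dict[str, List[float]], decoy: Dict[str, List[float]]
) -> List[Tuple[str, bool, float]]:
    """Return (protein, is_decoy, q_value) for the picked member of each pair."""
    picked = []
    for protein in sorted(set(target) | set(decoy)):
        t = protein_score(target[protein]) if protein in target else -math.inf
        d = protein_score(decoy[protein]) if protein in decoy else -math.inf
        # Keep only the better-scoring member of the target/decoy pair.
        picked.append((protein, d >= t, max(t, d)))
    picked.sort(key=lambda entry: entry[2], reverse=True)
    fdrs, n_target, n_decoy = [], 0, 0
    for _, is_decoy, _ in picked:
        n_decoy += is_decoy
        n_target += not is_decoy
        fdrs.append(n_decoy / max(n_target, 1))
    # q-value: minimum FDR at this or any lower-scoring threshold.
    qvals, running = [0.0] * len(fdrs), math.inf
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qvals[i] = running
    return [(p, is_decoy, q) for (p, is_decoy, _), q in zip(picked, qvals)]
```

Because each pair contributes at most one entry, a protein with a weak target match can no longer add both a target and a decoy to the list. This is the over-representation of decoys that the abstract attributes to the classic approach on large data sets.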
Code of Federal Regulations, 2014 CFR
2014-10-01
... and marking of computer software or computer software documentation to be furnished with restrictive... Rights in Computer Software and Computer Software Documentation 227.7203-10 Contractor identification and marking of computer software or computer software documentation to be furnished with restrictive markings...
Code of Federal Regulations, 2011 CFR
2011-10-01
... and marking of computer software or computer software documentation to be furnished with restrictive... Rights in Computer Software and Computer Software Documentation 227.7203-10 Contractor identification and marking of computer software or computer software documentation to be furnished with restrictive markings...
Code of Federal Regulations, 2012 CFR
2012-10-01
... and marking of computer software or computer software documentation to be furnished with restrictive... Rights in Computer Software and Computer Software Documentation 227.7203-10 Contractor identification and marking of computer software or computer software documentation to be furnished with restrictive markings...
Code of Federal Regulations, 2010 CFR
2010-10-01
... and marking of computer software or computer software documentation to be furnished with restrictive... Rights in Computer Software and Computer Software Documentation 227.7203-10 Contractor identification and marking of computer software or computer software documentation to be furnished with restrictive markings...
Code of Federal Regulations, 2013 CFR
2013-10-01
... and marking of computer software or computer software documentation to be furnished with restrictive... Rights in Computer Software and Computer Software Documentation 227.7203-10 Contractor identification and marking of computer software or computer software documentation to be furnished with restrictive markings...
Xiao, Chuan-Le; Chen, Xiao-Zhou; Du, Yang-Li; Sun, Xuesong; Zhang, Gong; He, Qing-Yu
2013-01-04
Mass spectrometry has become one of the most important technologies in proteomic analysis. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is a major tool for the analysis of peptide mixtures from protein samples. The key step of MS data processing is the identification of peptides from experimental spectra by searching public sequence databases. Although a number of algorithms to identify peptides from MS/MS data have already been proposed, e.g. Sequest, OMSSA, X!Tandem and Mascot, they are mainly based on statistical models that consider only peak matches between experimental and theoretical spectra, not peak intensity information. Moreover, different algorithms give different results from the same MS data, implying their probable incompleteness and questionable reproducibility. We developed a novel peptide identification algorithm, ProVerB, based on a binomial probability distribution model of protein tandem mass spectrometry combined with a new scoring function, making full use of peak intensity information and thus enhancing identification ability. Compared with Mascot, Sequest and SQID, ProVerB identified significantly more peptides from LC-MS/MS data sets at a 1% false discovery rate (FDR) and provided more confident peptide identifications. ProVerB is also compatible with various platforms and experimental data sets, showing its robustness and versatility. The open-source program ProVerB is available at http://bioinformatics.jnu.edu.cn/software/proverb/ .
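A binomial model of peak matching asks a simple question: if each of `n` theoretical fragment ions matched an experimental peak by chance with probability `p`, how likely is it to see at least `k` matches? The sketch below shows that generic tail-probability score. It is not ProVerB's scoring function, which also uses peak intensities, and the value of `p_random` is a hypothetical input (for example, one estimated from peak density and the fragment tolerance window).

```python
import math


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def match_score(n_theoretical: int, n_matched: int, p_random: float) -> float:
    """-log10 of the chance of matching at least n_matched of n_theoretical ions at random."""
    tail = binomial_tail(n_theoretical, n_matched, p_random)
    return -math.log10(max(tail, 1e-300))  # floor guards against underflow to 0
```

Higher scores mean the observed number of matches is less likely to arise by chance. For instance, 12 matches out of 20 at `p_random = 0.1` scores far higher than 5 out of 20.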
Software Risk Identification for Interplanetary Probes
NASA Technical Reports Server (NTRS)
Dougherty, Robert J.; Papadopoulos, Periklis E.
2005-01-01
The need for a systematic and effective software risk identification methodology is critical for interplanetary probes that use increasingly complex and critical software. Several probe failures are examined that suggest more attention and resources need to be dedicated to identifying software risks. The direct causes of these failures can often be traced to systemic problems in all phases of the software engineering process. These failures have led to the development of a practical methodology to identify risks for interplanetary probes. The proposed methodology is based upon the tailoring of the Software Engineering Institute's (SEI) method of taxonomy-based risk identification. The use of this methodology will ensure a more consistent and complete identification of software risks in these probes.
Kellie, John F.; Tran, John C.; Lee, Ji Eun; Ahlf, Dorothy R.; Thomas, Haylee M.; Ntai, Ioanna; Catherman, Adam D.; Durbin, Kenneth R.; Zamdborg, Leonid; Vellaichamy, Adaikkalam; Thomas, Paul M.
2011-01-01
Top Down mass spectrometry (MS) has emerged as an alternative to common Bottom Up strategies for protein analysis. In the Top Down approach, intact proteins are fragmented directly in the mass spectrometer to achieve both protein identification and characterization, even capturing information on combinatorial post-translational modifications. Just in the past two years, Top Down MS has seen incremental advances in instrumentation and dedicated software, and has also received a major boost from refined separations of whole proteins in complex mixtures with both high recovery and reproducibility. Combined with steadily advancing commercial MS instrumentation and data processing, a high-throughput workflow covering intact proteins and polypeptides up to 70 kDa appears within reach in the near future. PMID:20711533
Code of Federal Regulations, 2010 CFR
2010-10-01
... computer software or computer software documentation to be furnished to the Government with restrictions on..., DATA, AND COPYRIGHTS Rights in Computer Software and Computer Software Documentation 227.7203-3 Early identification of computer software or computer software documentation to be furnished to the Government with...
Code of Federal Regulations, 2012 CFR
2012-10-01
... computer software or computer software documentation to be furnished to the Government with restrictions on..., DATA, AND COPYRIGHTS Rights in Computer Software and Computer Software Documentation 227.7203-3 Early identification of computer software or computer software documentation to be furnished to the Government with...
Code of Federal Regulations, 2011 CFR
2011-10-01
... computer software or computer software documentation to be furnished to the Government with restrictions on..., DATA, AND COPYRIGHTS Rights in Computer Software and Computer Software Documentation 227.7203-3 Early identification of computer software or computer software documentation to be furnished to the Government with...
Code of Federal Regulations, 2013 CFR
2013-10-01
... computer software or computer software documentation to be furnished to the Government with restrictions on..., DATA, AND COPYRIGHTS Rights in Computer Software and Computer Software Documentation 227.7203-3 Early identification of computer software or computer software documentation to be furnished to the Government with...
Code of Federal Regulations, 2014 CFR
2014-10-01
... computer software or computer software documentation to be furnished to the Government with restrictions on..., DATA, AND COPYRIGHTS Rights in Computer Software and Computer Software Documentation 227.7203-3 Early identification of computer software or computer software documentation to be furnished to the Government with...
Informed-Proteomics: open-source software package for top-down proteomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, Jungkap; Piehowski, Paul D.; Wilkins, Christopher
Top-down proteomics involves the analysis of intact proteins. This approach is very attractive as it allows proteins to be analyzed in their endogenous form without proteolysis, preserving valuable information about post-translational modifications, isoforms, proteolytic processing or their combinations, collectively called proteoforms. Moreover, the quality of top-down LC-MS/MS datasets is rapidly increasing due to advances in liquid chromatography and mass spectrometry instrumentation and sample processing protocols. However, top-down mass spectra are substantially more complex compared to the more conventional bottom-up data. To take full advantage of the increasing quality of top-down LC-MS/MS datasets, there is an urgent need to develop algorithms and software tools for confident proteoform identification and quantification. In this study we present a new open source software suite for top-down proteomics analysis consisting of an LC-MS feature finding algorithm, a database search algorithm, and an interactive results viewer. The presented tool, along with several other popular tools, was evaluated using human-in-mouse xenograft luminal and basal breast tumor samples that are known to have significant differences in protein abundance based on bottom-up analysis.
Modification Site Localization in Peptides.
Chalkley, Robert J
2016-01-01
There are a large number of search engines designed to take mass spectrometry fragmentation spectra and match them to peptides from proteins in a database. These peptides could be unmodified, but they could also bear modifications that were added biologically or during sample preparation. As a measure of reliability for the peptide identification, software normally calculates how likely a given quality of match could have been achieved at random, most commonly through the use of target-decoy database searching (Elias and Gygi, Nat Methods 4(3): 207-214, 2007). Matching the correct peptide but with the wrong modification localization is not a random match, so results with this error will normally still be assessed as reliable identifications by the search engine. Hence, an extra step is required to determine site localization reliability, and the software approaches to measure this are the subject of this part of the chapter.
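A common building block of localization scoring follows from the point above. Fragment ions that are identical for every candidate placement of a modification carry no localization information. Only site-determining ions, whose masses differ between placements, can distinguish the placements. The sketch below counts how many site-determining b/y ions of each candidate placement are present in a peak list. It is a simplified illustration, not any specific published tool's algorithm. It assumes singly charged b and y ions only, standard monoisotopic residue masses, and a phospho mass shift of 79.96633 Da. The function names and the tolerance are hypothetical.

```python
from typing import Dict, List

# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER, PHOSPHO = 1.007276, 18.010565, 79.96633


def fragment_ions(seq: str, site: int, mod_mass: float = PHOSPHO) -> List[float]:
    """Singly charged b and y ion m/z values with the modification on residue `site`."""
    masses = [RESIDUE[aa] + (mod_mass if i == site else 0.0) for i, aa in enumerate(seq)]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(seq))]
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(seq))]
    return b + y


def localization_evidence(
    seq: str, candidate_sites: List[int], peaks: List[float], tol: float = 0.02
) -> Dict[int, int]:
    """For each candidate site, count its site-determining ions found among the peaks."""
    ions = {s: fragment_ions(seq, s) for s in candidate_sites}
    evidence = {}
    for s in candidate_sites:
        others = [m for t in candidate_sites if t != s for m in ions[t]]
        # Site-determining: ions of this placement not shared with any other placement.
        site_determining = [m for m in ions[s] if all(abs(m - o) > tol for o in others)]
        evidence[s] = sum(any(abs(m - p) <= tol for p in peaks) for m in site_determining)
    return evidence
```

With `peaks = fragment_ions("SATK", 0)`, the placement on S0 receives four matched site-determining ions and the placement on T2 receives none. The b3 and y1 ions are identical for both placements, so they count toward neither.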
hEIDI: An Intuitive Application Tool To Organize and Treat Large-Scale Proteomics Data.
Hesse, Anne-Marie; Dupierris, Véronique; Adam, Claire; Court, Magali; Barthe, Damien; Emadali, Anouk; Masselon, Christophe; Ferro, Myriam; Bruley, Christophe
2016-10-07
Advances in high-throughput proteomics have led to a rapid increase in the number, size, and complexity of the associated data sets. Managing and extracting reliable information from such large series of data sets require the use of dedicated software organized in a consistent pipeline to reduce, validate, exploit, and ultimately export data. The compilation of multiple mass-spectrometry-based identification and quantification results obtained in the context of a large-scale project represents a real challenge for developers of bioinformatics solutions. In response to this challenge, we developed a dedicated software suite called hEIDI to manage and combine both identifications and semiquantitative data related to multiple LC-MS/MS analyses. This paper describes how, through a user-friendly interface, hEIDI can be used to compile analyses and retrieve lists of nonredundant protein groups. Moreover, hEIDI allows direct comparison of series of analyses, on the basis of protein groups, while ensuring consistent protein inference and also computing spectral counts. hEIDI ensures that validated results are compliant with MIAPE guidelines as all information related to samples and results is stored in appropriate databases. Thanks to the database structure, validated results generated within hEIDI can be easily exported in the PRIDE XML format for subsequent publication. hEIDI can be downloaded from http://biodev.extra.cea.fr/docs/heidi .
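Nonredundant protein groups are often derived with a parsimony-style rule. Proteins identified by exactly the same peptide set are merged into one group. Groups whose peptide set is a strict subset of another group's peptide set are dropped as having no distinguishing evidence. The sketch below implements that common rule. It is an assumption for illustration, not necessarily the inference rule hEIDI applies, and the function name and input shape are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def group_proteins(protein_peptides: Dict[str, Set[str]]) -> List[List[str]]:
    """Merge proteins with identical peptide sets and drop strictly subsumed groups."""
    by_set: Dict[FrozenSet[str], List[str]] = {}
    for protein, peptides in protein_peptides.items():
        by_set.setdefault(frozenset(peptides), []).append(protein)
    groups = []
    for peptides in by_set:
        # Keep the group unless another group's peptide set strictly contains it.
        if not any(peptides < other for other in by_set):
            groups.append(sorted(by_set[peptides]))
    return sorted(groups)
```

For example, if P1 and P2 share peptides {a, b}, P3 has only {a}, and P4 has {c}, the result is `[["P1", "P2"], ["P4"]]`. Other tools may report P3 as a subsumed member rather than discarding it.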
Jia, Yi; Huan, Jun; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N
2009-01-01
Background: Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight" or "midnight" zones, where pairwise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results: Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biological motivation of our study is to recognize common structure patterns in "immunoevasins", proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusion: We present a theoretical framework, offer a practical software implementation for incorporating prior domain knowledge, such as the substitution matrices studied here, and devise an efficient algorithm to identify approximately matched frequent subgraphs. By doing so, we significantly expand the analytical power of sophisticated data mining algorithms in dealing with large volumes of complicated and noisy protein structure data. Without loss of generality, an appropriate choice of compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty. PMID:19208148
Proteomic characterization of hempseed (Cannabis sativa L.).
Aiello, Gilda; Fasoli, Elisa; Boschin, Giovanna; Lammi, Carmen; Zanoni, Chiara; Citterio, Attilio; Arnoldi, Anna
2016-09-16
This paper presents an investigation of the hempseed proteome. The experimental approach, based on combinatorial peptide ligand libraries (CPLLs), SDS-PAGE separation, nLC-ESI-MS/MS identification, and database search, made it possible to identify a total of 181 expressed proteins. This very large number of identifications was achieved by searching two databases: Cannabis sativa L. (56 gene products identified) and Arabidopsis thaliana (125 gene products identified). By performing a protein-protein association network analysis using the STRING software, it was possible to build the first interactomic map of all detected proteins, characterized by 137 nodes and 410 interactions. Finally, a Gene Ontology analysis of the identified species made it possible to classify their molecular functions: the largest share is involved in seed metabolic processes (41%), followed by responses to stimulus (8%) and biological process (7%). Hempseed is an underexploited non-legume protein-rich seed. Although its protein is well known for its digestibility, essential amino acid composition, and useful techno-functional properties, a comprehensive proteome characterization is still lacking. The objective of this work was to fill this knowledge gap and provide information useful for a better exploitation of this seed in different food products. Copyright © 2016 Elsevier B.V. All rights reserved.
Chandler, Kevin Brown; Pompach, Petr; Goldman, Radoslav
2013-01-01
Glycosylation is a common protein modification with a significant role in many vital cellular processes and human diseases, making the characterization of protein-attached glycan structures important for understanding cell biology and disease processes. Direct analysis of protein N-glycosylation by tandem mass spectrometry of glycopeptides promises site-specific elucidation of N-glycan microheterogeneity, something that analyses of detached N-glycans and de-glycosylated peptides cannot provide. However, successful implementation of direct N-glycopeptide analysis by tandem mass spectrometry remains a challenge. In this work, we consider algorithmic techniques for the analysis of LC-MS/MS data acquired from glycopeptide-enriched fractions of enzymatic digests of purified proteins. We implement a computational strategy which takes advantage of the properties of CID fragmentation spectra of N-glycopeptides, matching the MS/MS spectra to peptide-glycan pairs from protein sequences and glycan structure databases. Significantly, we also propose a novel false-discovery-rate estimation technique to estimate and manage the number of false identifications. We use a human glycoprotein standard, haptoglobin, digested with trypsin and GluC, enriched for glycopeptides using HILIC chromatography, and analyzed by LC-MS/MS to demonstrate our algorithmic strategy and evaluate its performance. Our software, GlycoPeptideSearch (GPS), assigned glycopeptide identifications to 246 of the spectra at a false discovery rate of 5.58%, identifying 42 distinct haptoglobin peptide-glycan pairs at each of the four haptoglobin N-linked glycosylation sites. We further demonstrate the effectiveness of this approach by analyzing plasma-derived haptoglobin, identifying 136 N-linked glycopeptide spectra at a false discovery rate of 0.4%, representing 15 distinct glycopeptides on at least three of the four N-linked glycosylation sites. 
The software, GlycoPeptideSearch, is available for download from http://edwardslab.bmcb.georgetown.edu/GPS. PMID:23829323
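Matching spectra to peptide-glycan pairs usually begins with a precursor-mass filter. The neutral mass of an N-glycopeptide equals the peptide mass plus the sum of the monosaccharide residue masses of the glycan composition, so candidate pairs can be enumerated and kept if they fall within a ppm tolerance. The sketch below shows only this candidate-generation step, as an assumption about how such a search might start. It is not GPS's actual procedure, and the peptide masses, composition names and function names are hypothetical. The residue masses are standard monoisotopic values.

```python
from itertools import product
from typing import Dict, List, Tuple

# Monoisotopic monosaccharide residue masses (Da), i.e. dehydrated units.
MONOSACCHARIDE = {"HexNAc": 203.07937, "Hex": 162.05282, "Fuc": 146.05791, "NeuAc": 291.09542}


def glycan_mass(composition: Dict[str, int]) -> float:
    """Sum of residue masses for a glycan composition such as {"HexNAc": 2, "Hex": 5}."""
    return sum(MONOSACCHARIDE[name] * count for name, count in composition.items())


def candidate_pairs(
    precursor_mass: float,
    peptides: Dict[str, float],
    glycans: Dict[str, Dict[str, int]],
    ppm: float = 10.0,
) -> List[Tuple[str, str]]:
    """Peptide-glycan pairs whose summed neutral mass matches the precursor within ppm."""
    hits = []
    for (pep_id, pep_mass), (gly_id, comp) in product(peptides.items(), glycans.items()):
        theoretical = pep_mass + glycan_mass(comp)
        if abs(precursor_mass - theoretical) / theoretical * 1e6 <= ppm:
            hits.append((pep_id, gly_id))
    return hits
```

The surviving candidates would then be scored against the MS/MS spectrum, for example using the characteristic CID glycan fragmentation the abstract mentions. That scoring step is not shown here.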
Estimating the Efficiency of Phosphopeptide Identification by Tandem Mass Spectrometry
NASA Astrophysics Data System (ADS)
Hsu, Chuan-Chih; Xue, Liang; Arrington, Justine V.; Wang, Pengcheng; Paez Paez, Juan Sebastian; Zhou, Yuan; Zhu, Jian-Kang; Tao, W. Andy
2017-06-01
Mass spectrometry has played a significant role in the identification of unknown phosphoproteins and sites of phosphorylation in biological samples. Analyses of protein phosphorylation, particularly large scale phosphoproteomic experiments, have recently been enhanced by efficient enrichment, fast and accurate instrumentation, and better software, but challenges remain because of the low stoichiometry of phosphorylation and poor phosphopeptide ionization efficiency and fragmentation due to neutral loss. Phosphoproteomics has become an important dimension in systems biology studies, and it is essential to have efficient analytical tools to cover a broad range of signaling events. To evaluate current mass spectrometric performance, we present here a novel method to estimate the efficiency of phosphopeptide identification by tandem mass spectrometry. Phosphopeptides were directly isolated from whole plant cell extracts, dephosphorylated, and then incubated with one of three purified kinases (casein kinase II, mitogen-activated protein kinase 6, or SNF-related protein kinase 2.6) along with ¹⁶O₄- and ¹⁸O₄-ATP separately for in vitro kinase reactions. Phosphopeptides were enriched and analyzed by LC-MS. The phosphopeptide identification rate was estimated by comparing phosphopeptides identified by tandem mass spectrometry with phosphopeptide pairs generated by stable isotope labeled kinase reactions. Overall, we found that current high speed and high accuracy mass spectrometers can only identify 20%-40% of total phosphopeptides, primarily due to relatively poor fragmentation, additional modifications, and low abundance, highlighting the urgent need for continuous efforts to improve phosphopeptide identification efficiency.
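The estimate described above is a ratio. Co-eluting light/heavy feature pairs mark phosphopeptides that were truly present, and the identification efficiency is the fraction of those pairs that tandem MS actually identified. The sketch below finds pairs by a fixed mass offset and retention-time window, then computes that fraction.

This is a minimal sketch under stated assumptions:

- Features are (neutral mass, retention time) tuples.
- The mass offset `delta` between ¹⁶O- and ¹⁸O-labeled forms must be supplied by the user. It depends on the labeling chemistry, which the abstract does not give. The value 6.0 in the test is purely illustrative.
- Identifications are matched to pairs by mass alone.
- All function names are hypothetical.

```python
from typing import List, Tuple


def labeled_pairs(
    features: List[Tuple[float, float]], delta: float, mass_tol: float = 0.01, rt_tol: float = 0.5
) -> List[float]:
    """Light-feature masses that have a co-eluting heavy partner at mass + delta."""
    light = []
    for m1, rt1 in features:
        if any(abs(m2 - m1 - delta) <= mass_tol and abs(rt2 - rt1) <= rt_tol for m2, rt2 in features):
            light.append(m1)
    return sorted(set(light))


def identification_efficiency(
    pair_masses: List[float], identified_masses: List[float], mass_tol: float = 0.01
) -> float:
    """Fraction of labeled pairs whose light mass matches an MS/MS identification."""
    if not pair_masses:
        return 0.0
    found = sum(any(abs(p - i) <= mass_tol for i in identified_masses) for p in pair_masses)
    return found / len(pair_masses)
```

The kinase reaction labels the pairs independently of fragmentation quality. An efficiency well below 1, such as the 20%-40% reported above, therefore points to losses at the identification step rather than at detection.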
Probing structures of large protein complexes using zero-length cross-linking.
Rivera-Santiago, Roland F; Sriswasdi, Sira; Harper, Sandra L; Speicher, David W
2015-11-01
Structural mass spectrometry (MS) is a field with growing applicability for addressing complex biophysical questions regarding proteins and protein complexes. One of the major structural MS approaches involves the use of chemical cross-linking coupled with MS analysis (CX-MS) to identify proximal sites within macromolecules. Identified cross-linked sites can be used to probe novel protein-protein interactions or the derived distance constraints can be used to verify and refine molecular models. This review focuses on recent advances of "zero-length" cross-linking. Zero-length cross-linking reagents do not add any atoms to the cross-linked species due to the lack of a spacer arm. This provides a major advantage in the form of providing more precise distance constraints as the cross-linkable groups must be within salt bridge distances in order to react. However, identification of cross-linked peptides using these reagents presents unique challenges. We discuss recent efforts by our group to minimize these challenges by using multiple cycles of LC-MS/MS analysis and software specifically developed and optimized for identification of zero-length cross-linked peptides. Representative data utilizing our current protocol are presented and discussed. Copyright © 2015 Elsevier Inc. All rights reserved.
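As an example of why identification is harder than for linear peptides, consider a carbodiimide-type zero-length cross-link (for example, EDC). Such a link joins a carboxyl group to an amine as an amide bond with the loss of one water, so the cross-linked species has mass `M_a + M_b - H2O`. Every pair of peptides in the search space is a candidate, which makes the search grow roughly quadratically. The sketch below enumerates peptide pairs that match a precursor mass. The choice of EDC-type chemistry is an assumption, since the review covers zero-length reagents in general. The peptide masses and function names are hypothetical, and the residue-level check (Lys/N-terminus to Asp/Glu/C-terminus) is omitted for brevity.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple

WATER = 18.010565  # monoisotopic mass of H2O lost on amide-bond formation (Da)


def crosslinked_mass(mass_a: float, mass_b: float) -> float:
    """Neutral mass of two peptides joined by a zero-length amide cross-link."""
    return mass_a + mass_b - WATER


def candidate_crosslinks(
    precursor_mass: float, peptides: Dict[str, float], ppm: float = 10.0
) -> List[Tuple[str, str]]:
    """All peptide pairs (including self-pairs) matching the precursor within ppm."""
    hits = []
    for (a, mass_a), (b, mass_b) in combinations_with_replacement(sorted(peptides.items()), 2):
        theoretical = crosslinked_mass(mass_a, mass_b)
        if abs(precursor_mass - theoretical) / theoretical * 1e6 <= ppm:
            hits.append((a, b))
    return hits
```

Because no spacer atoms are added, the only mass signature of the link is the lost water. Candidate pairs must therefore be confirmed from fragment ions of both peptides, which is one reason dedicated software is needed.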
Perez-Riverol, Yasset; Wang, Rui; Hermjakob, Henning; Müller, Markus; Vesada, Vladimir; Vizcaíno, Juan Antonio
2014-01-01
Data processing, management and visualization are central and critical components of a state-of-the-art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from the output files of various proteomics search engines, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries and open-source frameworks, and we also give information on some of the freely available applications that make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. Copyright © 2013 Elsevier B.V. All rights reserved. PMID:23467006
Leung, Kit-Yi; Lescuyer, Pierre; Campbell, James; Byers, Helen L; Allard, Laure; Sanchez, Jean-Charles; Ward, Malcolm A
2005-08-01
A novel strategy consisting of cleavable Isotope-Coded Affinity Tag (cICAT) labeling combined with MASCOT Distiller was evaluated as a tool for the quantification of proteins in "abnormal" patient plasma, prepared by pooling samples from patients with acute stroke. Quantification of all light and heavy cICAT-labelled peptide ion pairs was obtained using MASCOT Distiller combined with proprietary software. Peptides displaying differences were selected for identification by MS. These preliminary results show the promise of our approach for identifying potential biomarkers.
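Once light and heavy ion pairs have been quantified, a common way to summarize them is to take a per-peptide log2 heavy/light ratio and then a robust per-protein summary, such as the median. The sketch below shows that aggregation. The median is our assumption, since the abstract does not say how the proprietary software combines pairs. The input format, the assignment of samples to heavy and light channels, and the function name are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def protein_log2_ratios(pairs: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Median log2(heavy / light) per protein from (protein, light, heavy) intensities."""
    per_protein: Dict[str, List[float]] = defaultdict(list)
    for protein, light, heavy in pairs:
        if light > 0 and heavy > 0:  # skip pairs missing one channel
            per_protein[protein].append(math.log2(heavy / light))
    return {protein: median(ratios) for protein, ratios in per_protein.items()}
```

Working on the log2 scale makes up- and down-regulation symmetric (a 2-fold increase is +1 and a 2-fold decrease is -1). The median limits the influence of a single poorly quantified peptide pair.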
Garbis, Spiros D; Roumeliotis, Theodoros I; Tyritzis, Stavros I; Zorpas, Kostas M; Pavlakis, Kitty; Constantinides, Constantinos A
2011-02-01
The current proof-of-principle study was aimed toward development of a novel multidimensional protein identification technology (MudPIT) approach for the in-depth proteome analysis of human serum derived from patients with benign prostate hyperplasia (BPH) using rational chromatographic design principles. This study constituted an extension of our published work relating to the identification and relative quantification of potential clinical biomarkers in BPH and prostate cancer (PCa) tissue specimens. The proposed MudPIT approach encompassed the use of three distinct yet complementary liquid chromatographic chemistries. High-pressure size-exclusion chromatography (SEC) was used for the prefractionation of serum proteins, followed by dialysis exchange and solution-phase trypsin proteolysis. The tryptic peptides were then subjected to offline zwitterionic hydrophilic interaction chromatography (ZIC-HILIC) fractionation, followed by online analysis with reversed-phase nano-ultraperformance liquid chromatography (RP-nUPLC) hyphenated to nanoelectrospray ionization-tandem mass spectrometry using an ion trap mass analyzer. For spectral processing, the SpectrumMill, Scaffold, and InsPecT software tools were used sequentially for tryptic peptide product ion MS(2) spectral processing, false discovery rate (FDR) assessment, validation, and protein identification. This milestone serum analysis study allowed the confident identification of over 1955 proteins (p ≤ 0.05; FDR ≤ 5%) with a broad spectrum of biological and physicochemical properties, including secreted, tissue-specific proteins spanning approximately 12 orders of magnitude in their native abundance levels in the serum matrix. Also encompassed in this proteome was the confident identification of 375 phosphoproteins (p ≤ 0.05; FDR ≤ 5%) with potential importance to cancer biology. 
To demonstrate the performance characteristics of this novel MudPIT approach, it was compared with the proteomes obtained after immunodepletion of the highly abundant proteins albumin and IgG, followed by offline first-dimension tryptic peptide separation with both ZIC-HILIC and strong cation exchange (SCX) chromatography and subsequent online RP-nUPLC-nESI-MS² analysis.
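The identifications above are reported at FDR ≤ 5%. The abstract does not say how SpectrumMill and Scaffold estimated FDR; one widely used approach is target-decoy searching, sketched below purely as an assumption. The function name `score_threshold_at_fdr` and the `(score, is_decoy)` input layout are hypothetical.

```python
def score_threshold_at_fdr(psms, max_fdr=0.05):
    """Lowest score threshold whose target-decoy FDR estimate is <= max_fdr.

    psms: list of (score, is_decoy) pairs from a search against a concatenated
    target + decoy database; higher scores are better matches.
    FDR at threshold t is estimated as (#decoys >= t) / (#targets >= t).
    Returns None if no threshold reaches max_fdr.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after every PSM tied at this score has been counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best
```

Target PSMs scoring at or above the returned threshold would then be accepted; protein-level FDR, as reported in the abstract, needs an additional protein inference step not shown here.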
Goedhals, Dominique; Bester, Phillip A; Paweska, Janusz T; Swanepoel, Robert; Burt, Felicity J
2015-05-01
Crimean-Congo haemorrhagic fever virus (CCHFV) is a member of the Bunyaviridae family with a tripartite, negative-sense RNA genome. This study used predictive software to analyse the L (large), M (medium), and S (small) segments of 14 southern African CCHFV isolates. The OTU-like cysteine protease domain and the RdRp domain of the L segment are highly conserved among southern African CCHFV isolates. The M segment encodes the structural glycoproteins GN and GC and the non-structural glycoproteins, which are post-translationally cleaved at highly conserved furin and subtilase SKI-1 cleavage sites. All previously identified cleavage sites were shown to be conserved among the southern African CCHFV isolates. The heavily O-glycosylated N-terminal variable mucin-like domain of the M segment shows the highest sequence variability of all CCHFV proteins. Five transmembrane domains are predicted in the M segment polyprotein, resulting in three regions internal and three regions external to the membrane across the GN, NSM and GC glycoproteins. Confirming conserved genome domains and sequence identity among geographically diverse isolates may help identify protein functions and pathogenic mechanisms, as well as potential targets for antiviral therapy and vaccine design. Because detailed functional studies are lacking for many CCHFV proteins, identifying functional domains by protein structure prediction, and identifying amino-acid-level similarity to functionally characterised proteins of related viruses or of viruses with similar pathogenic mechanisms, is a necessary step in selecting areas for further study. © 2015 Wiley Periodicals, Inc.
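Furin sites like those mentioned above are commonly screened with the canonical consensus R-X-(K/R)-R, cleavage occurring after the final arginine. The sketch below is a simplified regex scan under that assumption only; it is not the predictive software used in the study, ignores SKI-1 sites, and the function name `furin_sites` and the toy sequence are hypothetical.

```python
import re

# Canonical furin consensus R-X-(K/R)-R; the lookahead lets overlapping motifs match.
FURIN_MOTIF = re.compile(r"(?=(R.[KR]R))")

def furin_sites(protein):
    """Return 1-based positions of the residue after which furin would cleave."""
    # m.start() is the 0-based index of the first R; the final R sits 3 residues
    # later, i.e. at 1-based position m.start() + 4.
    return [m.start() + 4 for m in FURIN_MOTIF.finditer(protein.upper())]
```

Real predictors also weigh flanking residues and surface accessibility, so motif hits like these are candidates rather than confirmed cleavage sites.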
2011-01-01
Background Since its inception, proteomics has essentially operated in a discovery mode, with the goal of identifying and quantifying the maximal number of proteins in a sample. Increasingly, proteomic measurements also support hypothesis-driven studies, in which a predetermined set of proteins is consistently detected and quantified in multiple samples. Selected reaction monitoring (SRM) is a targeted mass spectrometric technique that supports the detection and quantification of specific proteins in complex samples with high sensitivity and reproducibility. Here, we describe ATAQS, an integrated software platform that supports all stages of targeted, SRM-based proteomics experiments, including target selection, transition optimization and post-acquisition data analysis. This software will significantly facilitate the use of targeted proteomic techniques and contribute to the generation of highly sensitive, reproducible and complete datasets, which are particularly critical for the discovery and validation of targets in hypothesis-driven studies in systems biology. Result We introduce a new open-source software pipeline, ATAQS (Automated and Targeted Analysis with Quantitative SRM), which consists of a number of modules that collectively support the SRM assay development workflow for targeted proteomic experiments (project management; generation of proteins, peptides and transitions; and validation of peptide detection by SRM). ATAQS provides a flexible pipeline for end-users, by allowing the workflow to start or end at any point of the pipeline, and for computational biologists, by enabling easy extension of Java algorithm classes for their own algorithm plug-ins or connection via an external web site. This integrated system supports all steps of an SRM-based experiment and provides a user-friendly GUI that can be run on any operating system that supports installation of the Mozilla Firefox web browser.
Conclusions Targeted proteomics via SRM is a powerful new technique that enables the reproducible and accurate identification and quantification of sets of proteins of interest. ATAQS is the first open-source software that supports all steps of the targeted proteomics workflow. ATAQS also provides API (Application Programming Interface) documentation that enables the addition of new algorithms to each of the workflow steps. The software, installation guide and sample dataset can be found at http://tools.proteomecenter.org/ATAQS/ATAQS.html PMID:21414234
Antunes, Deborah; Jorge, Natasha A. N.; Caffarena, Ernesto R.; Passetti, Fabio
2018-01-01
RNA molecules are essential players in many fundamental biological processes. Prokaryotes and eukaryotes have distinct RNA classes with specific structural features and functional roles. Computational prediction of protein structures is a research field in which high-confidence three-dimensional protein models can be proposed based on the sequence alignment between target and templates. However, to date, only a few approaches have been developed for the computational prediction of RNA structures. As with proteins, RNA structures may be altered by interaction with various ligands, including proteins, other RNAs, and metabolites. A riboswitch is a molecular mechanism, found in the three kingdoms of life, in which the RNA structure is modified by the binding of a metabolite. Riboswitches can regulate multiple gene expression mechanisms, such as transcription, translation initiation, and mRNA splicing and processing. Owing to their nature, these elements act both in the regulation of gene expression and in the detection of small metabolites, and they have the potential to help in the discovery of new classes of antimicrobial agents. In this review, we describe software and web servers currently available for riboswitch aptamer identification and secondary and tertiary structure prediction, including their applications. PMID:29403526
Couvin, David; Bernheim, Aude; Toffano-Nioche, Claire; Touchon, Marie; Michalik, Juraj; Néron, Bertrand; C Rocha, Eduardo P; Vergnaud, Gilles; Gautheret, Daniel; Pourcel, Christine
2018-05-22
CRISPR (clustered regularly interspaced short palindromic repeats) arrays and their associated (Cas) proteins confer bacteria and archaea adaptive immunity against exogenous mobile genetic elements, such as phages or plasmids. CRISPRCasFinder allows the identification of both CRISPR arrays and Cas proteins. The program includes: (i) an improved CRISPR array detection tool facilitating expert validation based on a rating system, (ii) prediction of CRISPR orientation and (iii) a Cas protein detection and typing tool updated to match the latest classification scheme of these systems. CRISPRCasFinder can be used either online or as a standalone tool compatible with the Linux operating system. All third-party software packages employed by the program are freely available. CRISPRCasFinder is available at https://crisprcas.i2bc.paris-saclay.fr.
Vlachopanos, A; Soupsana, E; Politou, A S; Papamokos, G V
2014-12-01
Mass spectrometry is a widely used technique for protein identification and has also become the method of choice for detecting and characterizing the post-translational modifications (PTMs) of proteins. Many software tools have been developed to deal with this complexity. In this paper we introduce a new, free and user-friendly online software tool, the POTAMOS Mass Spectrometry Calculator, developed in the open-source application framework Ruby on Rails. It provides calculated mass spectrometry data in a time-saving manner, independently of instrumentation. In this web application we have focused on a well-known protein family, the histones, whose PTMs are believed to play a crucial role in gene regulation, as suggested by the so-called "histone code" hypothesis. The PTMs implemented in this software are methylation of arginines and lysines, acetylation of lysines, and phosphorylation of serines and threonines. The application can calculate the kind, number and combinations of possible PTMs corresponding to a given peptide sequence and a given mass, along with the full set of unique primary structures produced by the possible distributions of these PTMs along the amino acid sequence. It can also calculate the masses and charges of a fragmented histone variant carrying modifications already predefined in the application. Additional functionality is provided by the calculation of the masses of fragments produced upon protein cleavage by the proteolytic enzymes most widely used in proteomics studies. Copyright © 2014 Elsevier Ltd. All rights reserved.
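The PTM-combination calculation described above can be illustrated by brute-force enumeration of modification counts whose summed monoisotopic mass shift matches an observed shift. This is a minimal sketch, not POTAMOS's implementation: the site-capacity rules (lysine either acetylated or up to trimethylated, arginine up to dimethylated, one phosphate per S/T) are simplifying assumptions, and `ptm_combinations` is a hypothetical name.

```python
from itertools import product

# Monoisotopic mass shifts (Da) for the PTM types listed in the abstract.
METHYL = 14.01565
ACETYL = 42.01057
PHOSPHO = 79.96633

def ptm_combinations(sequence, mass_shift, tol=0.01):
    """Enumerate (methyl, acetyl, phospho) counts whose total mass matches mass_shift.

    Assumed capacities: each K is either acetylated or carries up to 3 methyls,
    each R carries up to 2 methyls, each S/T carries at most one phosphate.
    """
    seq = sequence.upper()
    n_k, n_r = seq.count("K"), seq.count("R")
    n_st = seq.count("S") + seq.count("T")
    hits = []
    for acetyl in range(n_k + 1):
        # Lysines that are acetylated can no longer be methylated.
        max_methyl = 3 * (n_k - acetyl) + 2 * n_r
        for methyl, phospho in product(range(max_methyl + 1), range(n_st + 1)):
            total = methyl * METHYL + acetyl * ACETYL + phospho * PHOSPHO
            if abs(total - mass_shift) <= tol:
                hits.append((methyl, acetyl, phospho))
    return hits
```

For example, `ptm_combinations("KR", ACETYL, tol=0.05)` returns both trimethylation and acetylation, because the two differ by only about 0.036 Da; the mass tolerance therefore strongly affects how ambiguous the answer is.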
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bogdanov, Bogdan; Smith, Richard D.
This review offers a broad overview of recent FTICR applications and technological developments in the field of proteomics, directed to readers with different expertise and interests. Both the "bottom-up" (peptide level) and "top-down" (intact protein level) approaches are covered, and various related aspects are discussed and illustrated with examples that are among the best available references in the literature. "Bottom-up" topics include peptide fragmentation, the AMT approach and DREAMS technology, quantitative proteomics, post-translational modifications, and special FTICR software focused on peptide and protein identification. Topics in the "top-down" part include various aspects of high-mass measurements, protein tandem mass spectrometry, protein conformations, protein-protein complexes, as well as some esoteric applications that may become more practical in the coming years. Finally, examples of integrating both approaches and of medical proteomics applications using FTICR are provided, closing with an outlook on what may be coming our way sooner rather than later.
Automated Analysis of Fluorescence Microscopy Images to Identify Protein-Protein Interactions
Venkatraman, S.; Doktycz, M. J.; Qi, H.; ...
2006-01-01
The identification of protein interactions is important for elucidating biological networks. One obstacle in comprehensive interaction studies is the analysis of large datasets, particularly those containing images, so an automated system for analyzing image-based protein interaction datasets is needed. Such a system is described here: it automatically extracts features from fluorescence microscopy images obtained from a bacterial protein interaction assay, and these features provide quantitative values that aid in the automated scoring of positive interactions. Experimental observations indicate that identifying at least 50% positive cells in an image is sufficient to detect a protein interaction. Based on this criterion, the automated system achieves 100% accuracy in detecting positive interactions for a dataset of 16 images. The algorithms were implemented in MATLAB, and the software is available on request from the authors.
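The scoring criterion above (an image counts as a positive interaction when at least 50% of its cells are positive) can be written as a simple decision rule. In this sketch the per-cell intensity threshold is a hypothetical stand-in for the paper's extracted image features, and `interaction_detected` and `accuracy` are illustrative names; the original system was implemented in MATLAB.

```python
def interaction_detected(cell_intensities, cell_threshold, min_positive_fraction=0.5):
    """Call an image positive if at least min_positive_fraction of its cells are positive.

    A cell is assumed positive when its intensity exceeds cell_threshold.
    """
    if not cell_intensities:
        return False
    positive = sum(1 for x in cell_intensities if x > cell_threshold)
    return positive / len(cell_intensities) >= min_positive_fraction

def accuracy(predictions, truth):
    """Fraction of images whose call matches the manual annotation."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)
```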
Software architecture of the III/FBI segment of the FBI's integrated automated identification system
NASA Astrophysics Data System (ADS)
Booker, Brian T.
1997-02-01
This paper will describe the software architecture of the Interstate Identification Index (III/FBI) Segment of the FBI's Integrated Automated Fingerprint Identification System (IAFIS). IAFIS is currently under development, with deployment to begin in 1998. III/FBI will provide the repository of criminal histories and photographs for criminal subjects, as well as identification data for military and civilian federal employees. Services provided by III/FBI include maintenance of the criminal and civil data, subject search of the criminal and civil data, and response generation services for IAFIS. The III/FBI software will comprise both COTS products and an estimated 250,000 lines of newly developed C code. This paper will describe the following: (1) the high-level requirements of the III/FBI software; (2) the decomposition of the III/FBI software into Computer Software Configuration Items (CSCIs); (3) the top-level design of the III/FBI CSCIs; and (4) the relationships among the developed CSCIs and the COTS products that will make up the III/FBI software.
NASA Astrophysics Data System (ADS)
Thanh Tran, The; Phan, Van Chi
2010-03-01
In this work, we present results of membrane proteome profiling of mouse liver tissue using a gel-based approach combined with 2DnanoLC-Q-TOF-MS/MS. After purification of the membrane fraction, SDS-PAGE was carried out as a separation step. After staining, the protein bands were excised from the gels and subjected to reduction, alkylation and in-gel trypsin digestion. The peptide mixtures extracted from each gel slice were fractionated by two-dimensional nano liquid chromatography (2DnanoLC) coupled online with tandem mass spectrometry (NanoESI-Q-TOF-MS/MS). Proteins were identified by MASCOT searches against a mouse protein database using a peptide and fragment mass tolerance of ±0.5 Da. Protein identification used the Mowse scoring algorithm at a 95% confidence level, and the results were further validated with MSQuant v1.5. In total, 318 verified membrane proteins were identified from mouse liver tissue; 66.67% of them (212 proteins) contained at least one transmembrane domain predicted by the SOSUI program, and 43 were found to be unique to microsome membranes. Furthermore, the GRAVY values of the membrane proteins ranged from -1.1276 to 0.9016, and only 31 (9.76%) membrane proteins had positive values. The functions and subcellular locations of the identified proteins were also categorized according to universal GO annotations.
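GRAVY, the hydropathy statistic reported above, is the mean Kyte-Doolittle hydropathy value over all residues of a sequence; positive values indicate overall hydrophobic proteins. The sketch below computes it together with the kind of percentage quoted in the abstract; the helper names are illustrative, and the tool the authors used to compute GRAVY is not stated.

```python
# Kyte-Doolittle hydropathy scale; GRAVY is the average over the sequence.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathy for a sequence of standard one-letter residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)

def percent(part, whole):
    """Percentage rounded to two decimals, as in the abstract."""
    return round(100 * part / whole, 2)
```

For instance, `percent(212, 318)` reproduces the 66.67% of proteins with at least one predicted transmembrane domain.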
Zhang, Chaoyang; Peng, Li; Zhang, Yaqin; Liu, Zhaoyang; Li, Wenling; Chen, Shilian; Li, Guancheng
2017-06-01
Liver cancer is a serious threat to public health and has a complex pathogenesis. Identifying key genes and pathways is therefore important for clarifying the molecular mechanisms of hepatocellular carcinoma (HCC) initiation and progression. An HCC-associated gene expression dataset was downloaded from the Gene Expression Omnibus database. The statistical software R was used for significance analysis of differentially expressed genes (DEGs) between liver cancer samples and normal samples. Gene Ontology (GO) term enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, both based on R, were applied to identify the pathways in which DEGs were significantly enriched. Cytoscape was used to construct a protein-protein interaction (PPI) network and to perform module analysis to find hub genes and key pathways. Finally, weighted correlation network analysis (WGCNA) was conducted to further screen critical gene modules with similar expression patterns and explore their biological significance. Significance analysis identified 1230 DEGs with fold change >2, including 632 significantly down-regulated and 598 significantly up-regulated DEGs. GO term enrichment analysis suggested that up-regulated DEGs were significantly enriched in immune response, cell adhesion, cell migration, the type I interferon signaling pathway, and cell proliferation, whereas down-regulated DEGs were mainly enriched in response to endoplasmic reticulum stress and the endoplasmic reticulum unfolded protein response. KEGG pathway analysis found DEGs significantly enriched in five pathways: complement and coagulation cascades, focal adhesion, ECM-receptor interaction, antigen processing and presentation, and protein processing in endoplasmic reticulum. The top 10 hub genes in HCC, identified from the PPI network, were GMPS, ACACA, ALB, TGFB1, KRAS, ERBB2, BCL2, EGFR, STAT3, and CD8A.
The top 3 gene interaction modules in the PPI network were enriched in immune response, organ development, and response to other organisms, respectively. WGCNA revealed that the eight confirmed gene modules were significantly enriched, respectively, in monooxygenase and oxidoreductase activity; response to endoplasmic reticulum stress; the type I interferon signaling pathway; processing, presentation and binding of peptide antigen; cellular response to cadmium and zinc ions; cell locomotion and differentiation; ribonucleoprotein complex and RNA processing; and immune system process. In conclusion, a series of bioinformatics analyses of DEGs identified several key genes and pathways closely related to HCC initiation and progression. These genes and pathways provide a more detailed picture of the molecular mechanisms underlying HCC occurrence and progression and hold promise as biomarkers and potential therapeutic targets.
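The DEG selection above combines a significance test with a fold-change cutoff (>2). The sketch below shows only the fold-change half, as an assumption-laden illustration: inputs are hypothetical non-log, positive expression values, group means are compared, and no p-value or multiple-testing correction is applied, whereas a real analysis in R would pair this with a statistical test.

```python
import math

def classify_degs(expr_tumor, expr_normal, min_fold=2.0):
    """Split genes into up- and down-regulated lists using a mean fold-change cutoff.

    expr_tumor / expr_normal: dict gene -> list of positive, non-log expression values.
    A gene is 'up' if tumor/normal > min_fold and 'down' if normal/tumor > min_fold.
    """
    up, down = [], []
    cutoff = math.log2(min_fold)
    for gene in expr_tumor:
        t = sum(expr_tumor[gene]) / len(expr_tumor[gene])
        n = sum(expr_normal[gene]) / len(expr_normal[gene])
        log2fc = math.log2(t / n)
        if log2fc > cutoff:
            up.append(gene)
        elif log2fc < -cutoff:
            down.append(gene)
    return sorted(up), sorted(down)
```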
Tabei, Yasuo; Pauwels, Edouard; Stoven, Véronique; Takemoto, Kazuhiro; Yamanishi, Yoshihiro
2012-01-01
Motivation: Drug effects are mainly caused by interactions between drug molecules and their target proteins, including primary targets and off-targets. Identifying the molecular mechanisms behind overall drug–target interactions is crucial in the drug design process. Results: We develop a classifier-based approach to identify chemogenomic features (the underlying associations between drug chemical substructures and protein domains) that are involved in drug–target interaction networks. We propose a novel algorithm for extracting informative chemogenomic features by using L1-regularized classifiers over the tensor product space of possible drug–target pairs. We show that the proposed method can extract a very limited number of chemogenomic features without losing performance in predicting drug–target interactions, and that the extracted features are biologically meaningful. The extracted substructure–domain association network enables us to suggest ligand chemical fragments specific to each protein domain and ligand core substructures important for a wide range of protein families. Availability: Software is available at the supplemental website. Contact: yamanishi@bioreg.kyushu-u.ac.jp Supplementary Information: Datasets and all results are available at http://cbio.ensmp.fr/~yyamanishi/l1binary/ . PMID:22962471
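The approach above learns sparse weights over the tensor product of drug substructure features and protein domain features. The sketch below assumes binary fingerprint and domain vectors and uses L1-regularized logistic regression fitted by proximal gradient descent (ISTA) as a stand-in; the paper's exact classifier and solver may differ, and every function name here is hypothetical. In practice a library solver such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` would be used.

```python
import math

def pair_features(drug_bits, domain_bits):
    """Tensor (outer) product of a drug substructure vector and a protein domain vector."""
    # Feature index i * len(domain_bits) + j is 1 only if substructure i and domain j co-occur.
    return [d * p for d in drug_bits for p in domain_bits]

def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)

def train_l1_logistic(X, y, lam=0.05, lr=0.5, epochs=500):
    """L1-regularized logistic regression fitted by ISTA; the bias is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # sigmoid(z) - label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step on the logistic loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def predict(w, b, x):
    return b + sum(wj * xj for wj, xj in zip(w, x)) > 0
```

Non-zero entries of `w` point to substructure-domain pairs that drive the predicted interactions, which is the kind of sparse association the abstract describes.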
Li, Dong-dong; He, Shao-heng
2004-07-01
This study aimed to analyse the total proteins in the seeds of almond (Prunus dulcis), one of the popular ingestant allergens in China, by two-dimensional electrophoresis. The total seed proteins were extracted by the trichloroacetic acid (TCA) method and then separated by isoelectric focusing in the first dimension and SDS-PAGE in the second dimension. Protein spots were visualized by staining with Coomassie Brilliant Blue R-250. Analysis with ImageMaster 2D software detected 188 different proteins. The isoelectric points (pI) of approximately 28% of the total proteins were between 4.5 and 5.5, and the relative molecular masses (Mr) of approximately 62% of the total proteins were between 20 × 10³ and 25 × 10³. This was the first high-resolution two-dimensional protein map of almond (Prunus dulcis) seed in China. Our findings lay a solid foundation for further identification, characterization, gene cloning and standardization of allergenic proteins in almond (Prunus dulcis) seed.
Blind Pose Prediction, Scoring, and Affinity Ranking of the CSAR 2014 Dataset.
Martiny, Virginie Y; Martz, François; Selwa, Edithe; Iorga, Bogdan I
2016-06-27
The 2014 CSAR Benchmark Exercise focused on three protein targets: coagulation factor Xa, spleen tyrosine kinase, and bacterial tRNA methyltransferase. Our protocol involved a preliminary analysis of the structural information available in the Protein Data Bank for the protein targets, which allowed us to identify the most appropriate docking software and scoring functions for rescoring several datasets of docking conformations, as well as for pose prediction and affinity ranking. The two key points of this study were (i) the prior evaluation of the molecular modeling tools best suited to each target and (ii) increased search efficiency during docking to better explore the conformational space of large, flexible ligands.
NASA Astrophysics Data System (ADS)
Tambunan, U. S. F.; Nasution, M. A. F.
2017-07-01
Ebola remains one of the deadliest diseases in the world: almost 29,000 cases have been reported, of which 11,000 were fatal, and there is still neither a treatment nor a vaccine that can combat the disease effectively. The disease is caused by ebolavirus (EBOV), a primary member of the Filoviridae family. The life cycle of this virus is driven by several key proteins, one of which is VP24, known for its crucial role in EBOV transcription and replication. Targeting VP24 may therefore be a route to treating this pathogenic disease. In this study, Indonesian natural products were virtually screened as EBOV VP24 inhibitors. About 2,020 ligands from many sources, including the HerbalDB database, were obtained and screened with DataWarrior software to assess their molecular and pharmacological properties, leaving 301 ligands. Molecular docking simulations were then performed with MOE 2014.09 to examine each ligand's binding interactions and affinity with EBOV VP24. This study identified cycloartocarpin as the best ligand for inhibiting EBOV VP24. Its stability should therefore be checked by molecular dynamics simulation, and its bioactivity against EBOV VP24 verified in vitro.
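The abstract does not state which property criteria reduced 2,020 ligands to 301. As an assumption only, the sketch below applies Lipinski's rule of five, a common drug-likeness filter, to descriptors of the kind DataWarrior can compute; the `Ligand` record, the one-violation allowance, and the example values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Ligand:
    name: str
    mol_weight: float  # Da
    clogp: float       # calculated octanol/water partition coefficient
    h_donors: int
    h_acceptors: int

def lipinski_violations(lig):
    """Count rule-of-five violations (MW > 500, cLogP > 5, HBD > 5, HBA > 10)."""
    return sum([
        lig.mol_weight > 500,
        lig.clogp > 5,
        lig.h_donors > 5,
        lig.h_acceptors > 10,
    ])

def drug_like(ligands, max_violations=1):
    """Keep ligands with at most max_violations rule-of-five violations."""
    return [lig for lig in ligands if lipinski_violations(lig) <= max_violations]
```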
TUBEs-Mass Spectrometry for Identification and Analysis of the Ubiquitin-Proteome.
Azkargorta, Mikel; Escobes, Iraide; Elortza, Felix; Matthiesen, Rune; Rodríguez, Manuel S
2016-01-01
Mass spectrometry (MS) has become the method of choice for the large-scale analysis of protein ubiquitylation. There exist a number of proposed methods for mapping ubiquitin sites, each with different pros and cons. We present here a protocol for the MS analysis of the ubiquitin-proteome captured by TUBEs and subsequent data analysis. Using dedicated software and algorithms, specific information on the presence of ubiquitylated peptides can be obtained from the MS search results. In addition, a quantitative and functional analysis of the ubiquitylated proteins and their interacting partners helps to unravel the biological and molecular processes they are involved in.
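One common way to extract ubiquitylation evidence from search results, assumed here rather than taken from the protocol, is to look for the diglycine (Gly-Gly) remnant that trypsin leaves on modified lysines, a +114.04293 Da shift. The record layout (`protein`, `peptide_start`, `sequence`, `mods`), the function name, and the example values are hypothetical; real search engines export modifications in their own formats, such as mzIdentML.

```python
GG_REMNANT = 114.04293  # Da; two glycine residues left on lysine after tryptic digestion

def ubiquitylated_sites(psms, tol=0.005):
    """Return sorted (protein, lysine position) pairs carrying a diglycine remnant.

    psms: iterable of dicts with 'protein', 'peptide_start' (1-based position of the
    peptide in the protein), 'sequence', and 'mods' = list of
    (0-based offset within the peptide, delta mass in Da) tuples.
    """
    sites = set()
    for psm in psms:
        for offset, delta in psm["mods"]:
            if psm["sequence"][offset] == "K" and abs(delta - GG_REMNANT) <= tol:
                sites.add((psm["protein"], psm["peptide_start"] + offset))
    return sorted(sites)
```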
Murugaiyan, Jayaseelan; Eravci, Murat; Weise, Christoph; Roesler, Uwe
2016-01-01
Microalgae of the genus Prototheca (P.) spp. are associated with rare algal infections of vertebrates termed protothecosis. Among the seven generally accepted species, P. zopfii genotype 2 (GT2) is associated with a severe form of bovine mastitis, while P. blaschkeae causes the mild and subclinical form of mastitis. Why P. zopfii GT2 is infectious while genotype 1 (GT1) remains non-infectious is not known. Therefore, in the present study we investigated differences in protein expression levels between the genotypes of P. zopfii and P. blaschkeae. Cells were cultured to the mid-exponential phase, harvested, and processed for LC-MS analysis. Peptide data were acquired on an LTQ Orbitrap Velos, raw spectra were quantitatively analyzed with MaxQuant software, and matching against the reference databases of Chlorella variabilis and Auxenochlorella protothecoides resulted in the identification of 226 proteins. Comparison of an environmental strain with infectious strains identified 51 differentially expressed proteins related to carbohydrate metabolism, energy production and protein translation. The expression level of Hsp70 proteins and their role in the infectious process warrant further investigation. All mass spectrometry data are available via ProteomeXchange with identifier PXD005305. PMID:28036087
Jeon, Jouhyun; Arnold, Roland; Singh, Fateh; Teyra, Joan; Braun, Tatjana; Kim, Philip M
2016-04-01
The identification of structured units in a protein sequence is an important first step for most biochemical studies. Importantly for this study, identifying stable structured regions is a crucial first step in generating novel synthetic antibodies. While many approaches to finding domains or predicting structured regions exist, important limitations remain, such as the optimization of domain boundaries and the failure to identify non-domain structured units. Moreover, no integrated tool exists to find and optimize structural domains within protein sequences. Here, we describe a new tool, PAT (http://www.kimlab.org/software/pat), that can efficiently identify both domains (with optimized boundaries) and non-domain putative structured units. PAT automatically analyzes various structural properties, evaluates folding stability, and reports possible structural domains in a given protein sequence. To evaluate PAT's reliability, we applied it to identify antibody target molecules, based on the notion that soluble proteins with well-defined secondary and tertiary structures are appropriate targets for synthetic antibodies. PAT is an efficient and sensitive tool for identifying structured units. A performance analysis shows that PAT can characterize structurally well-defined regions in a given sequence and outperforms other efforts to define reliable domain boundaries. Notably, PAT successfully identifies experimentally confirmed target molecules for antibody generation. PAT also offers pre-calculated results for 20,210 human proteins to accelerate common queries. PAT can therefore help to investigate structured domains on a large scale and improve the success rate of synthetic antibody generation.
Alternatives for jet engine control
NASA Technical Reports Server (NTRS)
Sain, M. K.
1983-01-01
Tensor model order reduction, recursive tensor model identification, input design for tensor model identification, software development for nonlinear feedback control laws based upon tensors, and development of the CATNAP software package for tensor modeling, identification and simulation were studied. The last of these is discussed.
PACSY, a relational database management system for protein structure and chemical shift analysis.
Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo; Lee, Weontae; Markley, John L
2012-10-01
PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu.
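PACSY links its tables through shared key identification numbers, so combining coordinates with chemical shifts is a relational join. The sketch below illustrates that idea with Python's built-in `sqlite3` instead of the MySQL or PostgreSQL servers PACSY uses; the three-table schema, column names, the placeholder PDB ID `DEMO`, and all values are hypothetical and do not reproduce PACSY's six actual table types.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database with tables linked by a shared key_id."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE structure (key_id INTEGER PRIMARY KEY, pdb_id TEXT);
        CREATE TABLE coordinate (key_id INTEGER REFERENCES structure(key_id),
                                 residue TEXT, atom TEXT, x REAL, y REAL, z REAL);
        CREATE TABLE chemical_shift (key_id INTEGER REFERENCES structure(key_id),
                                     residue TEXT, atom TEXT, shift_ppm REAL);
    """)
    con.execute("INSERT INTO structure VALUES (1, 'DEMO')")
    con.executemany("INSERT INTO coordinate VALUES (?, ?, ?, ?, ?, ?)",
                    [(1, "ALA2", "CA", 1.0, 2.0, 3.0), (1, "GLY3", "CA", 4.0, 5.0, 6.0)])
    con.executemany("INSERT INTO chemical_shift VALUES (?, ?, ?, ?)",
                    [(1, "ALA2", "CA", 52.4), (1, "GLY3", "CA", 45.1)])
    return con

def shifts_with_coordinates(con, pdb_id, atom):
    """Join coordinates and chemical shifts for one structure and atom type via key_id."""
    return con.execute("""
        SELECT c.residue, c.x, c.y, c.z, s.shift_ppm
        FROM structure st
        JOIN coordinate c ON c.key_id = st.key_id
        JOIN chemical_shift s ON s.key_id = c.key_id
             AND s.residue = c.residue AND s.atom = c.atom
        WHERE st.pdb_id = ? AND c.atom = ?
        ORDER BY c.residue
    """, (pdb_id, atom)).fetchall()
```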
Yahyavi, Masoumeh; Falsafi-Zadeh, Sajad; Karimi, Zahra; Kalatarian, Giti; Galehdari, Hamid
2014-01-01
Investigating the types of secondary structure (SS) in a protein is important, and the evolution of secondary structures during molecular dynamics simulations is a useful parameter for analyzing protein structures. It is therefore of interest to describe VMD-SS, a software program for identifying secondary structure elements and their trajectories during simulation for known structures available in the Protein Data Bank (PDB). The program calculates (1) percentage SS, (2) SS occurrence at each residue, (3) percentage SS during simulation, and (4) the percentage of residues in all SS types during simulation. The VMD-SS plug-in was written as a Tcl script and uses STRIDE to calculate secondary structure features. It is available free of charge at http://science.scu.ac.ir/HomePage.aspx?TabID=13755.
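Two of the quantities listed above, percentage SS in a frame and SS occurrence per residue across a trajectory, reduce to simple counting once each frame has a secondary-structure string. VMD-SS itself is a Tcl plug-in built on STRIDE; this Python sketch assumes the per-frame one-letter assignments (for example H for helix, E for strand, C for coil) are already available, and the function names are illustrative.

```python
from collections import Counter

def ss_percentages(assignment):
    """Percentage of residues in each SS class for a single frame."""
    counts = Counter(assignment)
    return {ss: 100 * n / len(assignment) for ss, n in counts.items()}

def per_residue_occurrence(frames, ss_code):
    """Percentage of frames in which each residue is assigned ss_code."""
    n_frames = len(frames)
    return [100 * sum(f[i] == ss_code for f in frames) / n_frames
            for i in range(len(frames[0]))]
```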
Casey, Tammy M; Khan, Javed M; Bringans, Scott D; Koudelka, Tomas; Takle, Pari S; Downs, Rachael A; Livk, Andreja; Syme, Robert A; Tan, Kar-Chun; Lipscombe, Richard J
2017-02-03
This study aimed to compare the depth and reproducibility of total proteome and differentially expressed protein coverage in technical duplicates and triplicates using iTRAQ 4-plex, iTRAQ 8-plex, and TMT 6-plex reagents. The analysis was undertaken because comprehensive comparisons of isobaric mass tag reproducibility have not been widely reported in the literature. The highest number of proteins was identified with 4-plex reagents, followed by 8-plex and then 6-plex reagents. Quantitative analyses revealed that more differentially expressed proteins were identified with 4-plex reagents than with 8-plex or 6-plex reagents. Replicate reproducibility was ≥69% for technical duplicates and ≥57% for technical triplicates. The results indicate that running an 8-plex or 6-plex experiment instead of a 4-plex experiment resulted in 26% or 39% fewer protein identifications, respectively. When 4-plex spectra were searched with three software tools (ProteinPilot, Mascot, and Proteome Discoverer), the highest number of protein identifications was obtained with Mascot. The analysis of negative controls demonstrated the importance of running experiments as replicates. Overall, this study demonstrates the advantages of iTRAQ 4-plex reagents over iTRAQ 8-plex and TMT 6-plex reagents, provides estimates of technical duplicate and triplicate reproducibility, and emphasizes the value of running replicate samples.
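The abstract reports replicate reproducibility as a percentage without defining it. One plausible definition, assumed here, is the share of all identified proteins that appear in every replicate; the sketch below computes that and the "percent fewer identifications" comparison. Function names and protein IDs are hypothetical.

```python
def replicate_reproducibility(*replicates):
    """Percentage of all identified proteins (union) that were found in every replicate."""
    sets = [set(r) for r in replicates]
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return 100 * len(shared) / len(union)

def percent_fewer(reference_count, other_count):
    """How many percent fewer identifications other_count has relative to reference_count."""
    return 100 * (reference_count - other_count) / reference_count
```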
Scientific Workflow Management in Proteomics
de Bruin, Jeroen S.; Deelder, André M.; Palmblad, Magnus
2012-01-01
Data processing in proteomics can be a challenging endeavor, requiring extensive knowledge of many different software packages, all with different algorithms, data format requirements, and user interfaces. In this article we describe the integration of a number of existing programs and tools in Taverna Workbench, a scientific workflow manager currently being developed in the bioinformatics community. We demonstrate how a workflow manager provides a single, visually clear and intuitive interface to complex data analysis tasks in proteomics, from raw mass spectrometry data to protein identifications and beyond. PMID:22411703
Rapid development of Proteomic applications with the AIBench framework.
López-Fernández, Hugo; Reboiro-Jato, Miguel; Glez-Peña, Daniel; Méndez Reboredo, José R; Santos, Hugo M; Carreira, Ricardo J; Capelo-Martínez, José L; Fdez-Riverola, Florentino
2011-09-15
In this paper we present two case studies of Proteomics application development using the AIBench framework, a Java desktop application framework focused mainly on scientific software development. The applications presented in this work are Decision Peptide-Driven, for rapid and accurate protein quantification, and Bacterial Identification, for Tuberculosis biomarker search and diagnosis. Both tools work with mass spectrometry data, specifically MALDI-TOF spectra, and minimize the time required to process and analyze the experimental data. Copyright 2011 The Author(s). Published by Journal of Integrative Bioinformatics.
Izuchi, Yukari; Takashima, Tsuneo; Hatano, Naoya
2016-01-01
The demand for leather goods has grown globally in recent years. Industry revenue is forecast to reach $91.2 billion by 2018. There is an ongoing labelling problem in the leather items market, in that it is currently impossible to identify the species that a given piece of leather is derived from. To address this issue, we developed a rapid and simple method for the specific identification of leather derived from cattle, horses, pigs, sheep, goats, and deer by analysing peptides produced by the trypsin-digestion of proteins contained in leather goods using liquid chromatography/mass spectrometry. We determined species-specific amino acid sequences by liquid chromatography/tandem mass spectrometry analysis using the Mascot software program and demonstrated that collagen α-1(I), collagen α-2(I), and collagen α-1(III) from the dermal layer of the skin are particularly useful in species identification. PMID:27313979
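Species-specific peptides like those found above can be shortlisted in silico before LC-MS/MS confirmation: digest each species' collagen sequence with trypsin and keep peptides absent from every other species. This sketch assumes the classic cleavage rule (after K or R, not before P), no missed cleavages, and a minimum peptide length; it is not the authors' workflow, which relied on Mascot searches of measured spectra. The sequences in the test are toy placeholders, not real collagen.

```python
import re

def tryptic_peptides(protein, min_len=6):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    # Zero-width split points (supported by re.split in Python 3.7+).
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in peptides if len(p) >= min_len}

def species_specific_peptides(sequences_by_species, min_len=6):
    """Map each species to tryptic peptides not produced by any other species."""
    digests = {sp: tryptic_peptides(seq, min_len) for sp, seq in sequences_by_species.items()}
    unique = {}
    for sp, peps in digests.items():
        others = set().union(*(other for s, other in digests.items() if s != sp))
        unique[sp] = peps - others
    return unique
```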
Li, Xiao-jun; Yi, Eugene C; Kemp, Christopher J; Zhang, Hui; Aebersold, Ruedi
2005-09-01
There is increasing interest in the quantitative proteomic measurement of the protein contents of substantially similar biological samples, e.g. for the analysis of cellular response to perturbations over time or for the discovery of protein biomarkers from clinical samples. Technical limitations of current proteomic platforms, such as limited reproducibility and low throughput, make this a challenging task. A new LC-MS-based platform is able to generate complex peptide patterns from the analysis of proteolyzed protein samples at high throughput and represents a promising approach for quantitative proteomics. A crucial component of the LC-MS approach is the accurate evaluation of the abundance of detected peptides over many samples and the identification of peptide features that can stratify samples with respect to their genetic, physiological, or environmental origins. We present here a new software suite, SpecArray, that generates a peptide versus sample array from a set of LC-MS data. A peptide array stores the relative abundance of thousands of peptide features in many samples and is in a format identical to that of a gene expression microarray. A peptide array can be subjected to an unsupervised clustering analysis to stratify samples or to a discriminant analysis to identify discriminatory peptide features. We applied SpecArray to analyze two sets of LC-MS data: one from four repeat LC-MS analyses of the same glycopeptide sample, and another from LC-MS analysis of serum samples from five male and five female mice. We demonstrate through these two case studies that the SpecArray software suite can serve as an effective software platform in the LC-MS approach for quantitative proteomics.
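The abstract does not say how SpecArray matches features across runs, so the following is only a minimal sketch of the general idea of a peptide versus sample array. It assumes each sample is a list of detected features (m/z, retention time, intensity) and greedily merges features from different samples that fall within hypothetical m/z and retention-time tolerances; `build_peptide_array`, `mz_tol` and `rt_tol` are illustrative names, not part of SpecArray.

```python
from typing import Dict, List, Tuple

# A detected peptide feature: (m/z, retention time in minutes, intensity).
Feature = Tuple[float, float, float]


def build_peptide_array(
    samples: Dict[str, List[Feature]],
    mz_tol: float = 0.01,  # hypothetical m/z tolerance
    rt_tol: float = 0.5,   # hypothetical retention-time tolerance (min)
) -> Tuple[List[Tuple[float, float]], List[str], List[List[float]]]:
    """Group features across samples into rows of a peptide x sample matrix.

    Each row is keyed by the (m/z, RT) of the feature that founded it. A new
    feature joins the first row within both tolerances; otherwise it starts a
    new row. Samples without a matching feature get 0.0 in that row.
    """
    names = sorted(samples)
    rows: List[Tuple[float, float]] = []
    values: List[Dict[str, float]] = []
    for name in names:
        for mz, rt, intensity in samples[name]:
            for i, (row_mz, row_rt) in enumerate(rows):
                if abs(mz - row_mz) <= mz_tol and abs(rt - row_rt) <= rt_tol:
                    # keep the most intense value if one sample matches twice
                    values[i][name] = max(values[i].get(name, 0.0), intensity)
                    break
            else:
                rows.append((mz, rt))
                values.append({name: intensity})
    matrix = [[v.get(n, 0.0) for n in names] for v in values]
    return rows, names, matrix
```

A matrix in this shape can be handed to the same clustering or discriminant tools used for expression microarrays; a real implementation would also need retention-time alignment before matching.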
PACSY, a relational database management system for protein structure and chemical shift analysis
Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo
2012-01-01
PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu. PMID:22903636
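To illustrate what "relational tables linked by key identification numbers" means in practice, here is a small sketch of a cross-table query. It uses Python's built-in `sqlite3` instead of MySQL or PostgreSQL so that it is self-contained; the three tables, their columns and all identifiers and values are hypothetical placeholders and do not reproduce PACSY's actual six table types.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with three hypothetical, key-linked tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entry (
            key_id INTEGER PRIMARY KEY,  -- shared key linking all tables
            pdb_id TEXT,
            bmrb_id TEXT
        );
        CREATE TABLE chemical_shift (
            key_id INTEGER REFERENCES entry(key_id),
            residue TEXT,
            atom TEXT,
            shift_ppm REAL
        );
        CREATE TABLE surface_area (
            key_id INTEGER REFERENCES entry(key_id),
            residue TEXT,
            sasa REAL
        );
        """
    )
    # Placeholder identifiers and values for illustration only.
    conn.execute("INSERT INTO entry VALUES (1, 'PDB_X', 'BMRB_X')")
    conn.executemany(
        "INSERT INTO chemical_shift VALUES (?, ?, ?, ?)",
        [(1, "A10", "CA", 52.1), (1, "G11", "CA", 45.3)],
    )
    conn.executemany(
        "INSERT INTO surface_area VALUES (?, ?, ?)",
        [(1, "A10", 80.0), (1, "G11", 10.0)],
    )
    return conn


def shifts_of_exposed_residues(
    conn: sqlite3.Connection, min_sasa: float
) -> List[Tuple[str, str, str, float]]:
    """Join shift and surface-area data through the shared key_id."""
    query = """
        SELECT e.pdb_id, c.residue, c.atom, c.shift_ppm
        FROM entry AS e
        JOIN chemical_shift AS c ON c.key_id = e.key_id
        JOIN surface_area AS s ON s.key_id = e.key_id AND s.residue = c.residue
        WHERE s.sasa >= ?
        ORDER BY c.residue, c.atom
    """
    return conn.execute(query, (min_sasa,)).fetchall()
```

The point of the shared key is that one query can combine coordinates-derived properties and NMR data that originally came from different source databases.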
G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures.
Lee, Hui Sun; Im, Wonpil
2017-01-01
Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structure. However, there is still a large gap between the number of available protein structures and that of proteins with function annotated with high accuracy. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small-molecule ligand binding sites and ligand structures using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA . We also present a case study illustrating the potential of our template-based approach harnessing G-LoSA for protein function prediction.
Software For Computer-Security Audits
NASA Technical Reports Server (NTRS)
Arndt, Kate; Lonsford, Emily
1994-01-01
Information relevant to potential breaches of security gathered efficiently. Automated Auditing Tools for VAX/VMS program includes the following automated software tools performing the noted tasks: Privileged ID Identification, which identifies users whose privileges could be used to circumvent existing computer security measures; Critical File Protection, which identifies critical files that are not properly protected; Inactive ID Identification, which finds user identifications no longer in use; Password Lifetime Review, which determines the maximum password lifetime of every identification; and Password Length Review, which determines the minimum allowed password length of every identification. Written in DEC VAX DCL language.
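The original tools were written in DCL and their code is not given here, so the following is only a Python sketch of the kinds of checks the listed tools perform, applied to hypothetical account records. The `Account` fields, the example VMS-style privilege names treated as sensitive, and all thresholds are assumptions for illustration; the Critical File Protection check is omitted because it depends on file-system access controls.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

# Example VMS-style privilege names treated here as able to bypass security.
SENSITIVE_PRIVILEGES = {"BYPASS", "CMKRNL", "SYSPRV"}


@dataclass
class Account:
    username: str
    privileges: List[str]
    last_login: Optional[date]             # None: never logged in
    password_lifetime_days: Optional[int]  # None: password never expires
    min_password_length: int


def audit(
    accounts: List[Account],
    today: date,
    inactive_after_days: int = 90,  # hypothetical policy thresholds
    max_lifetime_days: int = 90,
    min_length: int = 8,
) -> Dict[str, List[str]]:
    """Return usernames flagged by each check."""
    report: Dict[str, List[str]] = {
        "privileged": [], "inactive": [], "password_lifetime": [], "password_length": [],
    }
    for acct in accounts:
        if SENSITIVE_PRIVILEGES & set(acct.privileges):
            report["privileged"].append(acct.username)
        if acct.last_login is None or (today - acct.last_login).days > inactive_after_days:
            report["inactive"].append(acct.username)
        if acct.password_lifetime_days is None or acct.password_lifetime_days > max_lifetime_days:
            report["password_lifetime"].append(acct.username)
        if acct.min_password_length < min_length:
            report["password_length"].append(acct.username)
    return report
```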
Ravindranath, Pradeep Anand; Sanner, Michel F.
2016-01-01
Motivation: The identification of ligand-binding sites from a protein structure facilitates computational drug design and optimization, and protein function assignment. We introduce AutoSite: an efficient software tool for identifying ligand-binding sites and predicting a pseudo-ligand for each identified binding site. Binding sites are reported as clusters of 3D points called fills, in which every point is labelled as hydrophobic or as a hydrogen-bond donor or acceptor. From these fills AutoSite derives feature points: a set of putative positions of hydrophobic and hydrogen-bond-forming ligand atoms. Results: We show that AutoSite identifies ligand-binding sites with higher accuracy than other leading methods, and produces fills that better match the ligand shape and properties than the fills obtained with AutoLigand, a software program with similar capabilities. In addition, we demonstrate that for the Astex Diverse Set, the feature points identify 79% of hydrophobic ligand atoms and 81% and 62% of the ligand hydrogen-bond acceptor atoms and donor hydrogens, respectively, that interact with the receptor, and predict 81.2% of water molecules mediating interactions between ligand and receptor. Finally, we illustrate potential uses of the predicted feature points in the context of lead optimization in drug discovery projects. Availability and Implementation: http://adfr.scripps.edu/AutoDockFR/autosite.html Contact: sanner@scripps.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27354702
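The abstract describes fills as clusters of labelled 3D points and feature points as putative ligand-atom positions derived from them. The sketch below is a simplified illustration of that data flow, not AutoSite's algorithm: it groups labelled points by single-linkage connectivity under a hypothetical `link_dist` and then, as a stand-in for AutoSite's feature-point derivation, reports one centroid per label within a fill.

```python
import math
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
LabeledPoint = Tuple[Point, str]  # label: "hydrophobic", "donor" or "acceptor"


def cluster_fills(points: List[LabeledPoint], link_dist: float = 1.0) -> List[List[int]]:
    """Single-linkage clustering: points within link_dist end up in the same fill."""
    seen = [False] * len(points)
    fills: List[List[int]] = []
    for start in range(len(points)):
        if seen[start]:
            continue
        seen[start] = True
        queue, members = deque([start]), []
        while queue:  # breadth-first walk over neighbouring points
            i = queue.popleft()
            members.append(i)
            for j in range(len(points)):
                if not seen[j] and math.dist(points[i][0], points[j][0]) <= link_dist:
                    seen[j] = True
                    queue.append(j)
        fills.append(sorted(members))
    return fills


def feature_points(points: List[LabeledPoint], fill: List[int]) -> Dict[str, Point]:
    """One centroid per label within a fill (a deliberate simplification)."""
    groups: Dict[str, List[Point]] = defaultdict(list)
    for i in fill:
        coord, label = points[i]
        groups[label].append(coord)
    return {
        label: tuple(sum(c[k] for c in coords) / len(coords) for k in range(3))
        for label, coords in groups.items()
    }
```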
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kangas, Lars J.; Metz, Thomas O.; Isaac, Georgis
2012-05-15
Liquid chromatography-mass spectrometry-based metabolomics has gained importance in the life sciences, yet it is not supported by software tools for high-throughput identification of metabolites based on their fragmentation spectra. An algorithm (ISIS: in silico identification software) and its implementation are presented and show great promise in generating in silico spectra of lipids for the purpose of structural identification. Instead of using chemical reaction rate equations or rules-based fragmentation libraries, the algorithm uses machine learning to find accurate bond cleavage rates in a mass spectrometer employing collision-induced dissociation tandem mass spectrometry. A preliminary test of the algorithm with 45 lipids from a subset of lipid classes shows both high sensitivity and specificity.
Nett, Isabelle R E; Martin, David M A; Miranda-Saavedra, Diego; Lamont, Douglas; Barber, Jonathan D; Mehlert, Angela; Ferguson, Michael A J
2009-07-01
The protozoan parasite Trypanosoma brucei is the causative agent of human African sleeping sickness and related animal diseases, and it has over 170 predicted protein kinases. Protein phosphorylation is a key regulatory mechanism for cellular function that, thus far, has been studied in T. brucei principally through putative kinase mRNA knockdown and observation of the resulting phenotype. However, despite the relatively large kinome of this organism and the demonstrated essentiality of several T. brucei kinases, very few specific phosphorylation sites have been determined in this organism. Using a gel-free, phosphopeptide enrichment-based proteomics approach, we performed the first large-scale phosphorylation site analyses for T. brucei. Serine, threonine, and tyrosine phosphorylation sites were determined for a cytosolic protein fraction of the bloodstream form of the parasite, resulting in the identification of 491 phosphoproteins based on the identification of 852 unique phosphopeptides and 1204 phosphorylation sites. The phosphoproteins detected in this study are predicted from their genome annotations to participate in a wide variety of biological processes, including signal transduction, processing of DNA and RNA, and protein synthesis and degradation, and to a lesser extent in metabolic pathways. The analysis of phosphopeptides and phosphorylation sites was facilitated by in-house developed software, and this automated approach was validated by manual annotation of spectra of the kinase subset of proteins. Analysis of the cytosolic bloodstream form T. brucei kinome revealed the presence of 44 phosphorylated protein kinases in our data set that could be classified into the major eukaryotic protein kinase groups by applying a multilevel hidden Markov model library of the kinase catalytic domain.
Identification of the kinase phosphorylation sites showed conserved phosphorylation sequence motifs in several kinase activation segments, supporting the view that phosphorylation-based signaling is a general and fundamental regulatory process that extends to this highly divergent lower eukaryote.
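The in-house software mentioned above is not described further. As a hedged illustration of a common first step in looking for phosphorylation sequence motifs, the sketch below extracts fixed-width sequence windows centred on 1-based phosphosite positions and tallies the residue at position +1 (for example, to spot proline-directed sites). The function names, the `_` padding convention and the example sequence are hypothetical.

```python
from collections import Counter
from typing import List


def site_windows(protein: str, sites: List[int], flank: int = 3) -> List[str]:
    """Windows of 2*flank+1 residues centred on 1-based sites, '_'-padded at termini."""
    padded = "_" * flank + protein + "_" * flank
    # site s sits at padded index s-1+flank, so the window starts at s-1
    return [padded[s - 1 : s + 2 * flank] for s in sites]


def residue_counts_at(windows: List[str], offset: int, flank: int) -> Counter:
    """Count residues at a given offset from the site (e.g. +1)."""
    return Counter(w[flank + offset] for w in windows)
```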
Usability study of clinical exome analysis software: top lessons learned and recommendations.
Shyr, Casper; Kushniruk, Andre; Wasserman, Wyeth W
2014-10-01
New DNA sequencing technologies have revolutionized the search for genetic disruptions. Targeted sequencing of all protein coding regions of the genome, called exome analysis, is actively used in research-oriented genetics clinics, with the transition to exomes as a standard procedure underway. This transition is challenging; identification of potentially causal mutation(s) amongst ~10^6 variants requires specialized computation in combination with expert assessment. This study analyzes the usability of user interfaces for clinical exome analysis software. There are two study objectives: (1) To ascertain the key features of successful user interfaces for clinical exome analysis software based on the perspective of expert clinical geneticists, (2) To assess user-system interactions in order to reveal strengths and weaknesses of existing software, inform future design, and accelerate the clinical uptake of exome analysis. Surveys, interviews, and cognitive task analysis were performed for the assessment of two next-generation exome sequence analysis software packages. The subjects included ten clinical geneticists who interacted with the software packages using the "think aloud" method. Subjects' interactions with the software were recorded in their clinical office within an urban research and teaching hospital. All major user interface events (from the user interactions with the packages) were time-stamped and annotated with coding categories to identify usability issues in order to characterize desired features and deficiencies in the user experience. We detected 193 usability issues, the majority of which concern interface layout and navigation, and the resolution of reports. Our study highlights gaps in specific software features typical within exome analysis. The clinicians perform best when the flow of the system is structured into well-defined yet customizable layers for incorporation within the clinical workflow.
The results highlight opportunities to dramatically accelerate clinician analysis and interpretation of patient genomic data. We present the first application of usability methods to evaluate software interfaces in the context of exome analysis. Our results highlight how the study of user responses can lead to identification of usability issues and challenges and reveal software reengineering opportunities for improving clinical next-generation sequencing analysis. While the evaluation focused on two distinctive software tools, the results are general and should inform active and future software development for genome analysis software. As large-scale genome analysis becomes increasingly common in healthcare, it is critical that efficient and effective software interfaces are provided to accelerate clinical adoption of the technology. Implications for improved design of such applications are discussed. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Clustering and Network Analysis of Reverse Phase Protein Array Data.
Byron, Adam
2017-01-01
Molecular profiling of proteins and phosphoproteins using a reverse phase protein array (RPPA) platform, with a panel of target-specific antibodies, enables the parallel, quantitative proteomic analysis of many biological samples in a microarray format. Hence, RPPA analysis can generate a high volume of multidimensional data that must be effectively interrogated and interpreted. A range of computational techniques for data mining can be applied to detect and explore data structure and to form functional predictions from large datasets. Here, two approaches for the computational analysis of RPPA data are detailed: the identification of similar patterns of protein expression by hierarchical cluster analysis and the modeling of protein interactions and signaling relationships by network analysis. The protocols use freely available, cross-platform software, are easy to implement, and do not require any programming expertise. Serving as data-driven starting points for further in-depth analysis, validation, and biological experimentation, these and related bioinformatic approaches can accelerate the functional interpretation of RPPA data.
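The protocols rely on existing software that is not named here, so the following is only a self-contained sketch of hierarchical cluster analysis as applied to expression profiles. It assumes each protein has a profile of RPPA signals across the same samples, uses 1 minus the Pearson correlation as the distance, and merges clusters by average linkage; both choices are common but are assumptions, and profiles must not be constant.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def correlation_distance(x: List[float], y: List[float]) -> float:
    """1 - Pearson r; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(
    profiles: Dict[str, List[float]],
) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Agglomerative clustering with average linkage; returns the merge history."""
    names = sorted(profiles)
    pair_dist = {
        frozenset((a, b)): correlation_distance(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }

    def linkage(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        # mean pairwise distance between members of the two clusters
        total = sum(pair_dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        c1, c2 = clusters[i], clusters[j]
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(c1 | c2)
    return merges
```

The merge history is what a dendrogram draws; cutting it at a chosen distance yields groups of proteins with similar expression patterns.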
Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset
2017-01-06
In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available to the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow on four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependent on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended. Protein inference is one of the major challenges in MS-based proteomics today. Currently, a vast number of protein inference algorithms and implementations are available to the proteomics community. Protein assembly affects the final results of a study, the quantitation values and the final claims in the research manuscript.
Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. The Journal of Proteomics has previously published multiple benchmarks of other bioinformatics algorithms in proteomics (PMID: 26585461; PMID: 22728601), making clear the importance of such studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Five different algorithms (ProteinProphet, MSBayesPro, ProteinLP, Fido and PIA) were evaluated using the highly customizable workflow on four public datasets with varying complexities. Three popular database search engines (Mascot, X!Tandem and MS-GF+) and combinations thereof were evaluated for every protein inference tool. In total, >186 protein lists were analyzed and carefully compared using three metrics to assess the quality of each inference method's results: 1) the number of reported proteins, 2) the number of peptides per protein, and 3) the number of uniquely reported proteins per inference method. We also examined how many proteins were reported by choosing each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that: 1) PIA or Fido seem to be a good choice for the analyzed workflow, regarding not only the reported proteins and the high-quality identifications, but also the required runtime. 2) Merging the identifications of multiple search engines almost always gives more confident results and increases the number of peptides per protein group. 3) Using databases that contain not only the canonical proteins but also their known isoforms has a small impact on the number of reported proteins.
Depending on the question behind the study, the detection of specific isoforms could compensate for the slightly shorter parsimonious reports. 4) The current workflow can be easily extended to support new algorithms and search engine combinations. Copyright © 2016. Published by Elsevier B.V.
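To make the grouping problem concrete, here is a minimal sketch of one simple inference strategy, a parsimony (greedy set-cover) baseline. It is not the method of any of the five evaluated tools: ProteinProphet, Fido and MSBayesPro are probabilistic, and PIA and ProteinLP use their own formulations. Proteins with identical peptide sets are first merged into indistinguishable groups, then the group explaining the most still-unexplained peptides is selected repeatedly; `parsimonious_groups` is a hypothetical name.

```python
from typing import Dict, FrozenSet, List, Set


def parsimonious_groups(protein_peptides: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Greedy minimal list of protein groups explaining all peptides.

    Proteins with identical peptide sets cannot be told apart and are merged
    into one group before the greedy set-cover step.
    """
    groups: Dict[FrozenSet[str], Set[str]] = {}
    for protein, peptides in protein_peptides.items():
        groups.setdefault(frozenset(peptides), set()).add(protein)

    uncovered: Set[str] = set().union(*protein_peptides.values())
    selected: List[FrozenSet[str]] = []
    while uncovered:
        # pick the group explaining most unexplained peptides; ties broken by name
        peptides, proteins = min(
            groups.items(),
            key=lambda item: (-len(item[0] & uncovered), sorted(item[1])),
        )
        selected.append(frozenset(proteins))
        uncovered -= peptides
    return selected
```

In the usage below, a protein whose peptides are all shared with a larger protein is not reported (it is subsumed), while two proteins supported by the same single peptide are reported together as one group; the composition of such groups is exactly what the benchmark found to vary between tools.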
LC-MSsim – a simulation software for liquid chromatography mass spectrometry data
Schulz-Trieglaff, Ole; Pfeifer, Nico; Gröpl, Clemens; Kohlbacher, Oliver; Reinert, Knut
2008-01-01
Background Mass Spectrometry coupled to Liquid Chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large scale studies. The data resulting from an LC-MS experiment are huge, highly complex and noisy. Accordingly, they have sparked new developments in Bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exist only for peptide identification algorithms; there are no data that represent a ground truth for the evaluation of feature detection, alignment and filtering algorithms. Results We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra on the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for peak shapes and elution profiles of the mass spectral peaks. Our software also offers the possibility to add contaminants, to change the background noise level and includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files. Conclusion LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminants. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists and meaningful error rates can be computed.
We anticipate that LC-MSsim will be useful to the wider community to perform benchmark studies and comparisons between computational tools. PMID:18842122
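The first simulation steps described above (read proteins from FASTA, digest with an enzyme, compute monoisotopic masses) can be sketched as follows. This is not LC-MSsim's code: it hard-codes trypsin (cleave after K or R unless the next residue is P) with no missed cleavages, whereas LC-MSsim lets the user define the enzyme, and it ignores modifications. The residue masses are standard monoisotopic values.

```python
import re
from typing import Dict, List, Tuple

# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def read_fasta(text: str) -> Dict[str, str]:
    """Map the first word of each header to its concatenated sequence."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def trypsin_digest(sequence: str) -> List[str]:
    """Cut after K/R unless followed by P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def digest_fasta(text: str) -> List[Tuple[str, str, float]]:
    """(protein, peptide, mass) for every tryptic peptide in the file."""
    return [
        (name, pep, monoisotopic_mass(pep))
        for name, seq in read_fasta(text).items()
        for pep in trypsin_digest(seq)
    ]
```

A simulator would then attach a predicted retention time, a charge state and an elution/peak model to each peptide mass to produce the simulated LC-MS map.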
Fricke, Jens; Pohlmann, Kristof; Jonescheit, Nils A; Ellert, Andree; Joksch, Burkhard; Luttmann, Reiner
2013-06-01
The identification of optimal expression conditions for state-of-the-art production of pharmaceutical proteins is a very time-consuming and expensive process. In this report, a method for rapid and reproducible optimization of protein expression in an in-house designed small-scale BIOSTAT® multi-bioreactor plant is described. A newly developed BioPAT® MFCS/win Design of Experiments (DoE) module (Sartorius Stedim Systems, Germany) connects the process control system MFCS/win and the DoE software MODDE® (Umetrics AB, Sweden) and therefore enables the implementation of fully automated optimization procedures. As a proof of concept, a commercial Pichia pastoris strain KM71H was transformed for the expression of potential malaria vaccines. The DoE optimization procedure doubled intact protein secretion productivity compared to initial cultivation results. In a subsequent step, robustness with respect to process parameter variability was demonstrated around the determined optimum. In this way, a significantly improved pharmaceutical production process was established within seven 24-hour cultivation cycles. With regard to the regulatory demands set out in the process analytical technology (PAT) initiative of the United States Food and Drug Administration (FDA), the combination of a highly instrumented, fully automated multi-bioreactor platform with proper cultivation strategies and extended DoE software solutions opens up promising benefits and opportunities for pharmaceutical protein production. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
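The abstract does not state which experimental design MODDE generated, so the following is only a generic sketch of the DoE idea: a two-level full factorial design (every combination of low/high settings, plus optional center points) and a main-effect estimate per factor. The factor names and ranges are hypothetical, and nothing here reflects the MODDE or MFCS/win interfaces.

```python
from itertools import product
from typing import Dict, List, Tuple

Design = List[Dict[str, float]]


def full_factorial(factors: Dict[str, Tuple[float, float]], center_points: int = 0) -> Design:
    """Two-level full factorial design (2^k runs) plus optional center points."""
    names = list(factors)
    runs: Design = []
    for coded in product((-1, 1), repeat=len(names)):
        # map coded level -1/+1 to the factor's low/high setting
        runs.append({n: factors[n][0] if c < 0 else factors[n][1] for n, c in zip(names, coded)})
    for _ in range(center_points):
        runs.append({n: (lo + hi) / 2 for n, (lo, hi) in factors.items()})
    return runs


def main_effects(
    runs: Design, responses: List[float], factors: Dict[str, Tuple[float, float]]
) -> Dict[str, float]:
    """Mean response at a factor's high level minus mean response at its low level.

    Center-point runs match neither level and are therefore ignored here.
    """
    effects: Dict[str, float] = {}
    for n, (lo, hi) in factors.items():
        high = [y for r, y in zip(runs, responses) if r[n] == hi]
        low = [y for r, y in zip(runs, responses) if r[n] == lo]
        effects[n] = sum(high) / len(high) - sum(low) / len(low)
    return effects
```

In an automated setup like the one described, each run dictionary would become one bioreactor's setpoints, and the measured productivity would be fed back as `responses`.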
Chowdhury, Md Rabiul Hossain; Bhuiyan, Md IqbalKaiser; Saha, Ayan; Mosleh, Ivan Mhai; Mondol, Sobuj; Ahmed, C M Sabbir
2014-01-01
Streptococcus sanguinis is a Gram-positive, facultative aerobic bacterium that is a member of the viridans streptococcus group. It is found in dental plaque in the human mouth and contributes to both dental caries and bacterial endocarditis, the latter entailing a mortality rate of 25%. A range of remedial agents, such as penicillin, amoxicillin, trimethoprim-sulfamethoxazole, and erythromycin, have been found to be effective in controlling this organism. The emphasis of this investigation was on finding alternative, efficient therapeutic approaches for the complete eradication of this bacterium. In this computational study, various databases and online software tools were used to identify specific targets in S. sanguinis. In particular, the Kyoto Encyclopedia of Genes and Genomes databases were used to identify proteins nonhomologous to human proteins, as well as the metabolic pathways involving those proteins. Software such as Phyre2, CastP, DoGSiteScorer, the Protein Function Predictor server, and STRING was used to evaluate probable active drug binding sites, known functions, and protein-protein interactions. Among the 218 essential proteins of this pathogenic bacterium, 81 nonhomologous proteins were identified, and 15 proteins unique to several metabolic pathways of S. sanguinis were isolated through metabolic pathway analysis. Furthermore, this research revealed four unique membrane-bound proteins that are involved in distinct metabolic pathways. Active sites and druggable pockets of these selected proteins were investigated with bioinformatic techniques. In addition, this study describes the activity of those proteins and their interactions with other proteins. Our findings help to identify the type of protein to be considered as an efficient drug target.
This study will pave the way for researchers to develop and discover more effective and specific therapeutic agents against S. sanguinis.
Beyond the proteome: Mass Spectrometry Special Interest Group (MS-SIG) at ISMB/ECCB 2013
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ryu, Soyoung; Payne, Samuel H.; Schaab, Christoph
2014-07-02
Mass spectrometry special interest group (MS-SIG) aims to bring together experts from the global research community to discuss highlights and challenges in the field of mass spectrometry (MS)-based proteomics and computational biology. The rapid technological developments in MS-based proteomics have enabled the generation of a large amount of meaningful information on hundreds to thousands of proteins simultaneously from a biological sample; however, the complexity of the MS data requires sophisticated computational algorithms and software for data analysis and interpretation. This year’s MS-SIG meeting theme was ‘Beyond the Proteome’ with major focuses on improving protein identification/quantification and using proteomics data to solve interesting problems in systems biology and clinical research.
Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.
Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc
2016-01-01
Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.
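The core principle of de novo sequencing can be shown with a toy example: within one fragment-ion series, consecutive peaks differ by the mass of one amino acid residue, so the sequence can be read from the mass gaps. The sketch below assumes an idealized, noise-free, complete ladder of singly charged ions of a single ion type and a hypothetical tolerance; real de novo tools (not reproduced here) must handle mixed b/y series, missing peaks, noise and scoring. Leucine and isoleucine have identical masses and are reported as `L/I`.

```python
from typing import List

# Standard monoisotopic residue masses (Da); L and I cannot be distinguished by mass.
RESIDUES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L/I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "H": 137.05891, "F": 147.06841, "R": 156.10111,
    "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks: List[float], tol: float = 0.02) -> List[str]:
    """Translate gaps between consecutive peaks of one ion series into residues.

    Gaps matching no single residue within `tol` are reported as '?'.
    """
    peaks = sorted(peaks)
    sequence = []
    for lo, hi in zip(peaks, peaks[1:]):
        gap = hi - lo
        best = min(RESIDUES, key=lambda aa: abs(RESIDUES[aa] - gap))
        sequence.append(best if abs(RESIDUES[best] - gap) <= tol else "?")
    return sequence
```

Note that with a tight tolerance the gap readily separates K (128.095) from Q (128.059); with a coarse tolerance these would be ambiguous too, which is one of the pitfalls the chapter refers to.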
Performance Evaluation of the Q Exactive HF-X for Shotgun Proteomics.
Kelstrup, Christian D; Bekker-Jensen, Dorte B; Arrey, Tabiwang N; Hogrebe, Alexander; Harder, Alexander; Olsen, Jesper V
2018-01-05
Progress in proteomics is mainly driven by advances in mass spectrometric (MS) technologies. Here we benchmarked the performance of the latest MS instrument in the benchtop Orbitrap series, the Q Exactive HF-X, against its predecessor for proteomics applications. A new peak-picking algorithm, a brighter ion source, and optimized ion transfers enable productive MS/MS acquisition above 40 Hz at 7500 resolution. The hardware and software improvements collectively resulted in improved peptide and protein identifications across all comparable conditions, with an increase of up to 50 percent at short LC-MS gradients, yielding identification rates of more than 1000 unique peptides per minute. Alternatively, the Q Exactive HF-X is capable of achieving the same proteome coverage as its predecessor in approximately half the gradient time or at 10-fold lower sample loads. The Q Exactive HF-X also enables rapid phosphoproteomics with routine analysis of more than 5000 phosphopeptides with short single-shot 15 min LC-MS/MS measurements, or 16 700 phosphopeptides quantified across ten conditions in six gradient hours using TMT10-plex and offline peptide fractionation. Finally, exciting perspectives for data-independent acquisition are highlighted with reproducible identification of 55 000 unique peptides covering 5900 proteins in half an hour of MS analysis.
PGCA: An algorithm to link protein groups created from MS/MS data
Sasaki, Mayu; Hollander, Zsuzsanna; Smith, Derek; McManus, Bruce; McMaster, W. Robert; Ng, Raymond T.; Cohen Freue, Gabriela V.
2017-01-01
The quantitation of proteins using shotgun proteomics has gained popularity in the last decades, simplifying sample handling procedures, removing extensive protein separation steps and achieving a relatively high throughput readout. The process starts with the digestion of the protein mixture into peptides, which are then separated by liquid chromatography and sequenced by tandem mass spectrometry (MS/MS). At the end of the workflow, recovering the identity of the proteins originally present in the sample is often a difficult and ambiguous process, because more than one protein identifier may match a set of peptides identified from the MS/MS spectra. To address this identification problem, many MS/MS data processing software tools combine all plausible protein identifiers matching a common set of peptides into a protein group. However, this solution introduces new challenges in studies with multiple experimental runs, which can be characterized by three main factors: i) protein groups’ identifiers are local, i.e., they vary from run to run, ii) the composition of each group may change across runs, and iii) the supporting evidence of proteins within each group may also change across runs. Since in general there is no conclusive evidence about the absence of proteins in the groups, protein groups need to be linked across different runs in subsequent statistical analyses. We propose an algorithm, called Protein Group Code Algorithm (PGCA), to link groups from multiple experimental runs by forming global protein groups from connected local groups. The algorithm is computationally inexpensive and enables the connection and analysis of lists of protein groups across runs needed in biomarker studies. We illustrate the identification problem and the stability of the PGCA mapping using 65 iTRAQ experimental runs.
Further, we use two biomarker studies to show how PGCA enables the discovery of relevant candidate protein group markers with similar but non-identical compositions in different runs. PMID:28562641
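The linking step described above, forming global groups from connected local groups, can be sketched with a union-find (disjoint-set) structure. This sketch assumes that two local groups are "connected" when they share at least one protein identifier; it does not reproduce PGCA's own group codes or implementation, and `link_protein_groups` is a hypothetical name.

```python
from typing import Dict, List, Set


def link_protein_groups(runs: Dict[str, List[Set[str]]]) -> List[Set[str]]:
    """Merge local protein groups from all runs into global groups.

    Local groups sharing any protein identifier end up in the same global
    group (connected components), computed with union-find.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for groups in runs.values():
        for group in groups:
            members = sorted(group)
            for p in members:
                parent.setdefault(p, p)
            for p in members[1:]:
                union(members[0], p)  # all members of a local group are connected

    components: Dict[str, Set[str]] = {}
    for p in parent:
        components.setdefault(find(p), set()).add(p)
    return sorted(components.values(), key=sorted)
```

Near-linear time in the total number of identifiers is consistent with the abstract's description of the linking as computationally inexpensive, although PGCA's actual complexity is not stated.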
Chowdhury, Md Rabiul Hossain; Bhuiyan, Md IqbalKaiser; Saha, Ayan; Mosleh, Ivan MHAI; Mondol, Sobuj; Ahmed, C M Sabbir
2014-01-01
Purpose Streptococcus sanguinis is a Gram-positive, facultative aerobic bacterium that is a member of the viridans streptococcus group. It is found in dental plaque in the human mouth and contributes to both dental caries and bacterial endocarditis, the latter entailing a mortality rate of 25%. A range of remedial agents, such as penicillin, amoxicillin, trimethoprim–sulfamethoxazole, and erythromycin, have been found to be effective in controlling this organism. The emphasis of this investigation was on finding alternative, efficient therapeutic approaches for the complete eradication of this bacterium. Materials and methods In this computational study, various databases and online software tools were used to identify specific targets in S. sanguinis. In particular, the Kyoto Encyclopedia of Genes and Genomes databases were used to identify proteins nonhomologous to human proteins, as well as the metabolic pathways involving those proteins. Software such as Phyre2, CastP, DoGSiteScorer, the Protein Function Predictor server, and STRING was used to evaluate probable active drug binding sites, known functions, and protein–protein interactions. Results Among the 218 essential proteins of this pathogenic bacterium, 81 nonhomologous proteins were identified, and 15 proteins unique to several metabolic pathways of S. sanguinis were isolated through metabolic pathway analysis. Furthermore, this research revealed four unique membrane-bound proteins that are involved in distinct metabolic pathways. Active sites and druggable pockets of these selected proteins were investigated with bioinformatic techniques. In addition, this study describes the activity of those proteins and their interactions with other proteins. Conclusion Our findings help to identify the type of protein to be considered as an efficient drug target.
This study will pave the way for researchers to develop and discover more effective and specific therapeutic agents against S. sanguinis. PMID:25473301
RECOVIR Software for Identifying Viruses
NASA Technical Reports Server (NTRS)
Chakravarty, Sugoto; Fox, George E.; Zhu, Dianhui
2013-01-01
Most single-stranded RNA (ssRNA) viruses mutate rapidly to generate a large number of strains with highly divergent capsid sequences. Determining the capsid residues or nucleotides that uniquely characterize these strains is critical in understanding the strain diversity of these viruses. RECOVIR (an acronym for "recognize viruses") software predicts the strains of some ssRNA viruses from their limited sequence data. Novel phylogenetic-tree-based databases of protein or nucleic acid residues that uniquely characterize these virus strains are created. Strains of input virus sequences (partial or complete) are predicted through residue-wise comparisons with the databases. RECOVIR uses unique characterizing residues to automatically identify strains of partial or complete capsid sequences of picorna- and caliciviruses, two of the most highly diverse ssRNA virus families. Partition-wise comparisons of the database residues with the corresponding residues of more than 300 complete and partial sequences of these viruses resulted in correct strain identification for all of these sequences. This study shows the feasibility of creating databases of hitherto unknown residues uniquely characterizing the capsid sequences of two of the most highly divergent ssRNA virus families. These databases enable automated strain identification from partial or complete capsid sequences of these human and animal pathogens.
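The residue-wise comparison described above can be illustrated with a minimal Python sketch. It assumes a hypothetical signature table mapping each strain to its characterizing residues at fixed alignment positions, and a query already aligned to the same coordinates, with '-' marking positions that a partial sequence does not cover. RECOVIR's actual database format and scoring are not described in the abstract, so the function name `predict_strain`, the `min_covered` parameter, and the match-fraction score are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical signature format: strain -> {alignment position: characterizing residue}
Signatures = Dict[str, Dict[int, str]]


def predict_strain(query: str, signatures: Signatures,
                   min_covered: int = 1) -> Tuple[Optional[str], float]:
    """Return (best_strain, fraction_of_covered_signature_residues_matched).

    Only signature positions covered by the (possibly partial) aligned query
    are compared; '-' in the query means "not covered". Returns (None, 0.0)
    if no strain has at least one matching residue.
    """
    best, best_frac = None, 0.0
    for strain, sig in signatures.items():
        # Keep only signature positions that the query actually covers.
        covered = [(p, r) for p, r in sig.items()
                   if p < len(query) and query[p] != "-"]
        if len(covered) < min_covered:
            continue
        frac = sum(query[p] == r for p, r in covered) / len(covered)
        if frac > best_frac:
            best, best_frac = strain, frac
    return best, best_frac
```

Scoring only the covered positions is what allows a partial sequence to be classified at all; a practical tool would also require a minimum coverage, approximated here by `min_covered`.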
Diehl, Hanna C; Beine, Birte; Elm, Julian; Trede, Dennis; Ahrens, Maike; Eisenacher, Martin; Marcus, Katrin; Meyer, Helmut E; Henkel, Corinna
2015-03-01
Mass spectrometry imaging (MSI) has become a powerful and successful tool for biomarker detection, especially in recent years. This emerging technique combines the histological information of a tissue with its corresponding spatially resolved mass spectrometric information. The identification of differentially expressed protein peaks between samples is still the method's bottleneck. Peptide MSI is closer than protein MSI to the final goal of identification, since peptides are easier to measure than proteins. Nevertheless, the processing of peptide imaging samples is challenging due to experimental complexity. To address this issue, a method development study for peptide MSI using cryoconserved and formalin-fixed paraffin-embedded (FFPE) rat brain tissue is provided. Different digestion times, matrices, and proteases were tested to define an optimal workflow for peptide MSI. All experiments were performed in triplicate and analyzed with the SCiLS Lab software, using structures derived from myelin basic protein (MBP) peaks, principal component analysis (PCA), and probabilistic latent semantic analysis (pLSA) to rate the experiments' quality. A blinded evaluation, in which countable structures were defined in the datasets, was performed by three individuals. Such an extensive method development for peptide matrix-assisted laser desorption/ionization (MALDI) imaging experiments has not been performed before, and the resulting problems and consequences were analyzed and discussed.
Thiele, Herbert; Glandorf, Jörg; Hufnagel, Peter
2010-05-27
With the large variety of Proteomics workflows, as well as the large variety of instruments and data-analysis software available, researchers today face major challenges in validating and comparing their Proteomics data. Here we present a new generation of the ProteinScape bioinformatics platform, which enables researchers to manage Proteomics data from generation and data warehousing through to a central data repository, with a strong focus on the improved accuracy, reproducibility and comparability demanded by many researchers in the field. It addresses scientists' current needs in proteomics identification, quantification and validation. But producing large protein lists is not the end point in Proteomics, where one ultimately aims to answer specific questions about the biological condition or disease model of the analyzed sample. In this context, the Proteomics Facility of the Spanish Centro Nacional de Biotecnologia has developed a new tool, PIKE (Protein information and Knowledge Extractor), that allows researchers to control, filter and access specific information from genomic and proteomic databases in order to understand the role and relationships of the proteins identified in their experiments. Additionally, an EU-funded project, ProDac, has coordinated systematic data collection in public standards-compliant repositories such as PRIDE. This will cover all aspects, from generating MS data in the laboratory to assembling the complete annotation information and storing it together with identifications in a standardised format.
RAPSearch: a fast protein similarity search tool for short reads
2011-01-01
Background Next Generation Sequencing (NGS) is producing enormous volumes of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search, a key step in annotating protein-coding genes in these short reads and identifying their biological functions, faces daunting challenges because of the sheer size of short-read datasets. Results We developed a fast protein similarity search tool, RAPSearch, that utilizes a reduced amino acid alphabet and a suffix array to detect seeds of flexible length. For the short reads we tested (translated in 6 frames), RAPSearch achieved a ~20-90 times speedup compared with BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is slightly faster than RAPSearch, had a significant loss of sensitivity compared with RAPSearch and BLAST. Conclusions RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated. PMID:21575167
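To make the seeding idea concrete, here is a small Python sketch of flexible-length seed detection: both sequences are mapped to a reduced alphabet and, for each query position, the longest exact match in a suffix array of the database is located. The 10-group alphabet below is an assumption (a Murphy-style grouping), not necessarily the one RAPSearch uses, and the naive suffix-array construction and the `min_len` cutoff are illustrative choices rather than RAPSearch's implementation.

```python
from typing import List, Tuple

# Assumed 10-group reduced alphabet; RAPSearch's actual grouping may differ.
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]
REDUCE = {aa: chr(ord("a") + i) for i, group in enumerate(GROUPS) for aa in group}


def reduce_seq(seq: str) -> str:
    """Map each amino acid to its group letter; unknown residues become 'x'."""
    return "".join(REDUCE.get(aa, "x") for aa in seq.upper())


def build_suffix_array(text: str) -> List[int]:
    """Naive suffix array (sort suffix start positions); fine for small examples."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def longest_match(query: str, text: str, sa: List[int]) -> Tuple[int, int]:
    """Return (length, text_position) of the longest prefix of query found in text."""
    lo, hi = 0, len(sa)
    while lo < hi:  # binary search for the query's insertion point among suffixes
        mid = (lo + hi) // 2
        if text[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    best_len, best_pos = 0, -1
    # The suffix sharing the longest prefix with the query is adjacent to that point.
    for j in (lo - 1, lo):
        if 0 <= j < len(sa):
            length = lcp(query, text[sa[j]:])
            if length > best_len:
                best_len, best_pos = length, sa[j]
    return best_len, best_pos


def find_seeds(query: str, db: str, min_len: int = 4) -> List[Tuple[int, int, int]]:
    """Return (query_pos, db_pos, length) seeds of at least min_len reduced letters."""
    q, t = reduce_seq(query), reduce_seq(db)
    sa = build_suffix_array(t)
    seeds = []
    for i in range(len(q)):
        length, pos = longest_match(q[i:], t, sa)
        if length >= min_len:
            seeds.append((i, pos, length))
    return seeds
```

Because matching happens in the reduced alphabet, substitutions within a group (for example I for V, or R for K) do not break a seed, which is how such seeds can pick up similarity that identical-residue seeds would miss.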
Wang, Cong; Tu, Maolin; Wu, Di; Chen, Hui; Chen, Cheng; Wang, Zhenyu; Jiang, Lianzhou
2018-04-11
In the present study, a novel angiotensin I-converting enzyme inhibitory (ACE-inhibitory) peptide, EPNGLLLPQY, derived from walnut seed storage protein (fragment residues 80-89), was identified in walnut protein hydrolysate by ultra-high performance liquid chromatography electrospray ionization quadrupole time-of-flight tandem mass spectrometry (UPLC-ESI-Q-TOF-MS/MS). The IC50 value of the peptide was 233.178 μM, as determined by a high performance liquid chromatography method that measures the amount of hippuric acid (HA) generated from the ACE substrate hippuryl-l-histidyl-l-leucine (HHL) to assess ACE activity. The enzyme inhibition kinetics of the peptide against ACE were also studied, confirming the inhibitory mechanism of the ACE-inhibitory peptide. Moreover, molecular docking simulations were performed with Discovery Studio 2017 R2 software to explore the potential mechanisms underlying the ACE-inhibitory activity of EPNGLLLPQY.
Witzel, Katja; Surabhi, Giridara-Kumar; Jyothsnakumari, Gottimukkala; Sudhakar, Chinta; Matros, Andrea; Mock, Hans-Peter
2007-04-01
This paper describes the application of the recently introduced fluorescence stain Ruthenium(II)-tris-(bathophenanthroline-disulphonate) (RuBP) to a comparative proteome analysis of two phenotypically different barley lines. We analysed protein patterns from 2-D gels of DOM and REC, the parental lines of the Oregon Wolfe Barley mapping population, stained with either conventional colloidal Coomassie Brilliant Blue (cCBB) or the novel RuBP solution. We wished to experimentally verify the usefulness of such a stain in evaluating the complex pattern of a seed proteome, in comparison with the previously used cCBB staining technique. To validate the efficiency of visualization by both stains, we first compared the overall number of detected protein spots. On average, 790 spots were visible by cCBB staining and 1200 spots by RuBP staining. Then, the intensity of a set of spots was assessed, and changes in relative abundance were determined using image analysis software. As expected, staining with RuBP performed better in quantitation in terms of sensitivity and dynamic range. Furthermore, spots from a cultivar-specific region in the protein map were chosen for identification to assess the gain in biological information due to the staining procedure. From this particular region, eight spots were visualized exclusively by RuBP, and identification was successful for all of them, demonstrating the ability to identify even very low-abundance proteins. Performance in MS analysis was comparable for both protein stains. Proteins were identified by MALDI-TOF MS peptide mass fingerprinting. This approach was not successful for all spots, due to the restricted number of barley entries in the database. Therefore, we subsequently used LC-ESI-Q-TOF MS/MS and de novo sequencing for identification. Because only a limited number of barley proteins are annotated, an EST-based identification strategy was chosen for our experiment.
We wished to test whether under these limitations the application of a more sensitive stain would lead to a more advanced proteome approach. In summary, we demonstrate here that the application of RuBP as an economical but reliable and sensitive fluorescence stain is highly suitable for quantitative proteome analysis of plant seeds.
Koelmel, Jeremy P; Kroeger, Nicholas M; Ulmer, Candice Z; Bowden, John A; Patterson, Rainey E; Cochran, Jason A; Beecher, Christopher W W; Garrett, Timothy J; Yost, Richard A
2017-07-10
Lipids are ubiquitous and serve numerous biological functions; thus lipids have been shown to have great potential as candidates for elucidating biomarkers and pathway perturbations associated with disease. Methods expanding coverage of the lipidome increase the likelihood of biomarker discovery and could lead to a more comprehensive understanding of disease etiology. We introduce LipidMatch, an R-based tool for lipid identification in liquid chromatography tandem mass spectrometry workflows. LipidMatch currently has over 250,000 lipid species spanning 56 lipid types contained in in silico fragmentation libraries. Fragmentation libraries unique to LipidMatch, compared with other open-source software, include oxidized lipids, bile acids, sphingosines, and previously uncharacterized adducts, including ammoniated cardiolipins. LipidMatch uses rule-based identification. For each lipid type, the user can select which fragments must be observed for identification. Rule-based identification allows for correct annotation of lipids based on the fragments observed, unlike typical identification based solely on spectral similarity scores, where over-reporting structural details that are not conferred by fragmentation data is common. Another unique feature of LipidMatch is ranking lipid identifications for a given feature by the sum of fragment intensities. For each lipid candidate, the intensities of experimental fragments with exact mass matches to expected in silico fragments are summed. The lipid identifications with the greatest summed intensity using this ranking algorithm were comparable to the annotations of other lipid identification software, MS-DIAL and Greazy. For example, for features with identifications from all three software tools, 92% of LipidMatch identifications by fatty acyl constituents were corroborated by at least one other tool in positive ion mode and 98% in negative ion mode.
LipidMatch allows users to annotate lipids across a wide range of high resolution tandem mass spectrometry experiments, including imaging experiments, direct infusion experiments, and experiments employing liquid chromatography. LipidMatch leverages the most extensive in silico fragmentation libraries of freely available software. When integrated into a larger lipidomics workflow, LipidMatch may increase the probability of finding lipid-based biomarkers and determining etiology of disease by covering a greater portion of the lipidome and using annotation which does not over-report biologically relevant structural details of identified lipid molecules.
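The ranking step described in the LipidMatch abstract (rule-based filtering, then summing the intensities of matched fragments) can be sketched in Python as follows. LipidMatch itself is implemented in R; the ppm tolerance, the `Candidate` structure, and the function names here are hypothetical, since the abstract speaks of exact mass matches without specifying a tolerance or data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs


@dataclass
class Candidate:
    name: str
    fragments: List[float]                               # expected in silico fragment m/z values
    required: List[float] = field(default_factory=list)  # fragments that must be observed (rule)


def mz_match(observed: float, expected: float, ppm_tol: float) -> bool:
    """True if observed m/z lies within ppm_tol parts per million of expected m/z."""
    return abs(observed - expected) <= expected * ppm_tol * 1e-6


def summed_intensity(cand: Candidate, spectrum: Spectrum, ppm_tol: float) -> Optional[float]:
    """Sum intensities of peaks matching any expected fragment (each peak counted once),
    or return None if a required fragment is missing."""
    for req in cand.required:
        if not any(mz_match(mz, req, ppm_tol) for mz, _ in spectrum):
            return None
    return sum(inten for mz, inten in spectrum
               if any(mz_match(mz, frag, ppm_tol) for frag in cand.fragments))


def rank_candidates(cands: List[Candidate], spectrum: Spectrum,
                    ppm_tol: float = 10.0) -> List[Tuple[str, float]]:
    """Drop candidates failing their rules, then rank the rest by summed intensity."""
    scored = []
    for cand in cands:
        score = summed_intensity(cand, spectrum, ppm_tol)
        if score is not None:
            scored.append((cand.name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Candidates that fail a required-fragment rule are dropped before ranking rather than scored as zero, mirroring the idea that a lipid type is annotated only when its diagnostic fragments are observed.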
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ecale Zhou, C L; Zemla, A T; Roe, D
2005-01-29
Specific and sensitive ligand-based protein detection assays that employ antibodies or small molecules such as peptides or aptamers require that the corresponding surface region of the protein be accessible and that there be minimal cross-reactivity with non-target proteins. To reduce the time and cost of laboratory screening efforts for diagnostic reagents, we developed new methods for evaluating and selecting protein surface regions for ligand targeting. We devised combined structure- and sequence-based methods for identifying 3D epitopes and binding pockets on the surface of the A chain of ricin that are conserved with respect to a set of ricin A chains and unique with respect to other proteins. We (1) used structure alignment software to detect structural deviations and extracted from this analysis the residue-residue correspondence, (2) devised a method to compare corresponding residues across sets of ricin structures and structures of closely related proteins, (3) devised a sequence-based approach to determine residue infrequency in local sequence context, and (4) modified a pocket-finding algorithm to identify surface crevices in close proximity to residues determined to be conserved/unique based on our structure- and sequence-based methods. In applying this combined informatics approach to ricin A, we identified a conserved/unique pocket in close proximity to (but not overlapping) the active site that is suitable for bidentate ligand development. These methods are generally applicable to the identification of surface epitopes and binding pockets for the development of diagnostic reagents, therapeutics, and vaccines.
FragIdent--automatic identification and characterisation of cDNA-fragments.
Seelow, Dominik; Goehler, Heike; Hoffmann, Katrin
2009-03-02
Many genetic studies and functional assays are based on cDNA fragments. After the generation of cDNA fragments from an mRNA sample, their content is at first unknown and must be assigned by sequencing reactions or hybridisation experiments. Even in characterised libraries, a considerable number of clones are wrongly annotated. Furthermore, mix-ups can happen in the laboratory. It is therefore essential to the relevance of experimental results to confirm or determine the identity of the employed cDNA fragments. However, the manual approach for the characterisation of these fragments using BLAST web interfaces is not suited to larger numbers of sequences, and so far no user-friendly software is publicly available. Here we present the development of FragIdent, an application for the automatic identification of open reading frames (ORFs) within cDNA fragments. The software performs BLAST analyses to identify the genes represented by the sequences and suggests primers to complete the sequencing of the whole insert. Gene-specific information as well as the protein domains encoded by the cDNA fragment are retrieved from Internet-based databases and included in the output. The application features an intuitive graphical interface and is designed for researchers without any bioinformatics skills. It is suited for projects comprising up to several hundred different clones. We used FragIdent to identify 84 cDNA clones from a yeast two-hybrid experiment. Furthermore, we identified 131 protein domains within our analysed clones. The source code is freely available from our homepage at http://compbio.charite.de/genetik/FragIdent/.
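FragIdent relies on BLAST searches to identify the genes. As a much simpler, self-contained illustration of ORF identification in a cDNA fragment, the Python sketch below scans all six reading frames for ATG-to-stop ORFs. This is a generic ORF finder, not FragIdent's algorithm, and `find_orfs` and its `min_codons` cutoff are hypothetical.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[str, int, int, int]]:
    """Return ORFs as (strand, frame, start, end), longest first.

    Coordinates refer to the scanned strand (the reverse complement for '-');
    end is exclusive and includes the stop codon. An ORF is reported only if it
    has at least min_codons codons, counting the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ATG after this stop
    return sorted(orfs, key=lambda orf: orf[3] - orf[2], reverse=True)
```

ORFs that run off the end of the fragment without a stop codon are not reported here, although such truncated ORFs are common in partial cDNA clones and a practical tool would need to handle them.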
Żak, Mariusz; Zaborowski, Piotr; Baczewska-Rej, Milena; Zasada, Aleksandra A; Matuszewska, Renata; Krogulska, Bożena
2011-12-20
For the last five years, Legionella sp. infections and Legionnaires' disease in Poland have received a lot of attention because of new regulations concerning the microbiological quality of drinking water. This was the inspiration to search for and develop a new assay to identify many virulence genes of Legionella pneumophila, to better understand their distribution in environmental and clinical strains. The method might be an invaluable help in infection risk assessment and in epidemiological investigations. The microarray is based on Array Tube technology. It contains three positive controls and one negative control. Target genes encode structural elements of T4SS, effector proteins and factors not related to T4SS. Probes were designed using OligoWiz software, and data were analyzed using IconoClust software. To isolate environmental and clinical strains, BAL samples and samples of hot water from different and independent hot water distribution systems of public utility buildings were collected. We have developed a miniaturized DNA microarray for identification of 66 virulence genes of L. pneumophila. The assay is specific to L. pneumophila serogroup 1, with sensitivity sufficient to perform the assay using DNA isolated from a single L. pneumophila colony. Seven environmental strains were analyzed. Two exhibited a hybridization pattern distinct from the reference strain. The method is time- and cost-effective. Initial studies have shown that genes encoding effector proteins may vary among environmental strains. Further studies might help to identify a set of genes increasing the risk of clinical disease and to determine the pathogenic potential of environmental strains.
Pfleger, Christopher; Rathi, Prakash Chandra; Klein, Doris L; Radestock, Sebastian; Gohlke, Holger
2013-04-22
For deriving maximal advantage from information on biomacromolecular flexibility and rigidity, results from rigidity analyses must be linked to biologically relevant characteristics of a structure. Here, we describe the Python-based software package Constraint Network Analysis (CNA) developed for this task. CNA functions as a front- and backend to the graph-based rigidity analysis software FIRST. CNA goes beyond the mere identification of flexible and rigid regions in a biomacromolecule in that it (I) provides a refined modeling of thermal unfolding simulations that also considers the temperature-dependence of hydrophobic tethers, (II) allows performing rigidity analyses on ensembles of network topologies, either generated from structural ensembles or by using the concept of fuzzy noncovalent constraints, and (III) computes a set of global and local indices for quantifying biomacromolecular stability. This leads to more robust results from rigidity analyses and extends the application domain of rigidity analyses in that phase transition points ("melting points") and unfolding nuclei ("structural weak spots") are determined automatically. Furthermore, CNA robustly handles small-molecule ligands in general. Such advancements are important for applying rigidity analysis to data-driven protein engineering and for estimating the influence of ligand molecules on biomacromolecular stability. CNA maintains the efficiency of FIRST such that the analysis of a single protein structure takes a few seconds for systems of several hundred residues on a single core. These features make CNA an interesting tool for linking biomacromolecular structure, flexibility, (thermo-)stability, and function. CNA is available from http://cpclab.uni-duesseldorf.de/software for nonprofit organizations.
Automated protein NMR structure determination using wavelet de-noised NOESY spectra.
Dancea, Felician; Günther, Ulrich
2005-11-01
A major time-consuming step of protein NMR structure determination is the generation of reliable NOESY cross peak lists, which usually requires a significant amount of manual interaction. Here we present a new algorithm for automated peak picking involving wavelet de-noised NOESY spectra in a process where the identification of peaks is coupled to automated structure determination. The core of this method is the generation of incremental peak lists by applying different wavelet de-noising procedures, which yield peak lists of differing noise content. In combination with additional filters which probe the consistency of the peak lists, good convergence of the NOESY-based automated structure determination could be achieved. These algorithms were implemented in the context of the ARIA software for automated NOE assignment and structure determination and were validated for a polysulfide-sulfur transferase protein of known structure. The procedures presented here should be commonly applicable for efficient protein NMR structure determination and automated NMR peak picking.
Model Transformation for a System of Systems Dependability Safety Case
NASA Technical Reports Server (NTRS)
Murphy, Judy; Driskell, Stephen B.
2010-01-01
Software plays an increasingly large role in all aspects of NASA's science missions. This has been extended to the identification, management and control of faults that affect safety-critical functions and, by extension, the overall success of the mission. Traditionally, the analysis of fault identification, management and control has been hardware based. Due to the increasing complexity of systems, there has been a corresponding increase in the complexity of fault management software. The NASA Independent Validation & Verification (IV&V) program is creating processes and procedures to identify and incorporate safety-critical software requirements along with corresponding software faults so that potential hazards may be mitigated. This paper, "Specific to Generic ... A Case for Reuse," describes the phases of a dependability and safety study that identifies a new process to create a foundation for reusable assets. These assets support the identification and management of specific software faults and their transformation from specific to generic software faults. This approach also has applications to other systems outside the NASA environment. This paper addresses how a mission-specific dependability and safety case is being transformed into a generic dependability and safety case that can be reused for any type of space mission, with an emphasis on software fault conditions.
Software-implemented fault insertion: An FTMP example
NASA Technical Reports Server (NTRS)
Czeck, Edward W.; Siewiorek, Daniel P.; Segall, Zary Z.
1987-01-01
This report presents a model for fault insertion through software; describes its implementation on a fault-tolerant computer, FTMP; presents a summary of fault detection, identification, and reconfiguration data collected with software-implemented fault insertion; and compares the results to hardware fault insertion data. Experimental results show detection time to be a function of time of insertion and system workload. For the fault detection time, there is no correlation between software-inserted faults and hardware-inserted faults; this is because hardware-inserted faults must manifest as errors before detection, whereas software-inserted faults immediately exercise the error detection mechanisms. In summary, software-implemented fault insertion can be used as a technique for evaluating the fault-handling capabilities of a system in fault detection, identification and recovery. Although the software-inserted faults do not map directly to hardware-inserted faults, experiments show that software-implemented fault insertion is capable of emulating hardware fault insertion, with greater ease and automation.
2013-01-01
Background Subunit vaccines based on recombinant proteins have been effective in preventing infectious diseases and are expected to meet the demands of future vaccine development. Computational approaches, especially the reverse vaccinology (RV) method, have enormous potential for identifying protein vaccine candidates (PVCs) from a proteome. The existing protective antigen prediction software and web servers have low prediction accuracy, which limits their application in vaccine development. Apart from machine learning techniques, these software tools and web servers have considered only a protein's adhesin-likeness as a criterion for identifying PVCs. Several non-adhesin functional classes of proteins involved in host-pathogen interactions and pathogenesis are known to provide protection against bacterial infections. Therefore, knowledge of bacterial pathogenesis has the potential to identify PVCs. Results A web server, Jenner-Predict, has been developed for prediction of PVCs from proteomes of bacterial pathogens. The web server targets host-pathogen interactions and pathogenesis by considering known functional domains from protein classes such as adhesin, virulence, invasin, porin, flagellin, colonization, toxin, choline-binding, penicillin-binding, transferrin-binding, fibronectin-binding and solute-binding. It predicts non-cytosolic proteins containing these domains as PVCs. It also assesses the vaccine potential of PVCs in terms of their possible immunogenicity, by comparison with experimentally known IEDB epitopes, absence of autoimmunity, and conservation across different strains. Predicted PVCs are prioritized so that only a few prospective PVCs need to be validated experimentally. The performance of the web server was evaluated against known protective antigens from diverse classes of bacteria reported in the Protegen database and the datasets used for VaxiJen server development.
The web server efficiently predicted known vaccine candidates reported from Streptococcus pneumoniae and Escherichia coli proteomes. The Jenner-Predict server outperformed the NERVE, Vaxign and VaxiJen methods. It has a sensitivity of 0.774 and 0.711 for the Protegen and VaxiJen datasets, respectively, and a specificity of 0.940 on the latter dataset. Conclusions The better prediction accuracy of the Jenner-Predict web server indicates that domains involved in host-pathogen interactions and pathogenesis are better criteria for prediction of PVCs. The web server has successfully predicted most known PVCs belonging to different functional classes. Jenner-Predict server is freely accessible at http://117.211.115.67/vaccine/home.html PMID:23815072
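For reference, the sensitivity and specificity values quoted above are conventionally defined as follows, with TP, FN, TN and FP the numbers of true positives, false negatives, true negatives and false positives; the abstract reports only the resulting ratios, not the underlying counts.

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}
```

Read this way, a sensitivity of 0.774 on the Protegen set means that roughly 77% of the known protective antigens were recovered as PVCs, while a specificity of 0.940 on the VaxiJen set means that about 94% of its non-antigens were correctly not predicted as PVCs.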
Barakat, Mohamed; Ortet, Philippe; Whitworth, David E
2013-04-20
Regulatory proteins (RPs) such as transcription factors (TFs) and two-component system (TCS) proteins control how prokaryotic cells respond to changes in their external and/or internal state. Identification and annotation of TFs and TCSs is non-trivial, and between-genome comparisons are often confounded by different standards in annotation. There is a need for user-friendly, fast and convenient tools to allow researchers to overcome the inherent variability in annotation between genome sequences. We have developed the web-server P2RP (Predicted Prokaryotic Regulatory Proteins), which enables users to identify and annotate TFs and TCS proteins within their sequences of interest. Users can input amino acid or genomic DNA sequences, and predicted proteins therein are scanned for the possession of DNA-binding domains and/or TCS domains. RPs identified in this manner are categorised into families, unambiguously annotated, and a detailed description of their features generated, using an integrated software pipeline. P2RP results can then be output in user-specified formats. Biologists have an increasing need for fast and intuitively usable tools, which is why P2RP has been developed as an interactive system. As well as assisting experimental biologists to interrogate novel sequence data, it is hoped that P2RP will be built into genome annotation pipelines and re-annotation processes, to increase the consistency of RP annotation in public genomic sequences. P2RP is the first publicly available tool for predicting and analysing RPs in users' sequences. The server is freely available and can be accessed along with documentation at http://www.p2rp.org.
Bernardes, Juliana; Zaverucha, Gerson; Vaquero, Catherine; Carbone, Alessandra
2016-01-01
Traditional protein annotation methods describe known domains with probabilistic models representing consensus among homologous domain sequences. However, when relevant signals become too weak to be identified by a global consensus, attempts at annotation fail. Here we address the fundamental question of domain identification for highly divergent proteins. By using high performance computing, we demonstrate that the limits of state-of-the-art annotation methods can be bypassed. We design a new strategy based on the observation that many structural and functional protein constraints are not globally conserved through all species but might be locally conserved in separate clades. We propose a novel exploitation of the large amount of data available: 1. for each known protein domain, several probabilistic clade-centered models are constructed from a large and differentiated panel of homologous sequences, 2. a decision-making protocol combines outcomes obtained from multiple models, 3. a multi-criteria optimization algorithm finds the most likely protein architecture. The method is evaluated for domain and architecture prediction over several datasets and statistical testing hypotheses. Its performance is compared against HMMScan and HHblits, two widely used search methods based on sequence-profile and profile-profile comparison. Due to their closeness to actual protein sequences, clade-centered models are shown to be more specific and functionally predictive than the broadly used consensus models. Based on them, we improved annotation of Plasmodium falciparum protein sequences on a scale not previously possible. We successfully predict at least one domain for 72% of P. falciparum proteins against 63% achieved previously, corresponding to a 30% improvement over the total number of Pfam domain predictions on the whole genome.
The method is applicable to any genome and opens new avenues to tackle evolutionary questions such as the reconstruction of ancient domain duplications, the reconstruction of the history of protein architectures, and the estimation of protein domain age. Website and software: http://www.lcqb.upmc.fr/CLADE. PMID:27472895
Calderón-González, Karla Grisel; Valero Rustarazo, Ma Luz; Labra-Barrios, Maria Luisa; Bazán-Méndez, César Isaac; Tavera-Tapia, Alejandra; Herrera-Aguirre, María Esther; Sánchez del Pino, Manuel M.; Gallegos-Pérez, José Luis; González-Márquez, Humberto; Hernández-Hernández, Jose Manuel; León-Ávila, Gloria; Rodríguez-Cuevas, Sergio; Guisa-Hohenstein, Fernando; Luna-Arias, Juan Pedro
2015-01-01
Breast cancer is the most common cancer and the leading cause of cancer mortality in women worldwide. There is a pressing need to identify novel molecules useful in diagnosis and prognosis. In this work we determined the differential expression profiles of four breast cancer cell lines compared with a control cell line. We identified 1020 polypeptides labelled with iTRAQ with more than 95% confidence. We analysed the proteins common to all breast cancer cell lines through IPA software (IPA core and Biomarkers). In addition, we selected the specific overexpressed and underexpressed proteins of the different molecular classes of breast cancer cell lines, and classified them according to protein class and biological process. Data in this article are related to the research article “Determination of the protein expression profiles of breast cancer cell lines by Quantitative Proteomics using iTRAQ Labelling and Tandem Mass Spectrometry” (Calderón-González et al. [1] in press). PMID:26217805
Bioinformatics Pertinent to Lipid Analysis in Biological Samples.
Ma, Justin; Arbelo, Ulises; Guerra, Yenifer; Aribindi, Katyayini; Bhattacharya, Sanjoy K; Pelaez, Daniel
2017-01-01
Electrospray ionization mass spectrometry has revolutionized the way lipids are studied. In this work, we present a tutorial for analyzing class-specific lipid spectra obtained from a triple quadrupole mass spectrometer. The open-source software MZmine 2.21 is used, coupled with LIPID MAPS databases. Here, we describe the steps for lipid identification and ratiometric quantification, and briefly address how the analysis differs when using direct infusion versus tandem liquid chromatography-mass spectrometry (LC-MS). We also provide a tutorial and equations for quantifying lipid amounts using synthetic lipid standards and normalization to protein amount.
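Ratiometric quantification against a spiked synthetic standard, followed by normalization to protein, can be sketched in a few lines. This is a minimal illustration, not the tutorial's own equations: it assumes the analyte and a class-matched standard have equal ionization response, and the units (pmol, µg) and example values are hypothetical.

```python
def lipid_amount_pmol(analyte_intensity: float,
                      standard_intensity: float,
                      standard_pmol: float) -> float:
    """Ratiometric estimate: n_analyte = (I_analyte / I_standard) * n_standard.

    Assumes the analyte and the spiked synthetic standard (same lipid class)
    ionize with equal efficiency, so their signal ratio equals their amount ratio.
    """
    if standard_intensity <= 0:
        raise ValueError("standard intensity must be positive")
    return analyte_intensity / standard_intensity * standard_pmol


def normalize_to_protein(amount_pmol: float, protein_ug: float) -> float:
    """Express a lipid amount per microgram of sample protein (pmol/µg)."""
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    return amount_pmol / protein_ug


# Hypothetical example: 50 pmol of standard spiked into an extract from 20 µg protein.
amount = lipid_amount_pmol(analyte_intensity=3.0e5, standard_intensity=1.5e5,
                           standard_pmol=50.0)
per_ug = normalize_to_protein(amount, protein_ug=20.0)
```

If response factors differ between analyte and standard, a correction factor would multiply the ratio; the sketch leaves that out.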
Electrophoresis gel image processing and analysis using the KODAK 1D software.
Pizzonia, J
2001-06-01
The present article reports on the performance of the KODAK 1D Image Analysis Software for the acquisition of information from electrophoresis experiments and highlights the utility of several mathematical functions for subsequent image processing, analysis, and presentation. Digital images of Coomassie-stained polyacrylamide protein gels containing molecular weight standards and ethidium bromide stained agarose gels containing DNA mass standards are acquired using the KODAK Electrophoresis Documentation and Analysis System 290 (EDAS 290). The KODAK 1D software is used to optimize lane and band identification using features such as isomolecular weight lines. Mathematical functions for mass standard representation are presented, and two methods for estimation of unknown band mass are compared. Given the progressive transition of electrophoresis data acquisition and daily reporting in peer-reviewed journals to digital formats ranging from 8-bit systems such as EDAS 290 to more expensive 16-bit systems, the utility of algorithms such as Gaussian modeling, which can correct geometric aberrations such as clipping due to signal saturation common at lower bit depth levels, is discussed. Finally, image-processing tools that can facilitate image preparation for presentation are demonstrated.
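The Gaussian modeling mentioned above can recover a band that saturation has clipped, because the logarithm of a Gaussian is a parabola: fitting a quadratic to the log of the unsaturated samples lets the peak height and area be extrapolated. The sketch below illustrates this general technique with NumPy; it is not the KODAK 1D implementation, and it assumes a single symmetric band on zero background with an 8-bit ceiling of 255.

```python
import numpy as np


def fit_clipped_gaussian(x, y, ceiling=255.0):
    """Estimate amplitude, center, sigma and area of a saturated Gaussian band.

    Only samples strictly between 0 and the saturation ceiling are used.
    ln y = ln A - (x - mu)^2 / (2 sigma^2) is quadratic in x: c2 x^2 + c1 x + c0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = (y > 0) & (y < ceiling)
    if mask.sum() < 3:
        raise ValueError("need at least three unsaturated samples")
    c2, c1, c0 = np.polyfit(x[mask], np.log(y[mask]), 2)
    if c2 >= 0:
        raise ValueError("profile is not peak-shaped")
    sigma = np.sqrt(-1.0 / (2.0 * c2))
    mu = -c1 / (2.0 * c2)
    amplitude = np.exp(c0 - c1 ** 2 / (4.0 * c2))
    area = amplitude * sigma * np.sqrt(2.0 * np.pi)
    return amplitude, mu, sigma, area


# Simulated lane profile: a band of true height 400 clipped at 255 by an 8-bit sensor.
x = np.arange(41)
true_profile = 400.0 * np.exp(-((x - 20.0) ** 2) / (2 * 4.0 ** 2))
observed = np.minimum(true_profile, 255.0)
amp, mu, sigma, area = fit_clipped_gaussian(x, observed)
```

With real images, background subtraction and noise would make the recovered height approximate rather than exact.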
Comparative Proteomic Analysis of Yak Follicular Fluid during Estrus
Guo, Xian; Pei, Jie; Ding, Xuezhi; Chu, Min; Bao, Pengjia; Wu, Xiaoyun; Liang, Chunnian; Yan, Ping
2016-01-01
The breeding of yaks is highly seasonal, and many crucial proteins are involved in the reproduction control program, especially in follicular development. To identify differentially expressed proteins between mature and immature follicular fluid (FF) of yak, FF was sampled separately from yak follicles of different sizes, and the proteins were separated by two-dimensional gel electrophoresis (2-DE). After silver staining, the Image Master 2D Platinum software was used for protein analysis, and matrix-assisted laser desorption ionization time of flight mass spectrometry (MALDI-TOF-MS) was performed to identify the differential proteins. The expression levels of transferrin and enolase superfamily member 1 (ENOSF1) were determined by Western blotting for verification. The results showed that 2-DE produced high-resolution, repeatable electrophoresis maps of proteins from mature and immature yak FF. A comparison of protein profiles identified 12 differentially expressed proteins, of which 10 were upregulated and 2 were downregulated. Western blotting showed that the expression of transferrin and ENOSF1 increased with follicular development. Both the protein profiles and the differentially expressed proteins identified in this study provide experimental data related to follicular development during yak breeding seasons. This study also lays the foundation for understanding the microenvironment during oocyte development. PMID:26954118
Surinova, Silvia; Hüttenhain, Ruth; Chang, Ching-Yun; Espona, Lucia; Vitek, Olga; Aebersold, Ruedi
2013-08-01
Targeted proteomics based on selected reaction monitoring (SRM) mass spectrometry is commonly used for accurate and reproducible quantification of protein analytes in complex biological mixtures. Strictly hypothesis-driven, SRM assays quantify each targeted protein by collecting measurements on its peptide fragment ions, called transitions. To achieve sensitive and accurate quantitative results, experimental design and data analysis must consistently account for the variability of the quantified transitions. This consistency is especially important in large experiments, which increasingly require profiling up to hundreds of proteins over hundreds of samples. Here we describe a robust and automated workflow for the analysis of large quantitative SRM data sets that integrates data processing, statistical protein identification and quantification, and dissemination of the results. The integrated workflow combines three software tools: mProphet for peptide identification via probabilistic scoring; SRMstats for protein significance analysis with linear mixed-effect models; and PASSEL, a public repository for storage, retrieval and query of SRM data. The input requirements for the protocol are files with SRM traces in mzXML format, and a file with a list of transitions in a text tab-separated format. The protocol is especially suited for data with heavy isotope-labeled peptide internal standards. We demonstrate the protocol on a clinical data set in which the abundances of 35 biomarker candidates were profiled in 83 blood plasma samples of subjects with ovarian cancer or benign ovarian tumors. The time frame to realize the protocol is 1-2 weeks, depending on the number of replicates used in the experiment.
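With heavy isotope-labeled internal standards, each transition yields a light-to-heavy intensity ratio. The sketch below summarizes these as a median log2 ratio per protein. It is a deliberately simplified stand-in, not SRMstats' linear mixed-effects model (it ignores the run, feature and replicate structure that SRMstats models), and the tab-separated table of per-transition peak areas with columns `protein`, `peptide`, `transition`, `light_area`, `heavy_area` is a hypothetical layout, not the protocol's transition-list format.

```python
import csv
import io
import math
from collections import defaultdict
from statistics import median


def protein_log2_ratios(tsv_text: str) -> dict:
    """Median log2(light/heavy) per protein from a tab-separated transition table.

    Expected (hypothetical) columns: protein, peptide, transition, light_area, heavy_area.
    Transitions with a non-positive area cannot be log-transformed and are skipped.
    """
    per_protein = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        light = float(row["light_area"])
        heavy = float(row["heavy_area"])
        if light <= 0 or heavy <= 0:
            continue
        per_protein[row["protein"]].append(math.log2(light / heavy))
    return {prot: median(vals) for prot, vals in per_protein.items()}


# Hypothetical peak areas for two proteins.
example = (
    "protein\tpeptide\ttransition\tlight_area\theavy_area\n"
    "P1\tPEPTIDEK\ty5\t400\t100\n"
    "P1\tPEPTIDEK\ty6\t800\t200\n"
    "P1\tANOTHERR\ty4\t200\t100\n"
    "P2\tSAMPLEK\ty3\t50\t200\n"
    "P2\tSAMPLEK\ty4\t0\t150\n"
)
ratios = protein_log2_ratios(example)
```

The median is used so that a single interfered transition does not dominate the protein-level estimate.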
Nakano, Shogo; Asano, Yasuhisa
2015-02-03
Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.
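INTMSAlign assigns consensus residues from an alignment of target proteins, but the abstract does not describe how. The sketch below shows only the generic notion of a per-column consensus: the most frequent non-gap residue and its frequency. The 0.5 threshold, the toy alignment and the `x` placeholder for non-consensus columns are assumptions, and correlation (co-varying) residues are not covered.

```python
from collections import Counter
from typing import List, Optional, Tuple


def column_consensus(alignment: List[str], min_freq: float = 0.5
                     ) -> List[Tuple[Optional[str], float]]:
    """Most frequent non-gap residue per column and its frequency among all sequences.

    Columns whose top residue falls below min_freq are reported as (None, freq).
    """
    if not alignment or len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be non-empty and equally long")
    n = len(alignment)
    result = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != "-")
        if not counts:
            result.append((None, 0.0))
            continue
        residue, count = counts.most_common(1)[0]
        freq = count / n
        result.append((residue if freq >= min_freq else None, freq))
    return result


# Toy alignment (hypothetical); '-' marks a gap.
aln = ["MKV-LA", "MKIGLA", "MRVGLS", "MKVGFA"]
cons = column_consensus(aln)
consensus_seq = "".join(r if r else "x" for r, _ in cons)
```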
NASA Astrophysics Data System (ADS)
Nakano, Shogo; Asano, Yasuhisa
2015-02-01
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhu, Ying; Zhao, Rui; Piehowski, Paul D.
One of the greatest challenges for mass spectrometry (MS)-based proteomics is the limited ability to analyze small samples. Here we investigate the relative contributions of liquid chromatography (LC), MS instrumentation and data analysis methods with the aim of improving proteome coverage for sample sizes ranging from 0.5 ng to 50 ng. We show that the LC separations utilizing 30-µm-i.d. columns increase signal intensity by >3-fold relative to those using 75-µm-i.d. columns, leading to a 32% increase in peptide identifications. The Orbitrap Fusion Lumos mass spectrometer significantly boosted both sensitivity and sequencing speed relative to earlier generation Orbitraps (e.g., LTQ-Orbitrap), leading to a ~3× increase in peptide identifications and a 1.7× increase in identified protein groups for 2 ng tryptic digests of bacterial lysate. The Match Between Runs algorithm of the open-source MaxQuant software further increased proteome coverage by ~95% for 0.5 ng samples and by ~42% for 2 ng samples. The present platform is capable of identifying >3000 protein groups from tryptic digestion of cell lysates equivalent to 50 HeLa cells and 100 THP-1 cells (~10 ng total proteins), respectively, and >950 proteins from subnanogram bacterial and archaeal cell lysates. The present ultrasensitive LC-MS platform is expected to enable deep proteome coverage for subnanogram samples, including single mammalian cells.
Walz, Alexander; Mujer, Cesar V; Connolly, Joseph P; Alefantis, Tim; Chafin, Ryan; Dake, Clarissa; Whittington, Jessica; Kumar, Srikanta P; Khan, Akbar S; DelVecchio, Vito G
2007-07-27
The secretion time course of Bacillus anthracis strain RA3R (pXO1+/pXO2-) during early, mid, and late log phase was investigated under conditions that simulate those encountered in the host. All of the identified proteins were analyzed by different software algorithms to characterize their predicted mode of secretion and cellular localization. In addition, immunogenic proteins were identified using sera from humans with cutaneous anthrax. A total of 275 extracellular proteins were identified by a combination of LC MS/MS and MALDI-TOF MS. All of the identified proteins were analyzed by SignalP, SecretomeP, PSORT, LipoP, TMHMM, and PROSITE to characterize their predicted mode of secretion, cellular localization, and protein domains. Fifty-three proteins were predicted by SignalP to harbor cleavable N-terminal signal peptides and therefore to be secreted via the classical Sec pathway. Twenty-three proteins were predicted by SecretomeP to be secreted by the alternative Sec pathway, which is characterized by the lack of a typical export signal. In contrast to the SignalP and SecretomeP predictions, PSORT predicted 171 extracellular proteins, 7 cell wall-associated proteins, and 6 cytoplasmic proteins. Moreover, 51 proteins were predicted by LipoP to contain putative Sec signal peptides (38 have SpI sites), lipoprotein signal peptides (13 have SpII sites), and N-terminal membrane helices (9 have transmembrane helices). The TMHMM algorithm predicted 25 membrane-associated proteins with one to ten transmembrane helices. Immunogenic proteins were also identified using sera from patients who have recovered from anthrax. The charge variants (83 and 63 kDa) of protective antigen (PA) were the most immunodominant secreted antigens, followed by charge variants of enolase and transketolase. This is the first description of the time course of protein secretion for the pathogen Bacillus anthracis. 
Time course studies of protein secretion and accumulation may be relevant in elucidation of the progression of pathogenicity, identification of therapeutics and diagnostic markers, and vaccine development. This study also adds to the continuously growing list of identified Bacillus anthracis secretome proteins.
Walz, Alexander; Mujer, Cesar V; Connolly, Joseph P; Alefantis, Tim; Chafin, Ryan; Dake, Clarissa; Whittington, Jessica; Kumar, Srikanta P; Khan, Akbar S; DelVecchio, Vito G
2007-01-01
Background The secretion time course of Bacillus anthracis strain RA3R (pXO1+/pXO2-) during early, mid, and late log phase was investigated under conditions that simulate those encountered in the host. All of the identified proteins were analyzed by different software algorithms to characterize their predicted mode of secretion and cellular localization. In addition, immunogenic proteins were identified using sera from humans with cutaneous anthrax. Results A total of 275 extracellular proteins were identified by a combination of LC MS/MS and MALDI-TOF MS. All of the identified proteins were analyzed by SignalP, SecretomeP, PSORT, LipoP, TMHMM, and PROSITE to characterize their predicted mode of secretion, cellular localization, and protein domains. Fifty-three proteins were predicted by SignalP to harbor cleavable N-terminal signal peptides and therefore to be secreted via the classical Sec pathway. Twenty-three proteins were predicted by SecretomeP to be secreted by the alternative Sec pathway, which is characterized by the lack of a typical export signal. In contrast to the SignalP and SecretomeP predictions, PSORT predicted 171 extracellular proteins, 7 cell wall-associated proteins, and 6 cytoplasmic proteins. Moreover, 51 proteins were predicted by LipoP to contain putative Sec signal peptides (38 have SpI sites), lipoprotein signal peptides (13 have SpII sites), and N-terminal membrane helices (9 have transmembrane helices). The TMHMM algorithm predicted 25 membrane-associated proteins with one to ten transmembrane helices. Immunogenic proteins were also identified using sera from patients who have recovered from anthrax. The charge variants (83 and 63 kDa) of protective antigen (PA) were the most immunodominant secreted antigens, followed by charge variants of enolase and transketolase. Conclusion This is the first description of the time course of protein secretion for the pathogen Bacillus anthracis. 
Time course studies of protein secretion and accumulation may be relevant in elucidation of the progression of pathogenicity, identification of therapeutics and diagnostic markers, and vaccine development. This study also adds to the continuously growing list of identified Bacillus anthracis secretome proteins. PMID:17662140
David, Matthieu; Fertin, Guillaume; Rogniaux, Hélène; Tessier, Dominique
2017-08-04
The analysis of discovery proteomics experiments relies on algorithms that identify peptides from their tandem mass spectra. Near-exhaustive interpretation of these spectra remains an unresolved issue. At present, a substantial number of missing interpretations are probably due to peptides displaying post-translational modifications and variants that yield spectra that are particularly difficult to interpret. However, the emergence of a new generation of mass spectrometers that provide high fragment ion accuracy has paved the way for more efficient algorithms. We present a new software tool, SpecOMS, that can handle the computational complexity of pairwise comparisons of spectra in the context of large volumes. SpecOMS can compare a whole set of experimental spectra generated by a discovery proteomics experiment to a whole set of theoretical spectra deduced from a protein database in a few minutes on a standard workstation. SpecOMS exploits these capabilities to improve the peptide identification process, allowing strong competition between all possible peptides for spectrum interpretation. Notably, this software resolves the drawbacks (i.e., efficiency problems and decreased sensitivity) that usually accompany open modification searches. We highlight this promising approach using results obtained from the analysis of a public human data set downloaded from the PRIDE (PRoteomics IDEntification) database.
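Comparing whole sets of experimental spectra against whole sets of theoretical spectra becomes tractable when spectrum pairs are scored by the number of fragment masses they share and those counts are accumulated through an index rather than by explicit pairwise loops. The sketch below illustrates that idea with an inverted index from discretized mass bins to theoretical spectra. It is a simplified illustration under stated assumptions, not the SpecOMS data structure; the 0.02 Da bin width, peptide names and masses are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def to_bins(masses: List[float], width: float) -> Set[int]:
    """Discretize fragment masses (Da) into integer bins of the given width.

    Masses close to a bin edge may land in a neighbouring bin; real tools use
    tolerance windows instead of a single rounding step.
    """
    return {int(round(m / width)) for m in masses}


def build_index(theoretical: Dict[str, List[float]], width: float) -> Dict[int, List[str]]:
    """Inverted index: mass bin -> names of theoretical spectra containing that bin."""
    index: Dict[int, List[str]] = defaultdict(list)
    for name, masses in theoretical.items():
        for b in to_bins(masses, width):
            index[b].append(name)
    return index


def shared_peak_counts(experimental: List[float], index: Dict[int, List[str]],
                       width: float) -> Dict[str, int]:
    """For one experimental spectrum, count shared bins with every indexed spectrum."""
    counts: Dict[str, int] = defaultdict(int)
    for b in to_bins(experimental, width):
        for name in index.get(b, []):
            counts[name] += 1
    return dict(counts)


WIDTH = 0.02
theoretical = {"PEPA": [100.0, 200.0, 300.0], "PEPB": [100.0, 250.0, 350.0]}
index = build_index(theoretical, WIDTH)
counts = shared_peak_counts([100.004, 199.996, 300.003, 400.0], index, WIDTH)
best = max(counts, key=counts.get)
```

Each experimental peak touches only the theoretical spectra that contain its bin, so the cost scales with the number of actual matches rather than with the number of spectrum pairs.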
BEAUTY-X: enhanced BLAST searches for DNA queries.
Worley, K C; Culpepper, P; Wiese, B A; Smith, R F
1998-01-01
BEAUTY (BLAST Enhanced Alignment Utility) is an enhanced version of the BLAST database search tool that facilitates identification of the functions of matched sequences. Three recent improvements to the BEAUTY program described here make the enhanced output (1) available for DNA queries, (2) available for searches of any protein database, and (3) more up-to-date, with periodic updates of the domain information. BEAUTY searches of the NCBI and EMBL non-redundant protein sequence databases are available from the BCM Search Launcher Web pages (http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html). BEAUTY Post-Processing of submitted search results is available using the BCM Search Launcher Batch Client (version 2.6) (ftp://gc.bcm.tmc.edu/pub/software/search-launcher/). Example figures are available at http://dot.bcm.tmc.edu:9331/papers/beautypp.html (kworley,culpep)@bcm.tmc.edu
Frequency Domain Identification Toolbox
NASA Technical Reports Server (NTRS)
Horta, Lucas G.; Juang, Jer-Nan; Chen, Chung-Wen
1996-01-01
This report documents software written in the MATLAB programming language for identifying systems from frequency response functions. MATLAB is a commercial software environment that allows easy manipulation of data matrices and provides many built-in matrix functions. The algorithms programmed in this collection of subroutines have been documented elsewhere, and all references are provided in this document. A main feature of this software is the use of matrix fraction descriptions and system realization theory to identify state space models directly from test data. All subroutines come with templates that users can follow as guidelines.
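To illustrate identifying a state space model from frequency response data with system realization theory, the sketch below uses one standard realization method, the Eigensystem Realization Algorithm (ERA), written in Python/NumPy rather than MATLAB. It first turns a sampled frequency response function into an impulse response by inverse FFT, then builds Hankel matrices of Markov parameters and factors them by SVD. The test system, Hankel sizes and model order are assumptions, the system is single-input single-output for brevity, and this is not code from the toolbox (which emphasizes matrix fraction descriptions).

```python
import numpy as np


def era(markov, order, rows=20, cols=20):
    """Eigensystem Realization Algorithm for a single-input single-output system.

    markov[k] is the k-th Markov parameter C A^k B (k = 0, 1, ...).
    Returns a discrete-time state-space triple (A, B, C) of the requested order.
    """
    h = np.asarray(markov, dtype=float)
    if h.size < rows + cols:
        raise ValueError("need at least rows + cols Markov parameters")
    H0 = np.array([[h[i + j] for j in range(cols)] for i in range(rows)])
    H1 = np.array([[h[i + j + 1] for j in range(cols)] for i in range(rows)])  # shifted
    U, s, Vt = np.linalg.svd(H0)
    U, s, Vt = U[:, :order], s[:order], Vt[:order, :]  # keep dominant singular values
    s_half = np.sqrt(s)
    A = (U.T @ H1 @ Vt.T) / np.outer(s_half, s_half)   # S^-1/2 U^T H1 V S^-1/2
    B = (s_half[:, None] * Vt)[:, :1]                   # first column of S^1/2 V^T
    C = (U * s_half)[:1, :]                             # first row of U S^1/2
    return A, B, C


# Hypothetical test system: a lightly damped discrete-time oscillator.
A_true = np.array([[0.9, 0.2], [-0.2, 0.9]])
B_true = np.array([[1.0], [0.0]])
C_true = np.array([[1.0, 0.0]])

# Frequency response G(z) = C (zI - A)^-1 B sampled at N points on the unit circle.
N = 256
z = np.exp(2j * np.pi * np.arange(N) / N)
G = np.array([(C_true @ np.linalg.solve(zk * np.eye(2) - A_true, B_true))[0, 0] for zk in z])

# Inverse FFT of the sampled FRF gives the (slightly aliased) impulse response:
# entry 0 is the direct term D (zero here); entries 1.. are the Markov parameters.
impulse = np.real(np.fft.ifft(G))
A_id, B_id, C_id = era(impulse[1:], order=2)
poles = np.linalg.eigvals(A_id)
```

The identified matrices differ from the true ones by a change of basis, so the comparison is made on invariants such as the poles and the impulse response.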
Technical advances in proteomics: new developments in data-independent acquisition.
Hu, Alex; Noble, William S; Wolf-Yadlin, Alejandro
2016-01-01
The ultimate aim of proteomics is to fully identify and quantify the entire complement of proteins and post-translational modifications in biological samples of interest. For the last 15 years, liquid chromatography-tandem mass spectrometry (LC-MS/MS) in data-dependent acquisition (DDA) mode has been the standard for proteomics when sampling breadth and discovery were the main objectives; multiple reaction monitoring (MRM) LC-MS/MS has been the standard for targeted proteomics when precise quantification, reproducibility, and validation were the main objectives. Recently, improvements in mass spectrometer design and bioinformatics algorithms have resulted in the rediscovery and development of another sampling method: data-independent acquisition (DIA). DIA comprehensively and repeatedly samples every peptide in a protein digest, producing a complex set of mass spectra that is difficult to interpret without external spectral libraries. Currently, DIA approaches the identification breadth of DDA while achieving the reproducible quantification characteristic of MRM or its newest version, parallel reaction monitoring (PRM). In comparative de novo identification and quantification studies in human cell lysates, DIA identified up to 89% of the proteins detected in a comparable DDA experiment while providing reproducible quantification of over 85% of them. DIA analysis aided by spectral libraries derived from prior DIA experiments or auxiliary DDA data produces identification and quantification as reproducible and precise as that achieved by MRM/PRM, except on low‑abundance peptides that are obscured by stronger signals. DIA is still a work in progress toward the goal of sensitive, reproducible, and precise quantification without external spectral libraries. New software tools applied to DIA analysis have to deal with deconvolution of complex spectra as well as proper filtering of false positives and false negatives. 
However, the future outlook is positive, and various researchers are working on novel bioinformatics techniques to address these issues and increase the reproducibility, fidelity, and identification breadth of DIA.
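In commonly used DIA schemes, the instrument samples every peptide by cycling through wide precursor isolation windows that together tile the m/z range of interest, fragmenting all precursors in each window regardless of intensity. The sketch below generates such a fixed-width scheme and reports which windows contain a given precursor. The 400 to 1200 m/z range, 25 m/z width and 1 m/z overlap are illustrative assumptions; many methods use variable-width windows instead.

```python
from typing import List, Tuple


def isolation_windows(mz_min: float, mz_max: float, width: float,
                      overlap: float = 0.0) -> List[Tuple[float, float]]:
    """Consecutive fixed-width precursor windows covering [mz_min, mz_max].

    Adjacent windows share `overlap` m/z so precursors at window edges are not lost.
    """
    if mz_max <= mz_min:
        raise ValueError("mz_max must exceed mz_min")
    if width <= overlap:
        raise ValueError("width must exceed overlap")
    windows = []
    lower = mz_min
    while True:
        upper = min(lower + width, mz_max)
        windows.append((lower, upper))
        if upper >= mz_max:
            break
        lower = upper - overlap  # step back so neighbouring windows overlap
    return windows


def windows_containing(precursor_mz: float,
                       windows: List[Tuple[float, float]]) -> List[int]:
    """Indices of the windows whose isolation range includes the precursor m/z."""
    return [i for i, (lo, hi) in enumerate(windows) if lo <= precursor_mz <= hi]


windows = isolation_windows(400.0, 1200.0, width=25.0, overlap=1.0)
hits = windows_containing(424.5, windows)
```

Because every window co-fragments all precursors it contains, the resulting MS/MS spectra are mixtures, which is why DIA software must deconvolve them.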
Practical analysis of specificity-determining residues in protein families.
Chagoyen, Mónica; García-Martín, Juan A; Pazos, Florencio
2016-03-01
Determining the residues that are important for the molecular activity of a protein is a topic of broad interest in biomedicine and biotechnology. This knowledge can help in understanding the protein's molecular mechanism as well as in fine-tuning its natural function, eventually with biotechnological or therapeutic implications. Some protein residues are essential for the function common to all members of a protein family, while others explain the particular specificities of certain subfamilies (such as binding different substrates or cofactors, or distinct binding affinities). Owing to the difficulty of determining them experimentally, a number of computational methods have been developed to detect these functional residues, generally known as 'specificity-determining positions' (or SDPs), from a collection of homologous protein sequences. These methods are mature enough to be routinely used by molecular biologists to direct experiments aimed at gaining insight into the functional specificity of a protein family and eventually modifying it. In this review, we summarize some of the recent discoveries achieved through computational SDP identification in a number of relevant protein families, as well as the main approaches and software tools available to perform this type of analysis. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
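One common way to detect SDPs from homologous sequences is to score each alignment column by how much its residues tell us about subfamily membership; mutual information between residue identity and subfamily label is a standard such score. The sketch below computes it per column. It is a generic illustration, not a specific tool from the review; gaps are treated as an ordinary symbol, no small-sample correction is applied, and the alignment and labels are hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_mutual_information(alignment: List[str], labels: List[str]) -> List[float]:
    """Mutual information (bits) between residue identity and subfamily label, per column."""
    if not alignment or len(alignment) != len(labels):
        raise ValueError("need one subfamily label per sequence")
    n = len(alignment)
    label_counts = Counter(labels)
    scores = []
    for column in zip(*alignment):
        residue_counts = Counter(column)
        joint = Counter(zip(column, labels))
        mi = 0.0
        for (res, lab), c in joint.items():
            p_joint = c / n
            p_res = residue_counts[res] / n
            p_lab = label_counts[lab] / n
            mi += p_joint * math.log2(p_joint / (p_res * p_lab))
        scores.append(mi)
    return scores


# Column 0 is conserved in all sequences, column 1 splits exactly by subfamily,
# column 2 only partially tracks the subfamilies.
aln = ["ADK", "ADK", "AEK", "AER"]
subfamilies = ["s1", "s1", "s2", "s2"]
scores = column_mutual_information(aln, subfamilies)
top_position = max(range(len(scores)), key=scores.__getitem__)
```

A globally conserved column scores zero even though it may be functionally essential, which is exactly the distinction between family-wide and subfamily-specific residues drawn above.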
Interrogation of an autofluorescence-based method for protein fingerprinting.
Siddaramaiah, Manjunath; Rao, Bola Sadashiva S; Joshi, Manjunath B; Datta, Anirbit; Sandya, S; Vishnumurthy, Vasudha; Chandra, Subhash; Nayak, Subramanya G; Satyamoorthy, Kapaettu; Mahato, Krishna K
2018-03-14
In the present study, we have designed laser-induced fluorescence (LIF) based instrumentation and developed a sensitive methodology for the effective separation, visualization, identification and analysis of proteins on a single platform. In this method, intrinsic fluorescence spectra of proteins were detected after separation by one- or two-dimensional Sodium Dodecyl Sulfate-Tris(2-carboxyethyl)phosphine (SDS-TCEP) polyacrylamide gel electrophoresis (PAGE), and the data were analyzed. MATLAB-assisted software was designed to generate PAGE fingerprints for visualizing proteins after one- and two-dimensional separation. These fingerprints provided objective parameters of intrinsic fluorescence intensity, emission peak, molecular weight and isoelectric point on a single platform. Further, the current architecture could differentiate overlapping proteins in the PAGE gels that were otherwise not identifiable by conventional staining, imaging and tagging methods. Categorizing the proteins by the presence or absence of tyrosine or tryptophan residues, and assigning pseudo colors to the corresponding emission peaks (309-356 nm), allowed the proportion of proteins within the given spectrum to be detected. The present methodology does not use stains or tags, and is therefore amenable to coupling with mass spectrometric measurements. This technique may be relevant to proteomics, a field with innumerable applications. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Ndhlovu, Andrew; Durand, Pierre M.; Hazelhurst, Scott
2015-01-01
The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data, including the associated seed alignments from the PFAM-A (protein family) database, were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought-after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Nucleotide sequences were validated against amino acid data to ensure high data quality. EvoDB is a catalogue of evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary and phylogenetic studies and presents a tier of information untapped by current databases. Database URL: http://www.bioinf.wits.ac.za/software/fire/evodb PMID:26140928
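The ratio ω = dN/dS is read as follows: ω < 1 indicates purifying selection, ω ≈ 1 neutral evolution, and ω > 1 positive selection. The sketch below applies that rule to a per-site ω profile. The ±0.05 tolerance band around 1 and the example profile are assumptions; under CODEML's M2a model, positively selected sites are normally called from posterior probabilities (for example, Bayes Empirical Bayes), not from point estimates alone.

```python
from typing import List


def omega_from_rates(dn: float, ds: float) -> float:
    """omega = dN / dS; undefined when dS is zero."""
    if ds == 0:
        raise ZeroDivisionError("dS is zero; omega is undefined")
    return dn / ds


def classify_sites(omega: List[float], tol: float = 0.05) -> List[str]:
    """Label each codon site by its omega value.

    'purifying' if omega < 1 - tol, 'positive' if omega > 1 + tol, else 'neutral'.
    """
    labels = []
    for w in omega:
        if w < 0:
            raise ValueError("omega cannot be negative")
        if w < 1 - tol:
            labels.append("purifying")
        elif w > 1 + tol:
            labels.append("positive")
        else:
            labels.append("neutral")
    return labels


# Hypothetical four-site profile; the last value is computed from dN and dS.
profile = [0.05, 0.98, 2.4, omega_from_rates(0.03, 0.12)]
labels = classify_sites(profile)
positive_sites = [i + 1 for i, lab in enumerate(labels) if lab == "positive"]  # 1-based
```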
Lehmann, Roland; Schmidt, André; Pastuschek, Jana; Müller, Mario M; Fritzsche, Andreas; Dieterle, Stefan; Greb, Robert R; Markert, Udo R; Slevogt, Hortense
2018-06-25
The proteomic analysis of complex body fluids by liquid chromatography tandem mass spectrometry (LC-MS/MS) requires the selection of suitable sample preparation techniques and optimal parameter settings in data analysis software packages to obtain reliable results. Proteomic analysis of follicular fluid, a complex body fluid similar to serum or plasma, is difficult because it contains many highly abundant proteins alongside a variety of proteins at widely different concentrations. However, the accessibility of this complex body fluid for LC-MS/MS analysis is an opportunity to gain insights into its status and its composition of fertility-relevant proteins, including immunological factors, or to discover new diagnostic and prognostic markers, for example for the treatment of infertility. In this study, we compared different sample preparation methods (FASP, eFASP and in-solution digestion) and three different data analysis software packages (Proteome Discoverer with SEQUEST, Mascot and MaxQuant with Andromeda), combined with semi- and full-tryptic database search options, to obtain maximum coverage of the follicular fluid proteome. We found that the most comprehensive proteome coverage is achieved by the eFASP sample preparation method using SDS in the initial denaturing step and the SEQUEST-based semi-tryptic data analysis. In conclusion, we have developed a fractionation-free methodological workflow for in-depth LC-MS/MS-based analysis for the standardized investigation of human follicular fluid as an important representative of a complex body fluid. Taken together, we were able to identify a total of 1392 proteins in follicular fluid. © 2018 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
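The difference between full- and semi-tryptic searches is which candidate peptides enter the search space: full-tryptic peptides have tryptic cleavage sites at both termini, whereas semi-tryptic peptides need only one. The sketch below generates both sets with the simplified trypsin rule (cleave after K or R unless the next residue is P). The missed-cleavage limit, minimum length and toy sequence are assumptions, and each search engine applies its own enzyme rules.

```python
import re
from typing import Set


def tryptic_peptides(protein: str, missed: int = 1, min_len: int = 4) -> Set[str]:
    """Fully tryptic peptides: cleave after K or R unless followed by P."""
    # Cleavage positions lie just after a K/R that is not followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    peptides = set()
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of missed cleavages inside the peptide.
        for j in range(i + 1, min(i + 2 + missed, len(bounds))):
            peptide = protein[bounds[i]:bounds[j]]
            if len(peptide) >= min_len:
                peptides.add(peptide)
    return peptides


def semi_tryptic_peptides(protein: str, missed: int = 1, min_len: int = 4) -> Set[str]:
    """Full-tryptic peptides plus variants that keep only one tryptic terminus."""
    full = tryptic_peptides(protein, missed, min_len)
    semi = set(full)
    for peptide in full:
        for k in range(min_len, len(peptide)):
            semi.add(peptide[:k])   # tryptic N-terminus, non-tryptic C-terminus
            semi.add(peptide[-k:])  # non-tryptic N-terminus, tryptic C-terminus
    return semi


protein = "AAAAKPGGGGRSSSSKTTTT"  # toy sequence; the K before P is not cleaved
full = tryptic_peptides(protein, missed=0)
semi = semi_tryptic_peptides(protein, missed=0)
```

The semi-tryptic set is much larger than the full-tryptic one, which is why semi-tryptic searches can find more peptides (for example, products of in vivo proteolysis) at the cost of a larger search space.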
CAVER 3.0: A Tool for the Analysis of Transport Pathways in Dynamic Protein Structures
Strnad, Ondrej; Brezovsky, Jan; Kozlikova, Barbora; Gora, Artur; Sustr, Vilem; Klvana, Martin; Medek, Petr; Biedermannova, Lada; Sochor, Jiri; Damborsky, Jiri
2012-01-01
Tunnels and channels facilitate the transport of small molecules, ions and water solvent in a large variety of proteins. Characteristics of individual transport pathways, including their geometry, physico-chemical properties and dynamics, are instrumental for understanding the structure-function relationships of these proteins, for the design of new inhibitors and for the construction of improved biocatalysts. CAVER is a software tool widely used for the identification and characterization of transport pathways in static macromolecular structures. Herein we present a new version of CAVER enabling automatic analysis of tunnels and channels in large ensembles of protein conformations. CAVER 3.0 implements new algorithms for the calculation and clustering of pathways. A trajectory from a molecular dynamics simulation serves as the typical input, while detailed characteristics and summary statistics of the time evolution of individual pathways are provided in the outputs. To illustrate the capabilities of CAVER 3.0, the tool was applied to the analysis of a molecular dynamics simulation of the microbial enzyme haloalkane dehalogenase DhaA. CAVER 3.0 identified all previously published DhaA tunnels, including those closed in DhaA crystal structures, and reliably estimated their importance. The results clearly demonstrate that analysis of molecular dynamics simulations is essential for estimating pathway characteristics and elucidating the structural basis of tunnel gating. CAVER 3.0 paves the way for the study of important biochemical phenomena in the area of molecular transport, molecular recognition and enzymatic catalysis. The software is freely available as a multiplatform command-line application at http://www.caver.cz. PMID:23093919
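A key geometric characteristic of a tunnel is its bottleneck radius, the narrowest free radius along the pathway, and following it across a trajectory shows when a tunnel opens or closes. The sketch below is a simplified illustration, not CAVER's algorithm: it assumes a fixed, precomputed tunnel centerline, treats atoms as spheres with given van der Waals radii, and flags frames whose bottleneck is narrower than a hypothetical 0.9 Å probe. All coordinates are invented.

```python
import numpy as np


def bottleneck_radius(centerline: np.ndarray, atoms: np.ndarray, radii: np.ndarray) -> float:
    """Smallest free radius along a tunnel centerline.

    centerline: (P, 3) points along the tunnel axis (Å)
    atoms: (N, 3) atom centres (Å); radii: (N,) van der Waals radii (Å)
    The free radius at a point is its distance to the nearest atom surface.
    """
    d = np.linalg.norm(centerline[:, None, :] - atoms[None, :, :], axis=2)  # (P, N)
    free = (d - radii[None, :]).min(axis=1)
    return float(free.min())


def closed_frames(centerline, frames, radii, probe=0.9):
    """Indices of trajectory frames whose bottleneck is narrower than the probe radius."""
    return [i for i, xyz in enumerate(frames)
            if bottleneck_radius(centerline, xyz, radii) < probe]


# Hypothetical tunnel along the z axis, lined by two gating atoms at z = 5 Å
# that move closer together or further apart over three frames.
centerline = np.column_stack([np.zeros(11), np.zeros(11), np.arange(11.0)])
radii = np.array([1.5, 1.5])


def gate(r):
    return np.array([[r, 0.0, 5.0], [-r, 0.0, 5.0]])


frames = [gate(3.0), gate(2.0), gate(2.6)]
bottlenecks = [bottleneck_radius(centerline, f, radii) for f in frames]
closed = closed_frames(centerline, frames, radii)
```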
CAVER 3.0: a tool for the analysis of transport pathways in dynamic protein structures.
Chovancova, Eva; Pavelka, Antonin; Benes, Petr; Strnad, Ondrej; Brezovsky, Jan; Kozlikova, Barbora; Gora, Artur; Sustr, Vilem; Klvana, Martin; Medek, Petr; Biedermannova, Lada; Sochor, Jiri; Damborsky, Jiri
2012-01-01
The EIPeptiDi tool: enhancing peptide discovery in ICAT-based LC MS/MS experiments.
Cannataro, Mario; Cuda, Giovanni; Gaspari, Marco; Greco, Sergio; Tradigo, Giuseppe; Veltri, Pierangelo
2007-07-15
Isotope-coded affinity tags (ICAT) is a method for quantitative proteomics based on differential isotopic labeling, sample digestion and mass spectrometry (MS). The method allows the identification and relative quantification of proteins present in two samples and consists of the following phases. First, cysteine residues are labeled using either the ICAT Light or the ICAT Heavy reagent (having identical chemical properties but different masses). Then, after whole sample digestion, the labeled peptides are captured selectively using the biotin tag contained in both ICAT reagents. Finally, the simplified peptide mixture is analyzed by nanoscale liquid chromatography-tandem mass spectrometry (LC-MS/MS). Nevertheless, the ICAT LC-MS/MS method still suffers from insufficient sample-to-sample reproducibility in peptide identification. In particular, the number and type of peptides identified in different experiments can vary considerably and, thus, the statistical (comparative) analysis of sample sets is very challenging. Low information overlap at the peptide level and, consequently, at the protein level is very detrimental when the number of samples to be analyzed is high. We designed a method for improving data processing and peptide identification in sample sets subjected to ICAT labeling and LC-MS/MS analysis, based on cross-validating MS/MS results. The method has been implemented in a tool, called EIPeptiDi, which augments ICAT data analysis software by improving peptide identification throughout the input data set. Heavy/Light (H/L) pairs that are quantified but not identified by the MS/MS routine are assigned to peptide sequences identified in other samples, using similarity criteria based on chromatographic retention time and Heavy/Light mass attributes. 
EIPeptiDi significantly increases the number of identified peptides per sample, showing that the proposed method has a considerable impact on the protein identification process and, consequently, on the amount of potentially critical information in clinical studies. The EIPeptiDi tool is available at http://bioingegneria.unicz.it/~veltri/projects/eipeptidi/ with a demo data set. EIPeptiDi significantly increases the number of peptides identified and quantified in analyzed samples, thus reducing the number of unassigned H/L pairs and allowing a better comparative analysis of sample data sets.
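The cross-sample assignment step described above can be illustrated with a minimal Python sketch. The `HLPair` record, the tolerances (`rt_tol` in minutes, `ppm_tol` in parts per million) and the "closest retention time wins" tie-break are illustrative assumptions, not EIPeptiDi's published parameters or code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HLPair:
    """A quantified Heavy/Light pair (hypothetical record layout)."""
    rt: float                       # chromatographic retention time, minutes
    light_mass: float               # neutral mass of the light-labelled peptide, Da
    heavy_mass: float               # neutral mass of the heavy-labelled peptide, Da
    sequence: Optional[str] = None  # None when MS/MS gave no identification


def ppm_error(observed: float, reference: float) -> float:
    return abs(observed - reference) / reference * 1e6


def assign_unidentified(sample: List[HLPair], identified_elsewhere: List[HLPair],
                        rt_tol: float = 1.0, ppm_tol: float = 20.0) -> List[HLPair]:
    """Give each unidentified pair the sequence of the identified pair (from other
    samples) that matches both masses and lies closest in retention time."""
    result = []
    for pair in sample:
        if pair.sequence is not None:
            result.append(pair)  # already identified by MS/MS in this sample
            continue
        best, best_drt = None, float("inf")
        for ref in identified_elsewhere:
            drt = abs(pair.rt - ref.rt)
            if (drt <= rt_tol and drt < best_drt
                    and ppm_error(pair.light_mass, ref.light_mass) <= ppm_tol
                    and ppm_error(pair.heavy_mass, ref.heavy_mass) <= ppm_tol):
                best, best_drt = ref, drt
        result.append(HLPair(pair.rt, pair.light_mass, pair.heavy_mass,
                             best.sequence if best else None))
    return result
```

Requiring both the light and the heavy mass to match, in addition to retention time, is what makes the pair (rather than a single peak) the unit being transferred between samples.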
ODEion--a software module for structural identification of ordinary differential equations.
Gennemark, Peter; Wedelin, Dag
2014-02-01
In the systems biology field, algorithms for structural identification of ordinary differential equations (ODEs) have mainly focused on fixed model spaces such as S-systems, and/or on methods that require data good enough for derivatives to be estimated accurately. Methods and software that can handle more general models and realistic data are therefore lacking. We present ODEion, a software module for structural identification of ODEs. Its main features are:
• The model space is defined by arbitrary user-defined functions that can be nonlinear in both variables and parameters, such as chemical reaction rate laws.
• ODEion implements computationally efficient algorithms that have been shown to handle sparse and noisy data efficiently. It can solve a range of realistic problems that previously required a supercomputer.
• ODEion is easy to use and provides SBML output.
We describe the mathematical problem and the ODEion system itself, and provide several examples of how the system can be used. Available at: http://www.odeidentification.org.
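To make "structural identification" concrete, the sketch below scores three candidate right-hand sides for a one-variable ODE by fitting a single rate constant per structure and keeping the structure with the lowest squared error. It is a brute-force illustration under assumed candidates (zero-, first- and second-order decay), forward-Euler integration and a fixed parameter grid; it is not ODEion's algorithm, which the abstract describes only as computationally efficient.

```python
import math

# Candidate model structures dx/dt = f(x, k) (illustrative assumptions).
CANDIDATES = {
    "zero-order": lambda x, k: -k,
    "first-order": lambda x, k: -k * x,
    "second-order": lambda x, k: -k * x * x,
}


def simulate(rhs, k, x0, times, dt=0.005):
    """Forward-Euler integration from t=0, sampled at the observation times."""
    xs, x, step = [], x0, 0
    for t_obs in times:
        target = round(t_obs / dt)  # integer step count avoids float drift
        while step < target:
            x += dt * rhs(x, k)
            step += 1
        xs.append(x)
    return xs


def identify_structure(times, data, k_grid):
    """Return (structure name, best k, SSE); assumes times[0] == 0 so data[0] is x0."""
    best = None
    for name, rhs in CANDIDATES.items():
        for k in k_grid:
            sim = simulate(rhs, k, data[0], times)
            sse = sum((s - d) ** 2 for s, d in zip(sim, data))
            if best is None or sse < best[2]:
                best = (name, k, sse)
    return best


times = [0.0, 1.0, 2.0, 3.0, 4.0]
data = [2.0 * math.exp(-0.5 * t) for t in times]  # synthetic first-order data
k_grid = [i / 100 for i in range(1, 201)]
name, k, sse = identify_structure(times, data, k_grid)
```

With sparse, noisy data and many candidate terms, exhaustive search like this quickly becomes infeasible, which is the problem efficient structural-identification algorithms aim to address.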
USDA-ARS's Scientific Manuscript database
MOCASSIN-prot is a software package, implemented in Perl and Matlab, for constructing protein similarity networks to classify proteins. Both domain composition and quantitative sequence similarity information are used to construct the directed protein similarity networks. For each reference protein i...
Does filler database size influence identification accuracy?
Bergold, Amanda N; Heaton, Paul
2018-06-01
Police departments increasingly use large photo databases to select lineup fillers using facial recognition software, but the implications of this technological shift have been largely unexplored in eyewitness research. Database use, particularly if coupled with facial matching software, could enable lineup constructors to increase filler-suspect similarity and thus enhance eyewitness accuracy (Fitzgerald, Oriet, Price, & Charman, 2013). However, with a large pool of potential fillers, such technologies might theoretically produce lineup fillers that are too similar to the suspect (Fitzgerald, Oriet, & Price, 2015; Luus & Wells, 1991; Wells, Rydell, & Seelau, 1993). This research proposes a new factor, filler database size, as a lineup feature affecting eyewitness accuracy. In a facial recognition experiment, we select lineup fillers in a legally realistic manner using facial matching software applied to filler databases of 5,000, 25,000, and 125,000 photos, and find that larger databases are associated with a higher objective similarity rating between suspects and fillers and with lower overall identification accuracy. In target-present lineups, witnesses viewing lineups created from the larger databases were less likely to make correct identifications and more likely to select known innocent fillers. When the target was absent, database size was associated with a lower rate of correct rejections and a higher rate of filler identifications. Higher algorithmic similarity ratings were also associated with decreases in eyewitness identification accuracy. The results suggest that using facial matching software to select fillers from large photograph databases may reduce identification accuracy, and they provide support for filler database size as a meaningful system variable. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
NASA Technical Reports Server (NTRS)
1976-01-01
The engineering analyses and evaluation studies conducted for the Software Requirements Analysis are discussed. Included are the development of the study data base, synthesis of implementation approaches for software required by both mandatory onboard computer services and command/control functions, and identification and implementation of software for ground processing activities.
Tabaqchali, S; Silman, R; Holland, D
1987-01-01
A new rapid automated method for the identification and classification of microorganisms is described. It is based on the incorporation of 35S-methionine into cellular proteins and subsequent separation of the radiolabelled proteins by sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE). The protein patterns produced were species specific and reproducible, permitting discrimination between species. A large number of Gram-negative and Gram-positive aerobic and anaerobic organisms were successfully tested. Furthermore, the differences between protein profiles within a species were sufficient to permit subdivision of the species. New typing schemes for Clostridium difficile, coagulase-negative staphylococci, and Staphylococcus aureus, including the methicillin-resistant strains, could thus be introduced; this has provided the basis for useful epidemiological studies. To standardise and automate the procedure, an automated electrophoresis system and a two-dimensional scanner were developed to scan the dried gels directly. The scanner is operated by a computer, which also stores and analyses the scan data. Specific histograms are produced for each bacterial species. Pattern recognition software is used to construct databases and to compare data obtained from different gels; in this way, duplicate "unknowns" can be identified. Specific small areas showing differences between histograms can also be isolated and expanded to maximise the differences, thus allowing differentiation between closely related bacterial species and identification of differences within a species to provide new typing schemes. This system should be widely applied in clinical microbiology laboratories in the near future. PMID:3312300
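The abstract does not say which similarity measure the pattern recognition software used. As an assumed stand-in, the sketch below compares the densitometric scan of an "unknown" with reference scans using the Pearson correlation coefficient and reports the best-matching reference; the profiles are made-up numbers sampled on a common lane axis.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length intensity profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def best_match(unknown, references):
    """Return (reference name, correlation) of the most similar reference scan."""
    name, profile = max(references.items(), key=lambda kv: pearson(unknown, kv[1]))
    return name, pearson(unknown, profile)


refs = {"species_A": [0, 5, 1, 0, 3, 0], "species_B": [4, 0, 0, 2, 0, 1]}
name, r = best_match([0, 4, 1, 0, 3, 0], refs)
```

Correlation ignores overall intensity, so gels with different loading can still be compared; aligning band positions between gels, which a real system must handle, is not shown.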
Pandey, Bharati; Gupta, Om Prakash; Pandey, Dev Mani; Sharma, Indu; Sharma, Pradeep
2013-05-01
MicroRNAs (miRNAs) are a class of short, endogenous, non-coding RNA molecules about 18-22 nucleotides long. Their main function is to downregulate gene expression in several ways, such as translational repression, mRNA cleavage and epigenetic modification. Computational predictions using an EST-based approach have significantly increased the number of known miRNAs in wheat. Hence, a combinatorial approach combining bioinformatics software and a Perl script was used to identify new miRNAs and add them to the growing database of wheat miRNAs. Identification of miRNAs began by mining the EST (Expressed Sequence Tags) database available at the National Center for Biotechnology Information. In this investigation, 4677 mature microRNA sequences belonging to 50 miRNA families from different plant species were used to predict miRNAs in wheat. A total of five new abiotic stress-responsive miRNAs were predicted and named Ta-miR5653, Ta-miR855, Ta-miR819k, Ta-miR3708 and Ta-miR5156. In addition, four previously identified miRNAs, i.e., Ta-miR1122, miR1117, Ta-miR1134 and Ta-miR1133, were predicted in newly identified EST sequences, and 14 potential target genes were subsequently predicted. Most of these appear to encode ubiquitin carrier protein, serine/threonine protein kinase, 40S ribosomal protein, F-box/kelch-repeat protein, BTB/POZ domain-containing protein, and transcription factors, which are involved in growth, development, metabolism and stress response. Our results increase the number of miRNAs known in wheat, which should be useful for further investigation into the biological functions and evolution of miRNAs in wheat and other plant species.
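The homology step of an EST-based search can be sketched as scanning both strands of each EST for windows that differ from a known mature miRNA by only a few bases. The mismatch limit `max_mm` and the query sequence below are hypothetical; the study's actual thresholds and tools are not given in the abstract. A complete pipeline would also need to check that the surrounding EST sequence can fold into a stem-loop precursor, which is not shown.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_hits(mirna, est, max_mm=2):
    """Return (strand, start, mismatches) for every window of the EST that
    matches the mature miRNA with at most max_mm mismatches."""
    probe = mirna.upper().replace("U", "T")  # RNA query -> DNA alphabet
    hits = []
    for strand, seq in (("+", est.upper()), ("-", revcomp(est.upper()))):
        for i in range(len(seq) - len(probe) + 1):
            window = seq[i:i + len(probe)]
            mm = sum(1 for a, b in zip(probe, window) if a != b)
            if mm <= max_mm:
                hits.append((strand, i, mm))
    return hits


query = "ACGUACGGAUCCAUGGCAUA"  # hypothetical 20-nt mature miRNA
est = "GGGG" + "ACGTACGGATCCATGGCATA" + "CCCC"
hits = find_hits(query, est)
```

Reported positions on the "-" strand are offsets into the reverse-complemented EST.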
Nakato, Ryuichiro; Itoh, Tahehiko; Shirahige, Katsuhiko
2013-07-01
Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) can identify genomic regions that bind proteins involved in various chromosomal functions. Although the development of next-generation sequencers offers the technology needed to identify these protein-binding sites, the analysis can be computationally challenging because sequencing data sometimes consist of >100 million reads/sample. Herein, we describe a cost-effective and time-efficient protocol that is generally applicable to ChIP-seq analysis; this protocol uses a novel peak-calling program termed DROMPA to identify peaks and an additional program, parse2wig, to preprocess read-map files. This two-step procedure drastically reduces computational time and memory requirements compared with other programs. DROMPA enables the identification of protein localization sites in repetitive sequences and efficiently identifies both broad and sharp protein localization peaks. Specifically, DROMPA outputs a protein-binding profile map in pdf or png format, which can be easily manipulated by users who have a limited background in bioinformatics. © 2013 The Authors Genes to Cells © 2013 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.
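The two-step idea (first reduce mapped reads to binned counts, then call peaks on the bins) can be sketched as follows. This is a generic enrichment-over-input illustration, not DROMPA's or parse2wig's algorithm; the bin size, ratio threshold, minimum read count and the +1 pseudocount are all assumptions.

```python
from collections import Counter


def bin_counts(positions, bin_size):
    """Count read start positions per fixed-width bin (preprocessing step)."""
    return Counter(p // bin_size for p in positions)


def call_peaks(chip, control, bin_size=100, min_ratio=3.0, min_reads=5):
    """Flag bins enriched in ChIP over library-size-scaled input and merge
    adjacent enriched bins into (start, end) peak intervals."""
    c, k = bin_counts(chip, bin_size), bin_counts(control, bin_size)
    scale = len(chip) / max(len(control), 1)  # library-size normalisation
    enriched = sorted(b for b, n in c.items()
                      if n >= min_reads and n / (k.get(b, 0) * scale + 1) >= min_ratio)
    peaks = []
    for b in enriched:
        if peaks and peaks[-1][1] == b * bin_size:  # adjacent bin: extend the peak
            peaks[-1][1] = (b + 1) * bin_size
        else:
            peaks.append([b * bin_size, (b + 1) * bin_size])
    return [tuple(p) for p in peaks]


background = [b * 100 + 50 for b in range(50)]  # one read per bin
chip = [1000 + i for i in range(20)] + [1100 + i for i in range(20)] + background
peaks = call_peaks(chip, background)
```

Merging adjacent bins is what lets the same procedure report both sharp peaks (one bin) and broad domains (many consecutive bins).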
Islam, Mohammad T; Garg, Gagan; Hancock, William S; Risk, Brian A; Baker, Mark S; Ranganathan, Shoba
2014-01-03
The chromosome-centric human proteome project (C-HPP) aims to define the complete set of proteins encoded in each human chromosome. The neXtProt database (September 2013) lists 20,128 proteins for the human proteome, of which 3831 human proteins (∼19%) are considered "missing" according to the standard metrics table (released September 27, 2013). In support of the C-HPP initiative, we have extended the annotation strategy developed for human chromosome 7 "missing" proteins into a semiautomated pipeline to functionally annotate the "missing" human proteome. This pipeline integrates a suite of bioinformatics analysis and annotation software tools to identify homologues and map putative functional signatures, gene ontology, and biochemical pathways. From sequential BLAST searches, we primarily identified homologues from reviewed nonhuman mammalian proteins with protein evidence for 1271 (33.2%) "missing" proteins, followed by 703 (18.4%) homologues from reviewed nonhuman mammalian proteins and subsequently 564 (14.7%) homologues from reviewed human proteins. Functional annotations were also determined for 1945 (50.8%) "missing" proteins. To accelerate the identification of "missing" proteins in proteomics studies, we generated proteotypic peptides in silico. Matching these proteotypic peptides to ENCODE proteogenomic data provided proteomic evidence for 107 (2.8%) of the 3831 "missing" proteins, while evidence from a recent membrane proteomic study supported the existence of another 15 "missing" proteins. The chromosome-wise functional annotation of all "missing" proteins is freely available to the scientific community through our web server (http://biolinfo.org/protannotator).
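Generating proteotypic peptides in silico usually means digesting every protein with a protease rule and keeping peptides that map to exactly one protein. The sketch below assumes a simplified trypsin rule (cleave after K or R unless followed by P), no missed cleavages and an assumed 7 to 25 residue length window; the pipeline's own rules are not stated in the abstract.

```python
import re
from collections import defaultdict


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def proteotypic_peptides(proteins, min_len=7, max_len=25):
    """Map each protein name to its digest peptides that occur in no other protein."""
    owners = defaultdict(set)
    for name, seq in proteins.items():
        for pep in tryptic_digest(seq):
            if min_len <= len(pep) <= max_len:
                owners[pep].add(name)
    unique = {name: [] for name in proteins}
    for pep, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].append(pep)
    return {name: sorted(peps) for name, peps in unique.items()}


proteins = {"P1": "AAAAAAAKPEEEEEEERGGGGGGGK", "P2": "GGGGGGGKLLLLLLLLR"}
result = proteotypic_peptides(proteins)
```

Here GGGGGGGK is shared by both toy proteins and is therefore discarded. A production pipeline would typically also treat I and L as indistinguishable by mass and consider missed cleavages.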
Clark, Andrew E; Kaleta, Erin J; Arora, Amit; Wolk, Donna M
2013-07-01
Within the past decade, clinical microbiology laboratories experienced revolutionary changes in the way in which microorganisms are identified, moving away from slow, traditional microbial identification algorithms toward rapid molecular methods and mass spectrometry (MS). Historically, MS was clinically utilized as a high-complexity method adapted for protein-centered analysis of samples in chemistry and hematology laboratories. Today, matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) MS is adapted for use in microbiology laboratories, where it serves as a paradigm-shifting, rapid, and robust method for accurate microbial identification. Multiple instrument platforms, marketed by well-established manufacturers, are beginning to displace automated phenotypic identification instruments and in some cases genetic sequence-based identification practices. This review summarizes the current position of MALDI-TOF MS in clinical research and in diagnostic clinical microbiology laboratories and serves as a primer to examine the "nuts and bolts" of MALDI-TOF MS, highlighting research associated with sample preparation, spectral analysis, and accuracy. Currently available MALDI-TOF MS hardware and software platforms that support the use of MALDI-TOF with direct and precultured specimens and integration of the technology into the laboratory workflow are also discussed. Finally, this review closes with a prospective view of the future of MALDI-TOF MS in the clinical microbiology laboratory to accelerate diagnosis and microbial identification to improve patient care.
Clark, Andrew E.; Kaleta, Erin J.; Arora, Amit
2013-01-01
PMID:23824373
Wan, Cuihong; Liu, Jian; Fong, Vincent; Lugowski, Andrew; Stoilova, Snejana; Bethune-Waddell, Dylan; Borgeson, Blake; Havugimana, Pierre C; Marcotte, Edward M; Emili, Andrew
2013-04-09
The experimental isolation and characterization of stable multi-protein complexes are essential to understanding the molecular systems biology of a cell. To this end, we have developed a high-throughput proteomic platform for the systematic identification of native protein complexes based on extensive fractionation of soluble protein extracts by multi-bed ion exchange high performance liquid chromatography (IEX-HPLC) combined with exhaustive label-free LC/MS/MS shotgun profiling. To support these studies, we have built a companion data analysis software pipeline, termed ComplexQuant. Proteins present in the hundreds of fractions typically collected per experiment are first identified by exhaustively interrogating MS/MS spectra using multiple database search engines within an integrative probabilistic framework, while accounting for possible post-translational modifications. Protein abundance is then measured across the fractions based on normalized total spectral counts and precursor ion intensities using a dedicated tool, PepQuant. This analysis allows co-complex membership to be inferred from the similarity of the extracted protein co-elution profiles. Each computational step has been optimized for processing large-scale biochemical fractionation datasets, and the reliability of the integrated pipeline has been benchmarked extensively. This article is part of a Special Issue entitled: From protein structures to clinical applications. Copyright © 2012 Elsevier B.V. All rights reserved.
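Inferring co-complex membership from co-elution can be sketched as correlating per-fraction abundance profiles. The abstract does not name the similarity measure, so Pearson correlation and the `min_r` cut-off are assumptions here; because Pearson correlation is scale-invariant, the profiles need not be normalized first. The spectral counts are made up.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation; returns 0.0 for a flat (zero-variance) profile."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def co_eluting_pairs(profiles, min_r=0.9):
    """Protein pairs whose fraction profiles correlate at or above min_r."""
    pairs = []
    for p, q in combinations(sorted(profiles), 2):
        r = pearson(profiles[p], profiles[q])
        if r >= min_r:
            pairs.append((p, q, round(r, 3)))
    return pairs


profiles = {  # spectral counts across six hypothetical IEX fractions
    "A": [0, 1, 5, 9, 4, 0],
    "B": [0, 2, 10, 17, 9, 1],
    "C": [8, 3, 0, 0, 1, 6],
}
pairs = co_eluting_pairs(profiles)
```

A single fractionation separates only so many complexes, so co-elution pairs from one run are candidates rather than proof of physical association.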
Identification of BAG3 target proteins in anaplastic thyroid cancer cells by proteomic analysis.
Galdiero, Francesca; Bello, Anna Maria; Spina, Anna; Capiluongo, Anna; Liuu, Sophie; De Marco, Margot; Rosati, Alessandra; Capunzo, Mario; Napolitano, Maria; Vuttariello, Emilia; Monaco, Mario; Califano, Daniela; Turco, Maria Caterina; Chiappetta, Gennaro; Vinh, Joëlle; Chiappetta, Giovanni
2018-01-30
BAG3 protein is an apoptosis inhibitor and is highly expressed in anaplastic thyroid cancer. We investigated the entire set of proteins modulated by BAG3 silencing in human anaplastic thyroid 8505C cancer cells using stable isotope labeling by amino acids in cell culture (SILAC) combined with mass spectrometry analysis. With this approach, we identified 37 up-regulated and 54 down-regulated proteins in BAG3-silenced cells. Many of these proteins are reportedly involved in tumor progression, invasiveness and resistance to therapies. We focused on an oncogenic protein, CAV1, and a tumor suppressor protein, SERPINB2, which had not previously been reported to be modulated by BAG3. Their expression levels in BAG3-silenced cells were confirmed by qRT-PCR and western blot analyses, revealing two novel targets of BAG3 pro-tumor activity. We also examined the dataset of proteins obtained by quantitative proteomics analysis using two tools of the Ingenuity Pathways Analysis software, Downstream Effect Analysis and Upstream Regulator Analysis. Our analyses confirm the association of the proteome profile observed in BAG3-silenced cells with an increase in cell survival and a decrease in cell proliferation and invasion, and highlight the possible involvement of four tumor suppressor miRNAs and TP53/63 proteins in BAG3 activity.
Di, Guilan; You, Weiwei; Yu, Jinjin; Wang, Dexiang; Ke, Caihuan
2013-03-01
Protein expression patterns were compared between a Japanese and a Taiwanese population of Haliotis diversicolor and their hybrid using 2DE and MALDI-TOF-TOF analyses. Using the PDQuest software, 924 ± 7 protein spots were detected in the Japanese population (RR), 861 ± 11 in the Taiwanese population (TT), and 882 ± 9 in the F1 hybrid (TR). In hierarchical cluster analysis, RR and TR clustered together, whereas the distance between RR and TT was the largest. A total of 46 gel spots were identified, of which 15 matched abalone proteins (a 33.6% identification rate). Proteins for which the hybrid exhibited additivity or overdominance accounted for 73.9% of these 46 identified proteins. The 46 differentially expressed proteins were shown to be involved in major biological processes, including muscle contraction and regulation, energy metabolism, and stress response. The proteins involved in energy metabolism included adenosine triphosphate (ATP) synthase β subunit, fructose-1,6-bisphosphate aldolase, triosephosphate isomerase, enolase, arginine kinase, and tauropine dehydrogenase. These proteins exhibited additivity in the offspring. The proteins involved in stress responses included Hsp70 (exhibiting overdominance in the offspring) and Cu/Zn-superoxide dismutase (exhibiting additivity). These results suggest that a proteomic approach is suitable for the analysis of heterosis and for functional prediction in abalone hybridization. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
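The additivity and overdominance categories compare the hybrid's spot volume with its parents: additivity means the hybrid sits near the mid-parent value, overdominance means it exceeds the higher parent. The sketch below uses a simple relative tolerance `tol` in place of the statistical tests a real study would apply; the tolerance and the catch-all "other" label are assumptions.

```python
def classify(parent1, parent2, hybrid, tol=0.1):
    """Classify a hybrid's protein level relative to its two parents."""
    mid = (parent1 + parent2) / 2
    high, low = max(parent1, parent2), min(parent1, parent2)
    if abs(hybrid - mid) <= tol * mid:
        return "additive"           # close to the mid-parent value
    if hybrid > high * (1 + tol):
        return "overdominant"       # above the higher parent
    if hybrid < low * (1 - tol):
        return "underdominant"      # below the lower parent
    return "other"                  # e.g. partial or parent-like dominance
```

For example, with parental volumes of 100 and 200, a hybrid value of 150 is additive and 260 is overdominant.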
MaxReport: An Enhanced Proteomic Result Reporting Tool for MaxQuant.
Zhou, Tao; Li, Chuyu; Zhao, Wene; Wang, Xinru; Wang, Fuqiang; Sha, Jiahao
2016-01-01
MaxQuant is a proteomics software package widely used for large-scale tandem mass spectrometry data. We have designed and developed an enhanced result reporting tool for MaxQuant, named MaxReport. This tool can optimize the results of MaxQuant and provides additional functions for result interpretation. MaxReport can generate report tables for protein N-terminal modifications. It also supports isobaric-labelling-based relative quantification at the protein, peptide or site level. To give an overview of the results, MaxReport performs general descriptive statistical analyses of both identification and quantification results. The output of MaxReport is well organized and therefore helps proteomics users better understand and share their data. The MaxReport script, freely available at http://websdoor.net/bioinfo/maxreport/, is written in Python and is compatible with multiple systems, including Windows and Linux.
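A typical first step in post-processing MaxQuant output is removing flagged protein groups before computing descriptive statistics. The sketch assumes the tab-separated `proteinGroups.txt` layout in which decoy, contaminant and site-only groups carry a `+` in the `Reverse`, `Potential contaminant` and `Only identified by site` columns (older MaxQuant versions name the contaminant column `Contaminant`); check the column names against your MaxQuant version. This is not MaxReport's code.

```python
import csv
import io

# Assumed MaxQuant flag columns; a "+" marks a group to discard.
FLAG_COLUMNS = ("Reverse", "Potential contaminant", "Only identified by site")


def summarize_protein_groups(handle):
    """Count protein groups before and after removing flagged rows."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    kept = [r for r in rows if not any(r.get(c, "") == "+" for c in FLAG_COLUMNS)]
    return {"total": len(rows), "kept": len(kept), "removed": len(rows) - len(kept)}


example = io.StringIO(
    "Protein IDs\tReverse\tPotential contaminant\tOnly identified by site\n"
    "P1\t\t\t\n"
    "REV__P2\t+\t\t\n"
    "CON__P3\t\t+\t\n"
    "P4\t\t\t+\n"
    "P5\t\t\t\n"
)
summary = summarize_protein_groups(example)
```

In practice the handle would be `open("proteinGroups.txt", newline="")`; using `r.get` means a missing column is treated as "not flagged".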
Karger, Axel; Stock, Rüdiger; Ziller, Mario; Elschner, Mandy C; Bettin, Barbara; Melzer, Falk; Maier, Thomas; Kostrzewa, Markus; Scholz, Holger C; Neubauer, Heinrich; Tomaso, Herbert
2012-10-10
Burkholderia (B.) pseudomallei and B. mallei are genetically closely related species. B. pseudomallei causes melioidosis in humans and animals, whereas B. mallei is the causative agent of glanders in equines and, rarely, in humans. Both agents have been classified by the CDC as priority category B biological agents. Rapid identification is crucial because both agents are intrinsically resistant to many antibiotics. Matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-TOF MS) has the potential for rapid and reliable identification of pathogens, but is limited by the availability of a database containing validated reference spectra. The aim of this study was to evaluate MALDI-TOF MS for the rapid and reliable identification and differentiation of B. pseudomallei and B. mallei and to build a reliable reference database for both organisms. A collection of ten B. pseudomallei and seventeen B. mallei strains was used to generate a library of reference spectra. Samples of both species could be identified by MALDI-TOF MS if a dedicated subset of the reference spectra library was used. Compared with B. mallei, the higher genetic diversity among B. pseudomallei was reflected in higher average Euclidean distances between mass spectra and a broader range of identification score values obtained with commercial software for the identification of microorganisms. The type strain of B. pseudomallei (ATCC 23343), isolated decades ago, stands out in the spectrum-based dendrograms, probably due to extensive methylation, as indicated by two intense series of 14 Da mass increments found specifically and reproducibly in the spectra of this strain. Handling pathogens under BSL 3 conditions is dangerous and cumbersome, but it can be minimized by inactivating bacteria with ethanol, extracting proteins under BSL 1 conditions, and performing MALDI-TOF MS analysis, which is faster than nucleic acid amplification methods. 
Our spectra demonstrated higher homogeneity in B. mallei than in B. pseudomallei isolates. As expected for closely related species, the identification process with MALDI Biotyper software (Bruker Daltonik GmbH, Bremen, Germany) requires careful selection of spectra from reference strains. When a dedicated reference set is used and high-quality spectra are acquired, it is possible to distinguish both species unambiguously. The need for careful curation of reference spectra databases is stressed.
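The "average Euclidean distance between mass spectra" used as a diversity measure can be sketched by binning each peak list onto a common m/z grid, scaling to the strongest bin, and averaging pairwise distances within a group. The 2,000 to 20,000 m/z range, 10 m/z bins and max-normalization are assumptions for illustration; the study's preprocessing and the commercial software's scoring are not reproduced. The peak lists are made up.

```python
import math
from itertools import combinations


def to_vector(peaks, lo=2000, hi=20000, width=10):
    """Bin (m/z, intensity) peaks onto a fixed grid and scale to the maximum bin."""
    vec = [0.0] * ((hi - lo) // width)
    for mz, intensity in peaks:
        if lo <= mz < hi:
            vec[int((mz - lo) // width)] += intensity
    top = max(vec) or 1.0
    return [v / top for v in vec]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_within_distance(spectra):
    """Average pairwise Euclidean distance among a group's spectra."""
    vecs = [to_vector(s) for s in spectra]
    dists = [euclidean(a, b) for a, b in combinations(vecs, 2)]
    return sum(dists) / len(dists)


homogeneous = [[(5000, 100), (7000, 50)], [(5001, 100), (7002, 48)], [(5003, 100), (7001, 52)]]
diverse = [[(5000, 100), (7000, 50)], [(5000, 100), (9000, 80)], [(6000, 100), (7000, 50)]]
d_homogeneous = mean_within_distance(homogeneous)
d_diverse = mean_within_distance(diverse)
```

A group whose strains share the same peaks, like the homogeneous toy set, yields a small mean distance; strains with differing peak patterns yield a larger one.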
2012-01-01
PMID:23046611
Identification and proteomic analysis of a novel gossypol-degrading fungal strain.
Yang, Xia; Sun, Jian-Yi; Guo, Jian-Lin; Weng, Xiao-Yan
2012-03-15
Cottonseed meal, an important feed raw material, has limited use in the feed industry because it contains highly toxic gossypol. The aim of the current work was to isolate a gossypol-degrading fungus from a soil microcosm and to investigate the proteins involved in gossypol degradation. A fungal strain, AN-1, that uses gossypol as its sole carbon source was isolated and identified as Aspergillus niger. A large number of intracellular proteins were detected by sodium dodecyl sulfate-polyacrylamide gel electrophoresis, but no significant difference was observed between the glucose-containing and gossypol-containing mycelium extracts. Two-dimensional gel electrophoresis showed that the protein spots were concentrated in the 25.0-66.2 kDa range and distributed across different pI gradients. PDQuest software showed that 51 protein spots in the gels were differentially expressed. Of these, 20 differential protein spots, including six spots expressed specifically in the gossypol condition, were analyzed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Strain AN-1 biodegraded gossypol, and the proteomic results indicate that some proteins are involved in gossypol biodegradation while the fungus survives on gossypol as its sole carbon source. Copyright © 2011 Society of Chemical Industry.
A General Method for Targeted Quantitative Cross-Linking Mass Spectrometry.
Chavez, Juan D; Eng, Jimmy K; Schweppe, Devin K; Cilia, Michelle; Rivera, Keith; Zhong, Xuefei; Wu, Xia; Allen, Terrence; Khurgel, Moshe; Kumar, Akhilesh; Lampropoulos, Athanasios; Larsson, Mårten; Maity, Shuvadeep; Morozov, Yaroslav; Pathmasiri, Wimal; Perez-Neut, Mathew; Pineyro-Ruiz, Coriness; Polina, Elizabeth; Post, Stephanie; Rider, Mark; Tokmina-Roszyk, Dorota; Tyson, Katherine; Vieira Parrine Sant'Ana, Debora; Bruce, James E
2016-01-01
Chemical cross-linking mass spectrometry (XL-MS) provides protein structural information by identifying covalently linked proximal amino acid residues on protein surfaces. The information gained by this technique is complementary to other structural biology methods such as X-ray crystallography, NMR and cryo-electron microscopy[1]. Extending traditional quantitative proteomics methods with chemical cross-linking can provide information on the structural dynamics of proteins and protein complexes. The identification and quantitation of cross-linked peptides remain challenging for the general community and require specialized expertise, which ultimately limits more widespread adoption of the technique. We describe a general method for targeted quantitative mass spectrometric analysis of cross-linked peptide pairs. We report the adaptation of the widely used, open-source software package Skyline for the analysis of quantitative XL-MS data, as a means of data analysis and method sharing. We demonstrate the utility and robustness of the method with a cross-laboratory study and present data that are supported by, and validate, previously published data on quantified cross-linked peptide pairs. This advance provides an easy-to-use resource so that any lab with access to an LC-MS system capable of targeted quantitative analysis can quickly and accurately measure dynamic changes in protein structure and protein interactions.
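Targeted quantitation of a cross-linked peptide pair starts from its precursor m/z: the two peptide masses plus the mass the cross-linker adds, protonated to charge z. The sketch assumes a non-cleavable, lysine-reactive DSS/BS3-type linker (adding C8H10O2, 138.06808 Da) and unmodified peptides; the linkers used in the cross-laboratory study and Skyline's own configuration are not shown.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
DSS_LINK = 138.06808  # assumed linker: mass added by a DSS/BS3 cross-link (C8H10O2)


def peptide_mass(seq):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def crosslink_mz(pep1, pep2, charge, linker=DSS_LINK):
    """Precursor m/z of two peptides joined by one cross-link."""
    neutral = peptide_mass(pep1) + peptide_mass(pep2) + linker
    return (neutral + charge * PROTON) / charge


mz = crosslink_mz("GK", "GK", 2)  # toy pair
```

Listing such precursors (and their fragment transitions) for each cross-linked pair is the information a targeted method needs.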
Dubovenko, Alexey; Nikolsky, Yuri; Rakhmatulin, Eugene; Nikolskaya, Tatiana
2017-01-01
Analysis of NGS and other sequencing data, gene variants, gene expression, proteomics, and other high-throughput (OMICs) data is challenging because of their biological complexity and high levels of technical and biological noise. One way to deal with both problems is to perform analysis with a high-fidelity annotated knowledgebase of protein interactions, pathways, and functional ontologies. This knowledgebase has to be structured in a computer-readable format and must include software tools for managing experimental data, analysis, and reporting. Here, we present MetaCore™ and Key Pathway Advisor (KPA), an integrated platform for functional data analysis. On the content side, MetaCore and KPA encompass a comprehensive database of molecular interactions of different types, pathways, network models, and ten functional ontologies covering human, mouse, and rat genes. The analytical toolkit includes tools for gene/protein list enrichment analysis, a statistical "interactome" tool for identifying over- and under-connected proteins in the dataset, and a biological network analysis module made up of network generation algorithms and filters. The suite also features Advanced Search, an application for combinatorial search of the database content, and a Java-based tool called Pathway Map Creator for drawing and editing custom pathway maps. Applications of MetaCore and KPA include research into the molecular mode of action of diseases, identification of potential biomarkers and drug targets, pathway hypothesis generation, analysis of the biological effects of novel small-molecule compounds, and clinical applications (analysis of large patient cohorts, and translational and personalized medicine).
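Gene/protein list enrichment is commonly scored with a one-sided hypergeometric test: given N background genes, K of them in a pathway, and a list of n genes of which k fall in the pathway, the p-value is the probability of seeing k or more by chance. The abstract does not state which statistic MetaCore uses, so this is an assumed, generic formulation.

```python
from math import comb


def enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway genes, n drawn)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


p = enrichment_p(N=20, K=5, n=4, k=4)  # all 4 list genes fall in a 5-gene pathway
```

Here p = 5/4845 ≈ 0.001. With many pathways tested at once, such p-values would normally be corrected for multiple testing.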
CosmoQuest Transient Tracker: Opensource Photometry & Astrometry software
NASA Astrophysics Data System (ADS)
Myers, Joseph L.; Lehan, Cory; Gay, Pamela; Richardson, Matthew; CosmoQuest Team
2018-01-01
CosmoQuest is moving from online citizen science to observational astronomy with the creation of Transient Tracker. This open-source software is designed to identify asteroids and other transient/variable objects in image sets. In its final form, Transient Tracker will include astrometric and photometric solutions, identification of moving/transient objects, identification of variable objects, and lightcurve analysis. In this poster we present our initial v0.1 release and seek community input. This software builds on the existing NIH-funded ImageJ libraries. Creation of this suite of open-source image manipulation routines is led by Wayne Rasband, and it is released primarily under the MIT license. In this release, we build on these libraries to add source identification for point and point-like sources and to perform astrometry. Our materials are released under the Apache 2.0 license on GitHub (http://github.com/CosmoQuestTeam), and documentation can be found at http://cosmoquest.org/TransientTracker.
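The photometric part of such a pipeline is often based on circular aperture photometry: sum the pixels within a radius of the source and subtract a sky level estimated from a surrounding annulus. The sketch below is a generic illustration in Python on a list-of-lists image, not Transient Tracker's ImageJ-based implementation; the aperture and annulus radii are assumed parameters.

```python
import statistics


def aperture_flux(img, x0, y0, r_ap, r_in, r_out):
    """Background-subtracted flux in a circular aperture centred on (x0, y0),
    with the sky level taken as the median of an annulus r_in <= r <= r_out."""
    ap_sum, ap_n, annulus = 0.0, 0, []
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            d2 = (x - x0) ** 2 + (y - y0) ** 2
            if d2 <= r_ap ** 2:
                ap_sum += value
                ap_n += 1
            elif r_in ** 2 <= d2 <= r_out ** 2:
                annulus.append(value)
    sky = statistics.median(annulus)  # median resists contamination by stars
    return ap_sum - sky * ap_n


img = [[10.0] * 11 for _ in range(11)]  # flat sky of 10 counts per pixel
img[5][5] += 100                        # toy point source
for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
    img[5 + dy][5 + dx] += 20
flux = aperture_flux(img, 5, 5, r_ap=2, r_in=3, r_out=5)
```

Repeating this measurement across an image sequence gives the lightcurve of a variable object.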
PROGRAM FOR THE IDENTIFICATION AND REPLACEMENT OF ENDOCRINE DISRUPTING CHEMICALS
A computer software program is being developed to aid in the identification and replacement of endocrine disrupting chemicals (EDCs). This program will comprise two distinct areas of research: identification of potential EDCs and suggestions for replacing those potential EDCs. ...
Nema, Vijay; Pal, Sudhir Kumar
2013-01-01
This study was conducted to find the best-suited freely available software for protein modelling, using a few sample proteins. The proteins used ranged from small to large, and all had available crystal structures for benchmarking. Key players such as Phyre2, Swiss-Model, CPHmodels-3.0, Homer, (PS)2, (PS)2-V2 and Modweb were used for comparison and model generation. Benchmarking was done for four proteins, Icl, InhA and KatG of Mycobacterium tuberculosis and RpoB of Thermus thermophilus, to find the most suitable software. The parameters compared during analysis gave relatively better values for Phyre2 and Swiss-Model. This comparative study showed that Phyre2 and Swiss-Model build good models of both small and large proteins compared with the other software screened. The other programs also performed reasonably well but often failed to provide full-length, properly folded structures.
Applications of graph theory in protein structure identification
2011-01-01
There is growing interest in the identification of proteins on a proteome-wide scale. Among the different kinds of protein structure identification methods, graph-theoretic methods are particularly powerful. Owing to their lower cost, higher effectiveness and other advantages, they have attracted increasing attention from researchers. Specifically, graph-theoretic methods have been widely used in homology identification, side-chain cluster identification, peptide sequencing and so on. This paper reviews several methods for solving protein structure identification problems using graph theory. We mainly introduce classical methods and mathematical models, including homology modeling based on clique finding, identification of side-chain clusters in protein structures based on the graph spectrum, and de novo peptide sequencing via tandem mass spectrometry using the spectrum graph model. In addition, concluding remarks and future priorities for each method are given. PMID:22165974
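The spectrum graph model mentioned above can be sketched in a few lines: spectrum peaks become nodes, two nodes are joined when their mass difference matches an amino acid residue mass, and a candidate sequence is read off the longest path. The Python sketch below assumes the input is already a list of prefix (b-ion-like) residue masses starting at 0, uses a single absolute tolerance, and ignores the b/y ion ambiguity, noise peaks and scoring that real de novo tools must handle; all names are hypothetical.

```python
# Monoisotopic residue masses (Da). Leucine and isoleucine are isobaric,
# so only "L" is listed.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_for_gap(delta, tol):
    """Return the residue whose mass matches delta within tol, else None."""
    for aa, mass in RESIDUE_MASSES.items():
        if abs(delta - mass) <= tol:
            return aa
    return None


def spectrum_graph_sequence(prefix_masses, tol=0.02):
    """Longest path through the spectrum graph, starting at the lowest mass."""
    nodes = sorted(prefix_masses)
    if not nodes:
        return ""
    n = len(nodes)
    length = [0] * n          # edges on the best path from nodes[0] to node j
    prev = [None] * n         # predecessor on that path
    label = [""] * n          # residue on the edge prev[j] -> j
    reachable = [False] * n
    reachable[0] = True
    # Nodes are sorted by mass, so edges only point forward: the graph is a
    # DAG and one pass of dynamic programming finds the longest path.
    for j in range(1, n):
        for i in range(j):
            if not reachable[i]:
                continue
            aa = residue_for_gap(nodes[j] - nodes[i], tol)
            if aa is not None and length[i] + 1 > length[j]:
                length[j], prev[j], label[j] = length[i] + 1, i, aa
                reachable[j] = True
    end = max(range(n), key=lambda j: length[j])
    sequence = []
    while prev[end] is not None:
        sequence.append(label[end])
        end = prev[end]
    return "".join(reversed(sequence))


print(spectrum_graph_sequence([0.0, 57.02146, 128.05857, 215.0906]))
```

For these prefix masses the gap 0 to 128.059 also matches glutamine, so the graph contains both a G-A-S path and a Q-S path; the longest-path criterion prefers GAS, which illustrates the kind of ambiguity a scoring function has to resolve.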
Multi-species Identification of Polymorphic Peptide Variants via Propagation in Spectral Networks
Bandeira, Nuno
2016-01-01
Peptide and protein identification remains challenging in organisms with poorly annotated or rapidly evolving genomes, as are commonly encountered in environmental or biofuels research. Such limitations render tandem mass spectrometry (MS/MS) database search algorithms ineffective as they lack corresponding sequences required for peptide-spectrum matching. We address this challenge with the spectral networks approach to (1) match spectra of orthologous peptides across multiple related species and then (2) propagate peptide annotations from identified to unidentified spectra. We here present algorithms to assess the statistical significance of spectral alignments (Align-GF), reduce the impurity in spectral networks, and accurately estimate the error rate in propagated identifications. Analyzing three related Cyanothece species, a model organism for biohydrogen production, spectral networks identified peptides from highly divergent sequences from networks with dozens of variant peptides, including thousands of peptides in species lacking a sequenced genome. Our analysis further detected the presence of many novel putative peptides even in genomically characterized species, thus suggesting the possibility of gaps in our understanding of their proteomic and genomic expression. A web-based pipeline for spectral networks analysis is available at http://proteomics.ucsd.edu/software. PMID:27609420
Eigensystem realization algorithm user's guide for VAX/VMS computers: Version 931216
NASA Technical Reports Server (NTRS)
Pappa, Richard S.
1994-01-01
The eigensystem realization algorithm (ERA) is a multiple-input, multiple-output, time domain technique for structural modal identification and minimum-order system realization. Modal identification is the process of calculating structural eigenvalues and eigenvectors (natural vibration frequencies, damping, mode shapes, and modal masses) from experimental data. System realization is the process of constructing state-space dynamic models for modern control design. This user's guide documents VAX/VMS-based FORTRAN software developed by the author since 1984 in conjunction with many applications. It consists of a main ERA program and 66 pre- and post-processors. The software provides complete modal identification capabilities and most system realization capabilities.
USDA-ARS's Scientific Manuscript database
Matrix-assisted laser desorption/ionization tandem time-of-flight (MALDI-TOF-TOF) mass spectrometry is increasingly utilized for rapid top-down proteomic identification of proteins. This identification may involve analysis of either a pure protein or a protein mixture. For analysis of a pure protein...
LIQUID: an open-source software for identifying lipids in LC-MS/MS-based lipidomics data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kyle, Jennifer E.; Crowell, Kevin L.; Casey, Cameron P.
2017-01-31
We introduce an open-source software tool, LIQUID, for semi-automated processing and visualization of LC-MS/MS-based lipidomics data. LIQUID provides users with the capability to process high-throughput data and contains a customizable target library and scoring model to suit project needs. The graphical user interface visualizes multiple lines of spectral evidence for each lipid identification, allowing rapid examination of data for making confident identifications of lipid molecular species.
Oeck, Sebastian; Malewicz, Nathalie M; Hurst, Sebastian; Al-Refae, Klaudia; Krysztofiak, Adam; Jendrossek, Verena
2017-07-01
The quantitative analysis of foci plays an important role in various cell biological methods. In the fields of radiation biology and experimental oncology, the effect of ionizing radiation, chemotherapy or molecularly targeted drugs on DNA damage induction and repair is frequently assessed by analyzing protein clusters or phosphorylated proteins recruited to so-called repair foci at DNA damage sites, involving, for example, γ-H2A.X, 53BP1 or RAD51. We recently developed "The Focinator" as a reliable and fast tool for automated quantitative and qualitative analysis of nuclei and DNA damage foci. The refined software is now even more user-friendly owing to a graphical interface and further features. For example, we included an R-script-based mode for automated image opening, file naming, progress monitoring and error reporting. Consequently, the evaluation no longer requires the operator to be present after the initial parameter definition. Moreover, the Focinator v2-0 can now perform multi-channel analysis of four channels and evaluate protein-protein colocalization by comparing up to three foci channels. This enables, for example, the quantification of foci in cells of a specific cell cycle phase.
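Automated foci counting usually reduces to thresholding an image and counting connected groups of bright pixels. The Python sketch below shows that core step with 4-connectivity and a minimum focus size; it is a generic illustration, not the Focinator's algorithm, and the threshold, `min_size` and function name are assumptions. A real pipeline would first segment nuclei and then count foci per nucleus.

```python
from collections import deque


def count_foci(image, threshold, min_size=2):
    """Return the pixel sizes of 4-connected foci above threshold, in scan order."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill of one connected bright region.
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_size:      # discard single-pixel noise
                sizes.append(size)
    return sizes
```

The number of foci is `len(count_foci(...))`; keeping the sizes as well allows a later size filter without re-scanning the image.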
Development of Automated Image Analysis Software for Suspended Marine Particle Classification
2003-09-30
Scott Samson, Center for Ocean Technology. ...objective is to develop automated image analysis software to reduce the effort and time required for manual identification of plankton images. Automated...
Martin, Laetitia B. B.; Sherwood, Robert W.; Nicklay, Joshua J.; Yang, Yong; Muratore-Schroeder, Tara L.; Anderson, Elizabeth T.; Thannhauser, Theodore W.; Rose, Jocelyn K. C.; Zhang, Sheng
2017-01-01
We describe here the use of label-free wide selected-ion monitoring data-independent acquisition (WiSIM-DIA) to identify proteins that are involved in the formation of tomato (Solanum lycopersicum) fruit cuticles and that are regulated by the transcription factor CUTIN DEFICIENT2 (CD2). A spectral library consisting of 11,753 unique peptides, corresponding to 2338 tomato protein groups, was used, and the DIA analysis was performed at the MS1 level using narrow mass windows for extraction with Skyline 2.6 software. We identified a total of 1140 proteins, 67 of which had expression levels that differed significantly between the cd2 tomato mutant and the wild-type cultivar M82. Differentially expressed proteins, including a key protein involved in cutin biosynthesis, were selected for validation by targeted SRM/MRM and by Western blot analysis. In addition to confirming a role for CD2 in regulating cuticle formation, the results also revealed that CD2 influences pathways associated with cell wall biology, anthocyanin biosynthesis, plant development, and responses to stress, which complements findings of earlier RNA-Seq experiments. Our results provide new insights into molecular processes and aspects of fruit biology associated with CD2 function, and demonstrate that WiSIM-DIA is an effective quantitative approach for global protein identification. PMID:27089858
Aiyetan, Paul; Zhang, Bai; Zhang, Zhen; Zhang, Hui
2014-01-01
Mass spectrometry-based glycoproteomics has become a major means of identifying and characterizing previously N-linked glycan-attached loci (glycosites). In the bottom-up approach, several factors, including but not limited to sample preparation, mass spectrometry analyses, and protein sequence database searches, result in previously N-linked peptide spectrum matches (PSMs) of varying lengths. Given that multiple PSMs can map to a glycosite, we reason that identified PSMs are varying-length peptide species of a unique set of glycosites. Because the associated spectra of these PSMs are typically summed separately, true glycosite-associated spectral counts are lost or complicated. These varying-length peptide species also complicate protein inference, as shorter peptide sequences are more likely than longer peptides or actual glycosite sequences to map to multiple proteins. Here, we present XGlycScan. XGlycScan maps varying-length peptide species to glycosites to facilitate accurate quantification of glycosite-associated spectral counts. We observed that this reduced the variability in reported identifications across mass spectrometry technical replicates of our sample dataset. We also observed that mapping identified peptides to glycosites provided an assessment of search-engine identification. Inherently, the glycosites reported by XGlycScan reduce the complexity of protein inference. We implemented XGlycScan in the platform-independent Java programming language and have made it available as open source. XGlycScan's source code is freely available at https://bitbucket.org/paiyetan/xglycscan/src and its compiled binaries and documentation can be freely downloaded at https://bitbucket.org/paiyetan/xglycscan/downloads. The graphical user interface version can also be found at https://bitbucket.org/paiyetan/xglycscangui/src and https://bitbucket.org/paiyetan/xglycscangui/downloads, respectively.
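The idea of collapsing varying-length peptides onto the glycosites they cover can be illustrated with a short Python sketch: locate each identified peptide in the protein sequences, find the N-X-S/T sequons (X not proline) whose asparagine falls inside the peptide, and sum spectral counts per (protein, position). This is an illustration under stated assumptions, not XGlycScan's code; the function name, input shapes and the rule that the sequon asparagine must lie within the peptide are hypothetical.

```python
import re
from collections import Counter

# N-X-S/T sequon with X != P; the zero-width lookahead also finds overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")


def glycosite_spectral_counts(proteins, psms):
    """Sum spectral counts of varying-length peptides per glycosite.

    proteins -- {protein_id: sequence}
    psms     -- iterable of (peptide_sequence, spectral_count)
    Returns Counter{(protein_id, 1-based Asn position): summed count}.
    """
    sites = {pid: [m.start() for m in SEQUON.finditer(seq)]
             for pid, seq in proteins.items()}
    counts = Counter()
    for peptide, n_spectra in psms:
        for pid, seq in proteins.items():
            start = seq.find(peptide)
            while start != -1:               # every occurrence of the peptide
                end = start + len(peptide)
                for pos in sites[pid]:
                    if start <= pos < end:   # sequon Asn lies inside the peptide
                        counts[(pid, pos + 1)] += n_spectra
                start = seq.find(peptide, start + 1)
    return counts


proteins = {"P1": "MKANGSLLRNVTEK"}
psms = [("ANGSLLR", 3), ("KANGSLLR", 2), ("NVTEK", 4)]
print(dict(glycosite_spectral_counts(proteins, psms)))
```

Here the missed-cleavage peptide KANGSLLR and the fully cleaved ANGSLLR both report the same site, so their counts are pooled at Asn4 instead of being kept as two separate peptide entries. A peptide shared by several proteins is credited to every matching protein, which is exactly the protein-inference ambiguity the abstract mentions.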
Sparbier, Katrin; Asperger, Arndt; Resemann, Anja; Kessler, Irina; Koch, Sonja; Wenzel, Thomas; Stein, Günter; Vorwerg, Lars; Suckau, Detlev; Kostrzewa, Markus
2007-01-01
Comprehensive proteomic analyses require efficient and selective pre-fractionation to facilitate analysis of post-translationally modified peptides and proteins, and automated analysis workflows enabling the detection, identification, and structural characterization of the corresponding peptide modifications. Human serum contains a large number of glycoproteins spanning several orders of magnitude in concentration. Isolation and subsequent identification of low-abundance glycoproteins from serum is therefore a challenging task. Selective capturing of glycopeptides and glycoproteins was attained by means of magnetic particles specifically functionalized with lectins or boronic acids that bind to various structural motifs. Human serum was incubated with differentially functionalized magnetic micro-particles (lectins or boronic acids), and isolated proteins were digested with trypsin. Subsequently, the resulting complex mixture of peptides and glycopeptides was subjected to LC-MALDI analysis and database searching. In parallel, a second magnetic bead capture was performed at the peptide level to separate intact glycopeptides and to analyze both their peptide sequence and glycan structure by LC-MALDI. Glycopeptides were detected by means of a software algorithm that extracts and characterizes potential glycopeptide candidates from large LC-MALDI-MS/MS data sets, based on N-glycopeptide-specific fragmentation patterns and characteristic fragment mass peaks. By means of fast and simple glycospecific capturing applied in conjunction with extensive LC-MALDI-MS/MS analysis and novel data analysis tools, a large number of low-abundance proteins were identified, comprising known or predicted glycosylation sites. In line with the specific binding preferences of the different types of beads, complementary results were obtained from the experiments using magnetic ConA, LCA, WGA and boronic acid beads. PMID:17916798
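Glycopeptide MS/MS spectra are commonly flagged by low-mass glycan oxonium ions. The Python sketch below illustrates that kind of fragment-based filter with three well-known oxonium ions (Hex+ at m/z 163.060, HexNAc+ at m/z 204.087 and HexHexNAc+ at m/z 366.140); it is not the algorithm described in the abstract, and the marker list, tolerance and `min_markers` threshold are assumptions.

```python
# Singly charged glycan oxonium ions commonly used as glycopeptide markers (m/z).
OXONIUM_IONS = {"Hex": 163.0601, "HexNAc": 204.0867, "HexHexNAc": 366.1395}


def glycopeptide_candidate(peak_mzs, tol=0.02, min_markers=2):
    """Return (is_candidate, sorted list of marker ions found in the spectrum)."""
    found = sorted(name for name, mz in OXONIUM_IONS.items()
                   if any(abs(peak - mz) <= tol for peak in peak_mzs))
    return len(found) >= min_markers, found
```

Requiring two markers rather than one reduces false positives from unrelated fragments that happen to fall near m/z 204; a fuller filter would also look for peptide-plus-glycan neutral-loss ladders at higher m/z.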
Baldwin, M A; Medzihradszky, K F; Lock, C M; Fisher, B; Settineri, T A; Burlingame, A L
2001-04-15
The design and operation of a novel UV-MALDI ionization source on a commercial QqoaTOF mass spectrometer (Applied Biosystems/MDS Sciex QSTAR Pulsar) is described. Samples are loaded on a 96-well target plate, the movement of which is under software control and can be readily automated. Unlike conventional high-energy MALDI-TOF, the ions are produced with low energies (5-10 eV) in a region of relatively low vacuum (8 mTorr). Thus, they are cooled by extensive low-energy collisions before selection in the quadrupole mass analyzer (Q1), potentially giving a quasi-continuous ion beam ideally suited to the oaTOF used for mass analysis of the fragment ions, although ion yields from individual laser shots may vary widely. Ion dissociation is induced by collisions with argon in an rf-only quadrupole cell, giving typical low-energy CID spectra for protonated peptide ions. Ions separated in the oaTOF are registered by a four-anode detector and time-to-digital converter and accumulated in "bins" that are 625 ps wide. Peak shapes depend upon the number of ion counts in adjacent bins. As expected, the accuracy of mass measurement is shown to depend upon the number of ions recorded for a particular peak. With internal calibration, mass accuracy better than 10 ppm is attainable for peaks that contain sufficient ions to give well-defined Gaussian profiles. By virtue of its high resolution, capability for accurate mass measurements, and sensitivity in the low-femtomole range, this instrument is ideally suited to protein identification for proteomic applications by generation of peptide tags, manual sequence interpretation, identification of modifications such as phosphorylation, and protein structural elucidation. Unlike the multiply charged ions typical of electrospray ionization, the singly charged MALDI-generated peptide ions show a linear dependence of optimal collision energy upon molecular mass, which is advantageous for automated operation.
It is shown that the novel pulsing technique of this instrument, which increases the sensitivity of precursor ion scans, is applicable to the identification of peptides labeled with isotope-coded affinity tags.
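Two calculations sit behind the mass-accuracy statements in this abstract: a peak's m/z is estimated as the count-weighted centroid of adjacent bins, and accuracy is expressed in parts per million. The Python sketch below assumes bin centres have already been converted to m/z (the instrument's bins are time bins) and uses hypothetical function names.

```python
def centroid_mz(bin_mzs, counts):
    """Count-weighted centroid of a peak spread over adjacent bins."""
    total = sum(counts)
    if total == 0:
        raise ValueError("peak contains no ion counts")
    return sum(mz * n for mz, n in zip(bin_mzs, counts)) / total


def ppm_error(measured_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


peak = centroid_mz([999.99, 1000.00, 1000.01], [1, 2, 1])
print(peak, ppm_error(1000.005, 1000.0))
```

A peak at m/z 1000.000 measured at 1000.005 is 5 ppm off, inside the 10 ppm figure quoted above. With only a few ion counts the centroid rests on a handful of bins, which is consistent with the observation that accuracy depends on the number of ions recorded.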
Design and Pedagogical Issues in the Development of the InSight Series of Instructional Software.
ERIC Educational Resources Information Center
Baro, John A.; Lehmkulke, Stephen
1993-01-01
Design issues in the development of InSight software for optometric education include choice of hardware, identification of audience, definition of scope and limitations of content, selection of user interface and programming environment, obtaining user feedback, and software distribution. Pedagogical issues include practicality and improvement on…
An Analysis of Open Source Security Software Products Downloads
ERIC Educational Resources Information Center
Barta, Brian J.
2014-01-01
Despite the continued demand for open source security software, a gap persists in identifying the factors related to the success of open source security software. There are no studies that accurately assess the extent of this persistent gap, particularly with respect to the strength of the relationships of open source software…
2014-01-01
Introduction Cartilage protein distribution and the changes that occur in cartilage ageing and disease are essential to understanding the process of cartilage ageing and age-related diseases such as osteoarthritis. The aim of this study was to investigate the peptide profiles in ageing and osteoarthritic (OA) cartilage sections using matrix-assisted laser desorption ionization mass spectrometry imaging (MALDI-MSI). Methods The distribution of proteins in young, old and OA equine cartilage was compared following tryptic digestion of cartilage slices, and MALDI-MSI was undertaken with a MALDI SYNAPT™ HDMS system. Protein identification was undertaken using database searches following multivariate analysis. Peptide intensity differences between young, ageing and OA cartilage were imaged with Biomap software. Analysis of aggrecanase-specific cleavage patterns of a crude cartilage proteoglycan extract was used to validate some of the identified differences in peptide intensity. Immunohistochemistry studies validated the differences in protein abundance. Results Young, old and OA equine cartilage were discriminated based on their peptide signatures using discriminant analysis. Proteins including aggrecan core protein, fibromodulin, and cartilage oligomeric matrix protein were identified and localised. Fibronectin peptides displayed a stronger intensity in OA cartilage. Age-specific protein markers for collectin-43 and cartilage oligomeric matrix protein were identified. In addition, potential fibromodulin and biglycan peptides targeted for degradation in OA were detected. Conclusions MALDI-MSI provided a novel platform to study cartilage ageing and disease, enabling age- and disease-specific peptides in cartilage to be elucidated and spatially resolved. PMID:24886698
The RING 2.0 web server for high quality residue interaction networks.
Piovesan, Damiano; Minervini, Giovanni; Tosatto, Silvio C E
2016-07-08
Residue interaction networks (RINs) are an alternative way of representing protein structures in which nodes are residues and arcs are physico-chemical interactions. RINs have been extensively and successfully used for analysing mutation effects, protein folding, domain-domain communication and catalytic activity. Here we present RING 2.0, a new version of the RING software for the identification of covalent and non-covalent bonds in protein structures, including π-π stacking and π-cation interactions. RING 2.0 is extremely fast and generates both intra- and inter-chain interactions, including solvent and ligand atoms. The generated networks are very accurate and reliable thanks to a complex empirical re-parameterization of distance thresholds performed on the entire Protein Data Bank. By default, RING output is generated with optimal parameters, but the web server provides an exhaustive interface to customize the calculation. The network can be visualized directly in the browser or in Cytoscape. Alternatively, the RING-Viz script for PyMOL allows the interactions to be visualized at the atomic level in the structure. The web server and RING-Viz, together with an extensive help and tutorial, are available from URL: http://protein.bio.unipd.it/ring. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
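A residue interaction network can be sketched as a graph whose nodes are residues and whose edges join residues with any pair of atoms within a distance cutoff. The Python sketch below uses a single generic cutoff (4.5 Å is an assumption); RING 2.0 instead applies interaction-type-specific thresholds re-parameterized on the PDB and classifies each interaction, which this sketch does not attempt. Residue identifiers and coordinates are hypothetical inputs.

```python
import math
from itertools import combinations


def residue_contacts(residues, cutoff=4.5):
    """Edges of a simple residue interaction network.

    residues -- {residue_id: [(x, y, z), ...] atom coordinates in Å}
    Returns a set of (residue_a, residue_b) pairs, in input order, whose
    closest atoms lie within cutoff.
    """
    edges = set()
    for (res_a, atoms_a), (res_b, atoms_b) in combinations(residues.items(), 2):
        if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
            edges.add((res_a, res_b))
    return edges
```

The all-pairs loop is quadratic in the number of atoms; for whole proteins a spatial grid or k-d tree would be used to find candidate atom pairs first.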
Software forecasting as it is really done: A study of JPL software engineers
NASA Technical Reports Server (NTRS)
Griesel, Martha Ann; Hihn, Jairus M.; Bruno, Kristin J.; Fouser, Thomas J.; Tausworthe, Robert C.
1993-01-01
This paper presents a summary of the results to date of a Jet Propulsion Laboratory internally funded research task to study the costing process and parameters used by internally recognized software cost-estimating experts. Protocol analysis and Markov process modeling were used to capture software engineers' forecasting mental models. While there is significant variation between the mental models that were studied, it was nevertheless possible to identify a core set of cost forecasting activities, and the mental models were also found to cluster around three forecasting techniques. Further partitioning of the mental models revealed a clustering of activities that is very suggestive of a forecasting life cycle. The different forecasting methods identified were based on the use of multiple decomposition steps or multiple forecasting steps. The multiple forecasting steps involved either forecasting software size or an additional effort forecast. Virtually no subject used risk reduction steps in combination. The results of the analysis include the identification of a core set of well-defined costing activities, a proposed software forecasting life cycle, and the identification of several basic software forecasting mental models. The paper concludes with a discussion of the implications of the results for current individual and institutional practices.
2017-04-01
...Combinatorial Design Methods: 2.1 Identification of Significant Improvement Opportunity; 2.2 Methodology Development; 2.3 Piloting... 3 Process Performance Modeling and Analysis: 3.1 Identification of Significant Improvement Opportunity; 3.2 Methodology Development; 3.3 ...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tolić, Nikola; Liu, Yina; Liyu, Andrey
Ultrahigh-resolution mass spectrometry, such as Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), can resolve thousands of molecular ions in complex organic matrices. A Compound Identification Algorithm (CIA) was previously developed for automated elemental formula assignment for natural organic matter (NOM). In this work we describe a user-friendly interface for CIA, titled Formularity, which includes additional functionality to search for formulas based on an Isotopic Pattern Algorithm (IPA). While CIA assigns elemental formulas for compounds containing C, H, O, N, S, and P, IPA is capable of assigning formulas for compounds containing other elements. We used halogenated organic compounds (HOC), a chemical class that is ubiquitous in nature as well as in anthropogenic systems, as an example to demonstrate the capability of Formularity with IPA. A HOC standard mix was used to evaluate the identification confidence of IPA. HOC spiked into NOM and tap water were used to assess HOC identification in natural and anthropogenic matrices. Strategies for reconciling CIA and IPA assignments are discussed. Software and sample databases with documentation are freely available from the PNNL OMICS software repository https://omics.pnl.gov/software/formularity.
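Isotopic pattern matching works well for halogens because chlorine and bromine carry a heavy isotope two mass units above the light one at high abundance. Under a simplified binomial model that ignores carbon-13 and all other elements, the relative intensities of the M, M+2, M+4, ... peaks for n atoms of one halogen follow directly from the isotope abundances. The Python sketch below illustrates that model only, not the IPA implementation; the abundances (37Cl about 24.24 %, 81Br about 49.31 %) are approximate natural values and the function name is hypothetical.

```python
from math import comb

# Approximate natural abundance of the heavy (+2 Da) isotope.
HEAVY_ABUNDANCE = {"Cl": 0.2424, "Br": 0.4931}


def halogen_pattern(element, n_atoms):
    """Relative intensities of M, M+2, ..., M+2n, scaled to the tallest peak."""
    heavy = HEAVY_ABUNDANCE[element]
    light = 1.0 - heavy
    # Binomial probability of carrying k heavy isotopes among n_atoms atoms.
    probs = [comb(n_atoms, k) * heavy**k * light**(n_atoms - k)
             for k in range(n_atoms + 1)]
    top = max(probs)
    return [p / top for p in probs]


print(halogen_pattern("Cl", 2))
```

For two chlorine atoms this gives roughly 100 : 64 : 10 for M : M+2 : M+4, the familiar dichlorine signature; a formula search can score candidate formulas by how closely the observed peaks follow such a predicted pattern.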
Warth, Benedikt; Rajkai, György; Mandenius, Carl-Fredrik
2010-05-03
Software sensors for monitoring and on-line estimation of critical bioprocess variables have mainly been used with standard bioreactor sensors, such as electrodes and gas analyzers, where algorithms in the software model have generated the desired state variables. In this article we propose that other on-line instruments, such as NIR probes and on-line HPLC, should be used to make more reliable and flexible software sensors. Five software sensor architectures were compared and evaluated: (1) biomass concentration from an on-line NIR probe, (2) biomass concentration from titrant addition, (3) specific growth rate from titrant addition, (4) specific growth rate from the NIR probe, and (5) specific substrate uptake rate and by-product rate from on-line HPLC and NIR probe signals. The software sensors were demonstrated on an Escherichia coli cultivation expressing a recombinant protein, green fluorescent protein (GFP), but the results could be extrapolated to other production organisms and product proteins. We conclude that well-maintained on-line instrumentation (hardware sensors) can increase the potential of software sensors. This would also strongly support the aims of process analytical technology and quality-by-design concepts. © 2010 Elsevier B.V. All rights reserved.
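A software sensor for specific growth rate can be as simple as differentiating the logarithm of a biomass estimate, whether that estimate comes from an NIR probe or from titrant addition. The Python sketch below uses finite differences between consecutive samples; it assumes biomass values are positive and already calibrated, omits the filtering a real on-line estimator would need, and uses a hypothetical function name.

```python
import math


def specific_growth_rates(times_h, biomass):
    """Specific growth rate mu = d(ln X)/dt between consecutive samples (1/h)."""
    if len(times_h) != len(biomass):
        raise ValueError("times and biomass must have the same length")
    rates = []
    for (t0, x0), (t1, x1) in zip(zip(times_h, biomass),
                                  zip(times_h[1:], biomass[1:])):
        rates.append((math.log(x1) - math.log(x0)) / (t1 - t0))
    return rates


print(specific_growth_rates([0.0, 1.0, 2.0], [1.0, 2.0, 4.0]))
```

Biomass that doubles every hour gives mu = ln 2, about 0.693 per hour, in each interval. Because differentiation amplifies measurement noise, an on-line version would normally smooth the biomass signal first.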
2014-01-01
Background Osteopontin (Eta, secreted sialoprotein 1, opn) is secreted by different cell types, including cancer cells. Three splice variant forms, namely osteopontin-a, osteopontin-b and osteopontin-c, have been identified. The most striking feature is that osteopontin-c is found to be elevated in almost all types of cancer cells. This was the main reason for selecting it for sequence analysis and structure prediction, which provide ample opportunities for prognostic, therapeutic and preventive cancer research. Methods The osteopontin-c gene sequence was determined from a breast cancer sample and translated to a protein sequence. It was then analyzed using various software and web tools for binding pockets, docking and druggability analysis. Due to the lack of homologous templates, the tertiary structure was predicted using the ab initio method server I-TASSER and was evaluated after refinement using web tools. The refined structure was compared with the known electron microscopic structure of bone sialoprotein and docked with CD44 for binding analysis, and binding pockets were identified for drug design. Results A signal sequence of about sixteen amino acid residues was identified using signal sequence prediction servers. Due to the absence of known structures of similar proteins, the three-dimensional structure of osteopontin-c was predicted using the I-TASSER server. The predicted structure was refined with the help of the SUMMA server and validated using the SAVES server. Molecular dynamics analysis was carried out using GROMACS software. The final model was built and used for docking with CD44. Druggable pockets were identified using pocket energies. Conclusions The tertiary structure of osteopontin-c was predicted successfully using the ab initio method, and the predictions showed that osteopontin-c is of a fibrous nature comparable to fibronectin.
Docking studies showed significant similarities in the interaction of the QSAET motif with CD44 between the normal and splice variant forms of osteopontin, and binding pocket analyses revealed several pockets, which paved the way to the identification of a druggable pocket. PMID:24401206
Su, Li-Ning; Song, Xiao-Qing; Wei, Hui-Ping; Yin, Hai-Feng
Bone mesenchymal stem cells (BMSCs) differentiated into neurons have been widely proposed for use in cell therapy of many neurological disorders. It is therefore important to understand the molecular mechanisms underlying this differentiation. We screened differentially expressed genes between immature neural tissues and untreated BMSCs to identify the genes responsible for neuronal differentiation from BMSCs. GSE68243 gene microarray data of rat BMSCs and GSE18860 gene microarray data of rat neurons were retrieved from the Gene Expression Omnibus database. Transcriptome Analysis Console software showed that 1248 genes were up-regulated and 1273 were down-regulated in neurons compared with BMSCs. Gene Ontology functional enrichment, protein-protein interaction networks, functional modules, and hub genes were analyzed using DAVID, STRING 10, the BiNGO tool, and Network Analyzer software, revealing that nine hub genes, Nrcam, Sema3a, Mapk8, Dlg4, Slit1, Creb1, Ntrk2, Cntn2, and Pax6, may play a pivotal role in neuronal differentiation from BMSCs. Seven genes, Dcx, Nrcam, Sema3a, Cntn2, Slit1, Ephb1, and Pax6, were shown to be hub nodes within the neuronal development network, while six genes, Fgf2, Tgfβ1, Vegfa, Serpine1, Il6, and Stat1, appeared to play an important role in suppressing neuronal differentiation. However, additional studies are required to confirm these results.
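Hub genes in a protein-protein interaction network are often ranked simply by degree, that is, the number of distinct interaction partners. The Python sketch below shows that ranking on a toy edge list; it is an assumption about the general approach, not the exact criteria used with STRING or Network Analyzer in this study, and the gene names A to D are placeholders.

```python
from collections import Counter


def top_hubs(interactions, top_n=10):
    """Rank nodes by degree in an undirected interaction list.

    Duplicate edges (in either orientation) and self-loops are ignored;
    ties are broken alphabetically so the result is deterministic.
    """
    unique_edges = {frozenset(pair) for pair in interactions if pair[0] != pair[1]}
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:top_n]]


print(top_hubs([("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("D", "B")], top_n=2))
```

Interaction databases frequently list the same pair twice in opposite orientations, so deduplicating with unordered `frozenset` pairs keeps such repeats from inflating degrees.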
mzStudio: A Dynamic Digital Canvas for User-Driven Interrogation of Mass Spectrometry Data.
Ficarro, Scott B; Alexander, William M; Marto, Jarrod A
2017-08-01
Although not yet truly 'comprehensive', modern mass spectrometry-based experiments can generate quantitative data for a meaningful fraction of the human proteome. Importantly for large-scale protein expression analysis, robust data pipelines are in place for identification of un-modified peptide sequences and aggregation of these data to protein-level quantification. However, interoperable software tools that enable scientists to computationally explore and document novel hypotheses for peptide sequence, modification status, or fragmentation behavior are not well-developed. Here, we introduce mzStudio, an open-source Python module built on our multiplierz project. This desktop application provides a highly-interactive graphical user interface (GUI) through which scientists can examine and annotate spectral features, re-search existing PSMs to test different modifications or new spectral matching algorithms, share results with colleagues, integrate other domain-specific software tools, and finally create publication-quality graphics. mzStudio leverages our common application programming interface (mzAPI) for access to native data files from multiple instrument platforms, including ion trap, quadrupole time-of-flight, Orbitrap, matrix-assisted laser desorption ionization, and triple quadrupole mass spectrometers and is compatible with several popular search engines including Mascot, Proteome Discoverer, X!Tandem, and Comet. The mzStudio toolkit enables researchers to create a digital provenance of data analytics and other evidence that support specific peptide sequence assignments.
Nema, Vijay; Pal, Sudhir Kumar
2013-01-01
Aim: This study was conducted to find the best-suited freely available software for protein modelling, using a few sample proteins. The proteins used ranged from small to large, and all had available crystal structures for benchmarking. Key players such as Phyre2, Swiss-Model, CPHmodels-3.0, Homer, (PS)2, (PS)2-V2 and Modweb were used for comparison and model generation. Results: Benchmarking was done for four proteins, Icl, InhA and KatG of Mycobacterium tuberculosis and RpoB of Thermus thermophilus, to find the most suitable software. The parameters compared during analysis gave relatively better values for Phyre2 and Swiss-Model. Conclusion: This comparative study showed that Phyre2 and Swiss-Model build good models of both small and large proteins compared with the other software screened. The other programs also performed reasonably well but often failed to provide full-length, properly folded structures. PMID:24023424
Wang, Xiupin; Peng, Qingzhi; Li, Peiwu; Zhang, Qi; Ding, Xiaoxia; Zhang, Wen; Zhang, Liangxiao
2016-10-12
The identification of non-target triacylglycerols (TAGs) is highly complex and a major challenge in lipidomics analysis. To identify non-target TAGs, a powerful tool named accurate MS(n) spectrometry, which generates so-called ion trees, is used. In this paper, we present a technique for efficient structural elucidation of TAGs on MS(n) spectral trees produced by LTQ Orbitrap MS(n), implemented as an open-source software package, TIT. The TIT software supports automatic annotation of non-target TAGs on MS(n) ion trees using a self-built fragment ion database. This database includes 19,108 simulated TAG molecules from random combinations of fatty acids and the corresponding 500,582 self-built multistage fragment ions (MS(n), n ≤ 3). Our software identifies TAGs using a "stage-by-stage elimination" strategy. By utilizing the MS(1) accurate mass and the referenced RKMD, the TIT software can discriminate unique elemental composition candidates. The regiospecific isomers of fatty acyl chains are then distinguished using MS(2) and MS(3) fragment spectra. We applied the algorithm to a set of 45 TAG standards and demonstrated that the molecular ions could be assigned 100% correctly. The TIT software could therefore be applied to TAG identification in complex biological samples such as mouse plasma extracts. Copyright © 2016 Elsevier B.V. All rights reserved.
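The RKMD filter mentioned above builds on the Kendrick mass defect, which rescales masses so that each CH2 unit weighs exactly 14 and homologous lipids differing only in chain length share the same defect. The Python sketch below shows only the Kendrick mass and its defect; the referenced variant used by TIT additionally compares against class-specific reference values, which are not given in the abstract and are not reproduced here. Rounding to the nearest integer for the nominal Kendrick mass is one of several conventions and is an assumption, as are the function names.

```python
CH2_MASS = 14.01565  # monoisotopic mass of a CH2 unit (Da)


def kendrick_mass(mass):
    """Rescale an accurate mass so that CH2 weighs exactly 14."""
    return mass * 14.0 / CH2_MASS


def kendrick_mass_defect(mass):
    """Nominal (rounded) Kendrick mass minus the Kendrick mass."""
    km = kendrick_mass(mass)
    return round(km) - km


# Two hypothetical masses one CH2 apart share the same defect.
print(kendrick_mass_defect(100.0), kendrick_mass_defect(100.0 + CH2_MASS))
```

Because adding CH2 shifts the Kendrick mass by exactly 14, chain-length homologues collapse onto one defect value, which is what makes defect-based filtering useful for narrowing elemental composition candidates before MS(2) and MS(3) evidence is considered.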
Incorporation of unnatural sugars for the identification of glycoproteins.
Zaro, Balyn W; Hang, Howard C; Pratt, Matthew R
2013-01-01
Glycosylation is an abundant post-translational modification that alters the fate and function of its substrate proteins. To understand the significance of protein glycosylation, it is key to identify the target proteins. As in proteomics more generally, mass spectrometry is the established method for substrate identification. However, these approaches require selective enrichment and purification of the modified proteins. Chemical reporters combined with bioorthogonal reactions have emerged as robust tools for identifying post-translational modifications, including glycosylation. We provide here a method for using bioorthogonal chemical reporters to isolate and identify glycosylated proteins. More specifically, this protocol is a representative procedure from our own work using an alkyne-bearing O-GlcNAc chemical reporter (GlcNAlk) and a chemically cleavable azido-azo-biotin probe to identify O-GlcNAc-modified proteins.
2014-01-01
Background Canine babesiosis is a tick-borne disease caused by haemoprotozoan parasites of the genus Babesia. There are limited data on serum proteomics in dogs, and none on the effect of babesiosis on the serum proteome. The aim of this study was to identify potential serum biomarkers of babesiosis using proteomic techniques in order to increase our understanding of disease pathogenesis. Results Serum samples were collected from 25 dogs of various breeds and both sexes with naturally occurring babesiosis caused by B. canis canis. Blood was collected on the day of admission (day 0), and subsequently on the 1st and 6th day of treatment. Two-dimensional electrophoresis (2DE) of pooled serum samples from dogs with naturally occurring babesiosis (day 0, day 1 and day 6) and from healthy dogs was run in triplicate. 2DE image analysis showed 64 differentially expressed spots with p ≤ 0.05 and 49 spots with fold change ≥2. Six selected spots were excised manually and subjected to trypsin digestion prior to identification by electrospray ionisation mass spectrometry on an Amazon ion trap tandem mass spectrometer (MS/MS). Mass spectrometry data were processed using Data Analysis software and the automated Matrix Science Mascot Daemon server. Protein identifications were assigned using the Mascot search engine to interrogate protein sequences in the NCBI Genbank database. A number of differentially expressed serum proteins involved in the inflammation-mediated acute phase response, complement and coagulation cascades, apolipoproteins and the vitamin D metabolism pathway were identified in dogs with babesiosis. Conclusions Our findings confirmed two dominant pathogenic mechanisms of babesiosis: haemolysis and the acute phase response. These results may provide possible serum biomarker candidates for clinical monitoring of babesiosis, and this study could serve as the basis for further proteomic investigations in canine babesiosis. PMID:24885808
Zelesky, Veronica; Schneider, Richard; Janiszewski, John; Zamora, Ismael; Ferguson, James; Troutman, Matthew
2013-05-01
The ability to supplement high-throughput metabolic clearance data with structural information defining the site of metabolism should allow design teams to streamline their synthetic decisions. However, broad application of metabolite identification in early drug discovery has been limited, largely due to the time required for data review and structural assignment. The advent of mass defect filtering and its application toward metabolite scouting paved the way for the development of software automation tools capable of rapidly identifying drug-related material in complex biological matrices. Two semi-automated commercial software applications, MetabolitePilot™ and Mass-MetaSite™, were evaluated to assess the relative speed and accuracy of structural assignments using data generated on a high-resolution MS platform. Review of these applications has demonstrated their utility in providing accurate results in a time-efficient manner, leading to acceleration of metabolite identification initiatives while highlighting the continued need for biotransformation expertise in the interpretation of more complex metabolic reactions.
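Mass defect filtering can be sketched in a few lines of Python: an ion is retained when its mass defect lies within a fixed window of the parent drug's mass defect and its mass lies within a range around the parent mass. The 50 mDa and ±100 Da windows, and the definition of mass defect as the fractional part of the mass (reasonable for typical small drug molecules, but an assumption here), are illustrative choices, not settings of MetabolitePilot™ or Mass-MetaSite™.

```python
import math


def mass_defect(mass: float) -> float:
    """Mass defect taken as the fractional part of the monoisotopic mass (assumption)."""
    return mass - math.floor(mass)


def mass_defect_filter(ion_masses, parent_mass, defect_window_mda=50.0, mass_window_da=100.0):
    """Keep ions whose mass defect and mass are both close to those of the parent drug."""
    parent_defect = mass_defect(parent_mass)
    kept = []
    for m in ion_masses:
        close_defect = abs(mass_defect(m) - parent_defect) * 1000.0 <= defect_window_mda
        close_mass = abs(m - parent_mass) <= mass_window_da
        if close_defect and close_mass:
            kept.append(m)
    return kept


# Toy masses (Da), not real data: two putative metabolites, an ion with a very
# different mass defect, and an ion outside the mass window.
ions = [416.195, 400.900, 385.210, 700.200]
print(mass_defect_filter(ions, parent_mass=400.200))
```

Because related metabolites usually shift the mass defect only slightly, this cheap test removes much of the matrix background before any structural assignment is attempted.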
ERIC Educational Resources Information Center
Science and Children, 1988
1988-01-01
Reviews five software packages for use with school age children. Includes "Science Toolkit Module 2: Earthquake Lab"; "Adaptations and Identification"; "Geoworld"; "Body Systems II Series: The Blood System: A Liquid of Life," all for Apple II, and "Science Courseware: Life Science/Biology" for…
Virginio, Luiz A; Ricarte, Ivan Luiz Marques
2015-01-01
Although Electronic Health Records (EHR) can benefit the health care process, a growing body of evidence shows that these systems can also pose risks to patient safety when developed or used improperly. This work is a literature review that identifies these risks from a software quality perspective. The risks were classified according to the ISO/IEC 25010 software quality model. The risks identified related mainly to the characteristics of "functional suitability" (i.e., software bugs) and "usability" (i.e., interfaces prone to user error). This work shows that EHR quality problems can adversely affect patient safety, resulting in errors such as incorrect patient identification, incorrect calculation of medication dosages, and lack of access to patient data. The risks presented here therefore provide a basis for developers and EHR regulatory bodies to attend to the quality aspects of these systems that can result in patient harm.
Proteomics of exhaled breath: methodological nuances and pitfalls.
Kurova, Viktoria S; Anaev, Eldar C; Kononikhin, Alexey S; Fedorchenko, Kristina Yu; Popov, Igor A; Kalupov, Timothey L; Bratanov, Dmitriy O; Nikolaev, Eugenie N; Varfolomeev, Sergey D
2009-01-01
The analysis of exhaled breath condensate (EBC) can be an alternative to traditional endoscopic sampling of lower respiratory tract secretions. This simple, non-invasive method can be used to diagnose respiratory diseases, in particular respiratory inflammatory processes. Samples were collected with a special condenser device (ECoScreen, VIASYS Healthcare, Germany), treated with trypsin according to the proteomics protocol for standard protein mixtures, and analyzed by nanoflow high-performance liquid chromatography tandem mass spectrometry (HPLC-MS/MS) on a 7-Tesla Finnigan LTQ-FT mass spectrometer (Thermo Electron, Germany). Mascot software (Matrix Science) was used to search the NCBInr database for proteins matching the peptide maps obtained. EBCs were collected from 17 young, healthy, non-smoking donors. Different methods of protein concentration were compared to optimize EBC preparation for proteomic analysis. The chosen procedure allowed identification of the proteins exhaled by healthy people. The major proteins in the condensates were cytoskeletal keratins, and another 12 proteins were identified in EBC from healthy non-smokers. Some keratins were also found in ambient air and may be considered exogenous components of exhaled air. Knowledge of the normal exhaled-breath proteome makes it possible to search EBC for biomarkers of different disease states. Proteins from ambient air can be detected in the respiratory tract and should be excluded from analysis of the EBC proteome. The results allowed us to choose the most effective sample preparation procedure for samples with very low protein concentrations.
Lange, Vinzenz; Malmström, Johan A; Didion, John; King, Nichole L; Johansson, Björn P; Schäfer, Juliane; Rameseder, Jonathan; Wong, Chee-Hong; Deutsch, Eric W; Brusniak, Mi-Youn; Bühlmann, Peter; Björck, Lars; Domon, Bruno; Aebersold, Ruedi
2008-08-01
In many studies, particularly in the field of systems biology, it is essential that identical protein sets are precisely quantified in multiple samples, such as those representing differentially perturbed cell states. The high degree of reproducibility required for such experiments has not been achieved by classical mass spectrometry-based proteomics methods. In this study we describe the implementation of a targeted quantitative approach by which predetermined protein sets are first identified and subsequently reliably quantified at high sensitivity in multiple samples. This approach consists of three steps. First, the proteome is extensively mapped by multidimensional fractionation and tandem mass spectrometry, and the data generated are assembled in the PeptideAtlas database. Second, based on this proteome map, peptides uniquely identifying the proteins of interest (proteotypic peptides) are selected, and multiple reaction monitoring (MRM) transitions are established and validated by MS2 spectrum acquisition. This process of peptide selection, transition selection and validation is supported by a suite of software tools, TIQAM (Targeted Identification for Quantitative Analysis by MRM), described in this study. Third, the selected target protein set is quantified in multiple samples by MRM. Applying this approach, we were able to reliably quantify low-abundance virulence factors from cultures of the human pathogen Streptococcus pyogenes exposed to increasing amounts of plasma. The resulting quantitative protein patterns enabled us to clearly define the subset of virulence proteins that is regulated upon plasma exposure.
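One ingredient of proteotypic peptide selection, uniqueness, can be sketched in Python: given the peptides each protein yields, keep for each protein only the peptides that map to no other protein. This is a simplified, hypothetical illustration with made-up protein and peptide names; the selection described above also relies on the PeptideAtlas proteome map and MS2 validation, which are not modelled here.

```python
from collections import defaultdict


def unique_peptides(protein_to_peptides):
    """Return, for each protein, the sorted peptides that map to that protein only."""
    peptide_owners = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            peptide_owners[peptide].add(protein)
    return {
        protein: sorted(p for p in set(peptides) if len(peptide_owners[p]) == 1)
        for protein, peptides in protein_to_peptides.items()
    }


# Hypothetical digest: PEPTIDEB is shared, so it cannot identify either protein uniquely.
digest = {
    "ProteinX": ["PEPTIDEA", "PEPTIDEB", "PEPTIDEC"],
    "ProteinY": ["PEPTIDEB", "PEPTIDED"],
}
print(unique_peptides(digest))
```

Shared peptides are dropped because an MRM signal from them could not be attributed to a single protein.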
Heidler, Juliana; Hardt, Stefanie; Wittig, Ilka; Tegeder, Irmgard
2016-12-01
Progranulin deficiency is associated with neurodegeneration in humans and in mice. The mechanisms likely involve progranulin-promoted removal of protein waste via autophagy. We performed a deep proteomic screen of the pre-frontal cortex in aged (13-15 months) female progranulin-deficient mice (GRN -/- ) and mice with inducible neuron-specific overexpression of progranulin (SLICK-GRN-OE) versus the respective control mice. Proteins were extracted and analyzed by liquid chromatography/mass spectrometry (LC/MS) on a Thermo Scientific™ Q Exactive Plus equipped with an ultra-high performance liquid chromatography unit and a Nanospray Flex Ion-Source. Full-scan MS data were acquired using Xcalibur, and raw files were analyzed using the proteomics software MaxQuant. The mouse reference proteome set from UniProt (June 2015) was used to identify peptides and proteins. The DiB data file is a reduced MaxQuant output and includes peptide and protein identifications, accession numbers, protein and gene names, sequence coverage and label-free quantification (LFQ) values for each sample. Differences in protein expression between genotypes are presented in "Progranulin overexpression in sensory neurons attenuates neuropathic pain in mice: Role of autophagy" (C. Altmann, S. Hardt, C. Fischer, J. Heidler, H.Y. Lim, A. Haussler, B. Albuquerque, B. Zimmer, C. Moser, C. Behrends, F. Koentgen, I. Wittig, M.H. Schmidt, A.M. Clement, T. Deller, I. Tegeder, 2016) [1].
Lee, Jinoo; Valkova, Nelly; White, Mark P; Kültz, Dietmar
2006-09-01
We used dogfish shark (Squalus acanthias) as a model for proteome analysis of six different tissues to evaluate tissue-specific protein expression on a global scale and to deduce specific functions and the relatedness of multiple tissues from their proteomes. Proteomes of heart, brain, kidney, intestine, gill, and rectal gland were separated by two-dimensional gel electrophoresis (2DGE), gel images were matched using Delta 2D software and then evaluated for tissue-specific proteins. Sixty-one proteins (4%) were found to be in only a single type of tissue and 535 proteins (36%) were equally abundant in all six tissues. Relatedness between tissues was assessed based on tissue-specific expression patterns of all 1465 consistently resolved protein spots. This analysis revealed that tissues with osmoregulatory function (kidney, intestine, gill, rectal gland) were more similar in their overall proteomes than non-osmoregulatory tissues (heart, brain). Sixty-one proteins were identified by MALDI-TOF/TOF mass spectrometry and biological functions characteristic of osmoregulatory tissues were derived from gene ontology and molecular pathway analysis. Our data demonstrate that the molecular machinery for energy and urea metabolism and the Rho-GTPase/cytoskeleton pathway are enriched in osmoregulatory tissues of sharks. Our work provides a strong rationale for further study of the contribution of these mechanisms to the osmoregulation of marine sharks.
Find Pairs: The Module for Protein Quantification of the PeakQuant Software Suite
Eisenacher, Martin; Kohl, Michael; Wiese, Sebastian; Hebeler, Romano; Meyer, Helmut E.
2012-01-01
Accurate quantification of proteins is one of the major tasks in current proteomics research. To address this, a wide range of stable isotope labeling techniques have been developed that allow thousands of proteins to be studied quantitatively by mass spectrometry. In this article, the FindPairs module of the PeakQuant software suite is detailed. It automatically determines protein abundance ratios from the automated analysis of stable isotope-coded mass spectrometric data. Furthermore, it implements statistical methods to detect outliers arising from biological as well as technical variance in proteome data obtained in replicate experiments. This provides an important means of evaluating the significance of the protein expression data obtained. To demonstrate the broad applicability of FindPairs, we focused on the quantitative analysis of proteome data acquired in 14N/15N labeling experiments. We further provide a comprehensive overview of the features of the FindPairs software and compare them with existing quantification packages. The software presented here supports a wide range of proteomics applications, allowing quantitative assessment of data derived from different stable isotope labeling approaches, such as 14N/15N labeling, SILAC, and iTRAQ. The software is publicly available at http://www.medizinisches-proteom-center.de/software and free for academic use. PMID:22909347
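The general idea of summarizing peptide-level isotope ratios into a protein ratio while screening out aberrant values can be sketched in Python. The robust rule used here, a median-absolute-deviation (MAD) modified z-score with a 3.5 cut-off on log2 ratios, is a commonly used technique swapped in for illustration; it is not necessarily one of the statistical methods implemented in FindPairs, and the ratios are toy values.

```python
import math
import statistics


def protein_ratio(peptide_ratios, z_cutoff=3.5):
    """Median protein ratio after removing outlying peptide ratios.

    Outliers are flagged with a modified z-score on log2 ratios:
    z = 0.6745 * (x - median) / MAD. If MAD is 0, nothing is flagged.
    """
    logs = [math.log2(r) for r in peptide_ratios]
    med = statistics.median(logs)
    mad = statistics.median([abs(x - med) for x in logs])
    flags = [mad > 0 and abs(0.6745 * (x - med) / mad) > z_cutoff for x in logs]
    outliers = [r for r, flagged in zip(peptide_ratios, flags) if flagged]
    kept = [x for x, flagged in zip(logs, flags) if not flagged]
    return 2 ** statistics.median(kept), outliers


# Toy heavy/light ratios for one protein; 4.0 is an aberrant peptide measurement.
ratio, outliers = protein_ratio([1.0, 1.1, 0.9, 1.05, 4.0])
print(round(ratio, 3), outliers)
```

Working on log2 ratios keeps up- and down-regulation symmetric, and the median-based statistics keep a single extreme peptide from distorting the protein estimate.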
Zhu, Ying; Zhao, Rui; Piehowski, Paul D.; ...
2017-09-01
One of the greatest challenges for mass spectrometry (MS)-based proteomics is the limited ability to analyze small samples. In this study, we investigate the relative contributions of liquid chromatography (LC), MS instrumentation and data analysis methods with the aim of improving proteome coverage for sample sizes ranging from 0.5 ng to 50 ng. We show that LC separations using 30-μm-i.d. columns increase signal intensity by >3-fold relative to those using 75-μm-i.d. columns, leading to a 32% increase in peptide identifications. The Orbitrap Fusion Lumos MS significantly boosted both sensitivity and sequencing speed relative to earlier-generation Orbitraps (e.g., LTQ-Orbitrap), leading to a ~3-fold increase in peptide identifications and a 1.7-fold increase in identified protein groups for 2 ng tryptic digests of the bacterium S. oneidensis. The Match Between Runs algorithm of the open-source MaxQuant software further increased proteome coverage by ~95% for 0.5 ng samples and by ~42% for 2 ng samples. Using the best combination of the above variables, we were able to identify >3,000 proteins from 10 ng tryptic digests of both HeLa and THP-1 mammalian cell lines. We also identified >950 proteins from subnanogram archaeal/bacterial cocultures. Finally, the present ultrasensitive LC-MS platform achieves a level of proteome coverage not previously realized for ultra-small sample loadings, and is expected to facilitate the analysis of subnanogram samples, including single mammalian cells.
CrossTalk: The Journal of Defense Software Engineering. Volume 18, Number 11
2005-11-01
languages. Our discipline of software engineering has really experienced phenomenal growth right before our eyes. A sign that software design has...approach on a high level of abstraction. The main emphasis is on the identification and allocation of a needed functionality (e.g., a target tracker), rather...messaging software that is the backbone of teenage culture. As increasing security constraints will increase the cost of developing and maintaining any
NASA Technical Reports Server (NTRS)
Jester, Peggy L.; Hancock, David W., III
1999-01-01
This document provides the Data Management Plan for the GLAS Standard Data Software (SDS) supporting the GLAS instrument of the EOS ICESat Spacecraft. The SDS encompasses the ICESat Science Investigator-led Processing System (I-SIPS) Software and the Instrument Support Facility (ISF) Software. This Plan addresses the identification, authority, and description of the interface nodes associated with the GLAS Standard Data Products and the GLAS Ancillary Data.
Hertveldt, Kirsten; Beliën, Tim; Volckaert, Guido
2009-01-01
In M13 phage display, proteins and peptides are displayed on one of the surface proteins of filamentous phage particles and become accessible to affinity enrichment against a bait of interest. We describe the construction of fragmented whole-genome and gene fragment phage display libraries and interaction selection by panning. This strategy allows the identification and characterization of interacting proteins on a genomic scale by screening the fragmented "proteome" against protein baits. Gene fragment libraries allow a more in-depth characterization of the protein-protein interaction site by identifying the protein region involved in the interaction.
Comparison of identification methods for oral asaccharolytic Eubacterium species.
Wade, W G; Slayne, M A; Aldred, M J
1990-12-01
Thirty-one strains of oral, asaccharolytic Eubacterium spp. and the type strains of E. brachy, E. nodatum and E. timidum were subjected to three identification techniques: protein-profile analysis, determination of metabolic end-products, and the API ATB32A identification kit. Five clusters were obtained from numerical analysis of protein profiles, and these correlated excellently with the other two methods. Protein profiles alone allowed unequivocal identification.
Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra.
Wang, Jianqi; Zhang, Yajie; Yu, Yonghao
2015-07-01
A search engine that reliably discovers more peptides is essential to the progress of computational proteomics. We propose two new scoring functions (L- and P-scores) that aim to capture characteristics of a peptide-spectrum match (PSM) similar to those captured by Sequest and Comet. Crescendo, introduced here, is a software program that implements these two scores for peptide identification. We applied Crescendo to test datasets and compared its performance with widely used search engines, including Mascot, Sequest, and Comet. The results indicate that Crescendo identifies a similar or larger number of peptides at various predefined false discovery rates (FDR). Importantly, it also provides better separation between true and decoy PSMs, which warrants the future development of a companion post-processing filtering algorithm.
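Counting identifications "at a predefined FDR" is commonly done with a target-decoy search. The Python sketch below shows that generic procedure, assuming each PSM is a `(score, is_decoy)` pair where higher scores are better and estimating FDR as decoys/targets above a score threshold; it does not implement the L- or P-scores, and it ignores score ties and alternative FDR estimators.

```python
def targets_at_fdr(psms, fdr=0.01):
    """Number of target PSMs accepted at the given FDR.

    psms: iterable of (score, is_decoy); higher score = better match.
    Returns the largest number of targets among score-ranked prefixes whose
    estimated FDR (decoys / targets) does not exceed `fdr`.
    """
    targets = decoys = best = 0
    for _score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            best = targets
    return best


# Toy PSM list with arbitrary scores.
psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True), (4, False)]
print(targets_at_fdr(psms, fdr=0.0), targets_at_fdr(psms, fdr=0.34))
```

Taking the deepest prefix that still meets the threshold is equivalent to accepting every PSM whose q-value is at most `fdr`. A score that separates targets from decoys better pushes decoys further down the ranking, so more targets fit under the same FDR.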
Caititu: a tool to graphically represent peptide sequence coverage and domain distribution.
Carvalho, Paulo C; Junqueira, Magno; Valente, Richard H; Domont, Gilberto B
2008-10-07
Here we present Caititu, an easy-to-use proteomics software tool that graphically represents peptide sequence coverage and domain distribution for different correlated samples (e.g. originating from 2D gel spots) relative to the full sequence of the known protein to which they correspond. Although Caititu has broad applicability, we exemplify its usefulness in toxinology using snake venom as a model. For example, proteolytic processing may lead to inactivation or loss of domains. Our proposed graphic representation of peptides identified by two-dimensional electrophoresis followed by mass spectrometric identification of excised spots can therefore help infer what kind of processing, if any, the toxins have undergone. Caititu is freely available to download at: http://pcarvalho.com/things/caititu.
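The quantity Caititu visualizes, peptide sequence coverage, can be computed with a short Python sketch: each identified peptide is located in the full protein sequence and the residues it spans are marked. The text rendering (covered residues shown, uncovered ones as dots) is only a stand-in for Caititu's graphics, and the sequence and peptides are illustrative.

```python
def coverage(protein, peptides):
    """Return (fraction of residues covered, text map of the coverage)."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    fraction = sum(covered) / len(protein) if protein else 0.0
    text = "".join(aa if c else "." for aa, c in zip(protein, covered))
    return fraction, text


frac, text = coverage("MKTAYIAKQRQISFVKSHFSRQ", ["TAYIAK", "QISFVK"])
print(f"{frac:.1%}", text)
```

Comparing such maps across gel spots of the same protein shows which regions are consistently missing, which is the kind of evidence used above to infer proteolytic processing or domain loss.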
Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology.
Cock, Peter J A; Grüning, Björn A; Paszkiewicz, Konrad; Pritchard, Leighton
2013-01-01
The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).
The Monitoring and Affinity Purification of Proteins Using Dual Tags with Tetracysteine Motifs
NASA Astrophysics Data System (ADS)
Giannone, Richard J.; Liu, Yie; Wang, Yisong
Identification and characterization of protein-protein interaction networks is essential for the elucidation of biochemical mechanisms and cellular function. Affinity purification in combination with liquid chromatography-tandem mass spectrometry (LC-MS/MS) has emerged as a very powerful tactic for the identification of specific protein-protein interactions. In this chapter, we describe a comprehensive methodology that uses our recently developed dual-tag affinity purification system for the enrichment and identification of mammalian protein complexes. The protocol covers a series of separate but sequentially related techniques focused on the facile monitoring and purification of a dual-tagged protein of interest and its interacting partners via a system built with tetracysteine motifs and various combinations of affinity tags. Using human telomeric repeat binding factor 2 (TRF2) as an example, we demonstrate the power of the system in terms of bait protein recovery after dual-tag affinity purification, detection of bait protein subcellular localization and expression, and successful identification of known and potentially novel TRF2 interacting proteins. Although the protocol described here has been optimized for the identification and characterization of TRF2-associated proteins, it is, in principle, applicable to the study of any other mammalian protein complexes that may be of interest to the research community.
Drainage identification analysis and mapping, phase 2 : technical brief.
DOT National Transportation Integrated Search
2017-01-01
This research studied, tested and rectified the compatibility issue related to the recent upgrades of NJDOT vendor inspection software, and uploaded all collected data to make the Drainage Identification Analysis and Mapping System (DIAMS) current an...
Yoo, Chul; Patwa, Tasneem H.; Kreunin, Paweena; Miller, Fred R.; Huber, Christian G.; Nesvizhskii, Alexey I.; Lubman, David M.
2012-01-01
A comprehensive platform that integrates information from the protein and peptide levels by combining various MS techniques has been employed for the analysis of proteins in fully malignant human breast cancer cells. The cell lysates were subjected to chromatofocusing fractionation, followed by tryptic digestion of the pH fractions for on-line monolithic RP-HPLC interfaced with linear ion trap MS analysis for rapid protein identification. This direct analysis of pH fractions resulted in the identification of large numbers of proteins from several selected pH fractions, consuming approximately 1.5 μg of each pH fraction digest in an analysis time of ca. 50 min. To combine valuable information retained at the protein level with the protein identifications obtained from peptide-level information, the same pH fraction was analyzed using nonporous (NPS)-RP-HPLC/ESI-TOF MS to obtain intact protein MW measurements. To further validate the protein identifications from the fraction digest analysis, NPS-RP-HPLC separation was performed for off-line protein collection so that each protein could be examined closely using MALDI-TOF MS and MALDI-quadrupole ion trap (QIT)-TOF MS, and excellent agreement of protein identifications was consistently observed. Comparison with the intact MW and other MS information was particularly useful for analyzing proteins whose identifications were suggested by a single sequenced peptide from the fraction digest analysis. PMID:17206599
Practical and Efficient Searching in Proteomics: A Cross Engine Comparison
Paulo, Joao A.
2014-01-01
Background Analysis of large datasets produced by mass spectrometry-based proteomics relies on database search algorithms to sequence peptides and identify proteins. Several such scoring methods are available, each based on different statistical foundations and therefore producing non-identical results. Here, the aim is to compare peptide and protein identifications using multiple search engines and to examine the additional proteins gained by increasing the number of technical replicate analyses. Methods A HeLa whole cell lysate was analyzed on an Orbitrap mass spectrometer in 10 technical replicates. The data were combined and searched using Mascot, SEQUEST, and Andromeda. Peptide and protein identifications were compared among the search engines. In addition, searches with each engine were performed with an increasing number of technical replicates. Results The number and identity of peptides and proteins differed across search engines. For all three search engines, the differences in protein identifications were greater than the differences in peptide identifications, indicating that the major source of the disparity may be at the protein inference grouping level. The data also revealed that analysis of 2 technical replicates can increase protein identifications by up to 10-15%, while a third replicate results in an additional 4-5%. Conclusions The data emphasize two practical methods of increasing the robustness of mass spectrometry data analysis. The data show that 1) using multiple search engines can expand the number of identified proteins (union) and validate protein identifications (intersection), and 2) analysis of 2 or 3 technical replicates can substantially expand protein identifications. Moreover, additional information can be extracted from a dataset by performing database searches with different engines and performing technical repeats, which requires no additional sample preparation and makes effective use of research time and effort. PMID:25346847
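Combining engines by union or intersection and tracking the gain from additional technical replicates both reduce to set operations. This Python sketch uses made-up protein accessions and only illustrates the bookkeeping; it is not the analysis performed in the study, and it reports each replicate's gain relative to the proteins already seen.

```python
from functools import reduce


def engine_summary(results):
    """Union and intersection of the protein sets reported by several search engines."""
    sets = list(results.values())
    return set().union(*sets), reduce(set.intersection, sets)


def cumulative_gains(replicates):
    """Cumulative protein counts and percent gain as each replicate is added."""
    seen, counts, gains = set(), [], []
    for rep in replicates:
        before = len(seen)
        seen |= rep
        counts.append(len(seen))
        gains.append(100.0 * (len(seen) - before) / before if before else None)
    return counts, gains


engines = {
    "Mascot": {"P1", "P2", "P3", "P4"},
    "SEQUEST": {"P1", "P2", "P3", "P5"},
    "Andromeda": {"P1", "P2", "P4", "P5"},
}
union, intersection = engine_summary(engines)
print(sorted(union), sorted(intersection))

reps = [{"P1", "P2", "P3", "P4", "P5"}, {"P1", "P2", "P6"}, {"P2", "P7"}]
print(cumulative_gains(reps))
```

The union is the expanded identification list, while the intersection is the subset every engine agrees on and so serves as a validated core.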
Practical and Efficient Searching in Proteomics: A Cross Engine Comparison.
Paulo, Joao A
2013-10-01
Analysis of large datasets produced by mass spectrometry-based proteomics relies on database search algorithms to sequence peptides and identify proteins. Several such scoring methods are available, each based on different statistical foundations and therefore producing non-identical results. Here, the aim is to compare peptide and protein identifications using multiple search engines and to examine the additional proteins gained by increasing the number of technical replicate analyses. A HeLa whole cell lysate was analyzed on an Orbitrap mass spectrometer in 10 technical replicates. The data were combined and searched using Mascot, SEQUEST, and Andromeda. Peptide and protein identifications were compared among the search engines. In addition, searches with each engine were performed with an increasing number of technical replicates. The number and identity of peptides and proteins differed across search engines. For all three search engines, the differences in protein identifications were greater than the differences in peptide identifications, indicating that the major source of the disparity may be at the protein inference grouping level. The data also revealed that analysis of 2 technical replicates can increase protein identifications by up to 10-15%, while a third replicate results in an additional 4-5%. The data emphasize two practical methods of increasing the robustness of mass spectrometry data analysis. The data show that 1) using multiple search engines can expand the number of identified proteins (union) and validate protein identifications (intersection), and 2) analysis of 2 or 3 technical replicates can substantially expand protein identifications. Moreover, additional information can be extracted from a dataset by performing database searches with different engines and performing technical repeats, which requires no additional sample preparation and makes effective use of research time and effort.
Yeast proteome map (last update).
Perrot, Michel; Moes, Suzette; Massoni, Aurélie; Jenoe, Paul; Boucherie, Hélian
2009-10-01
The identification of proteins separated on 2-D gels is essential to exploit the full potential of 2-D gel electrophoresis for proteomic investigations. For this purpose we have undertaken the systematic identification of Saccharomyces cerevisiae proteins separated on 2-D gels. We report here the identification by mass spectrometry of 100 novel yeast protein spots that have so far not been tackled due to their scarcity on our standard 2-D gels. These identifications extend the number of protein spots identified on our yeast 2-D proteome map to 716. They correspond to 485 unique proteins. Among these, 154 were resolved into several isoforms. The present data set can now be expanded to report for the first time a map of 363 protein isoforms that significantly deepens our knowledge of the yeast proteome. The reference map and a list of all identified proteins can be accessed on the Yeast Protein Map server (www.ibgc.u-bordeaux2.fr/YPM).
Baracat-Pereira, Maria Cristina; de Oliveira Barbosa, Meire; Magalhães, Marcos Jorge; Carrijo, Lanna Clicia; Games, Patrícia Dias; Almeida, Hebréia Oliveira; Sena Netto, José Fabiano; Pereira, Matheus Rodrigues; de Barros, Everaldo Gonçalves
2012-06-01
The enrichment and isolation of proteins are considered limiting steps in proteomic studies. Identification of proteins whose expression is transient, those that are of low-abundance, and of natural peptides not described in databases, is still a great challenge. Plant extracts are in general complex, and contaminants interfere with the identification of proteins involved in important physiological processes, such as plant defense against pathogens. This review discusses the challenges and strategies of separomics applied to the identification of low-abundance proteins and peptides in plants, especially in plants challenged by pathogens. Separomics is described as a group of methodological strategies for the separation of protein molecules for proteomics. Several tools have been used to remove highly abundant proteins from samples and also non-protein contaminants. The use of chromatographic techniques, the partition of the proteome into subproteomes, and an effort to isolate proteins in their native form have allowed the isolation and identification of rare proteins involved in different processes.
Baracat-Pereira, Maria Cristina; de Oliveira Barbosa, Meire; Magalhães, Marcos Jorge; Carrijo, Lanna Clicia; Games, Patrícia Dias; Almeida, Hebréia Oliveira; Sena Netto, José Fabiano; Pereira, Matheus Rodrigues; de Barros, Everaldo Gonçalves
2012-01-01
The enrichment and isolation of proteins are considered limiting steps in proteomic studies. Identification of proteins whose expression is transient, those that are of low-abundance, and of natural peptides not described in databases, is still a great challenge. Plant extracts are in general complex, and contaminants interfere with the identification of proteins involved in important physiological processes, such as plant defense against pathogens. This review discusses the challenges and strategies of separomics applied to the identification of low-abundance proteins and peptides in plants, especially in plants challenged by pathogens. Separomics is described as a group of methodological strategies for the separation of protein molecules for proteomics. Several tools have been used to remove highly abundant proteins from samples and also non-protein contaminants. The use of chromatographic techniques, the partition of the proteome into subproteomes, and an effort to isolate proteins in their native form have allowed the isolation and identification of rare proteins involved in different processes. PMID:22802713
Li, Nan; Han, Zhenzhen; Li, Lin; Zhang, Bing; Liu, Zhidong; Li, Jiawei
2018-01-01
The objective of this study was to investigate the effects of solid lipid nanoparticles of baicalin (BA-SLNs) on an experimental cataract model and to explore the underlying molecular mechanism using bioinformatics analysis. Lens transparency was observed daily by slit-lamp examination and photography, and lenticular opacity was graded. Two-dimensional gel electrophoresis (2-DE) was employed to analyze the differential protein expression patterns in each group. Proteins of interest were identified by nano-liquid chromatography tandem mass spectrometry (LC-MS/MS). Bioinformatics analysis was performed using the Ingenuity Pathway Analysis (IPA) online software to interpret the biological implications of the proteins identified by proteomics. At the end of sodium selenite-induced cataract progression, almost all lenses from the model group had developed partial nuclear opacity, whereas all lenses in the blank group were clear and normal. There was no significant difference between the BA-SLNs group and the blank group. Many protein spots were differentially expressed in the 2-DE patterns of total lens proteins from each group, and 65 highly differential protein spots between the BA-SLNs group and the model group were selected for identification. A total of 23 proteins were identified, 12 of which were crystallins. We consider crystallins to play important roles in preserving normal protein expression levels and lens transparency. The general trend in the data from BA-SLN-treated lenses showed that BA-SLNs shifted the protein expression pattern of cataractous lenses toward that of normal lenses. Our findings suggest that BA-SLNs may be a potential therapeutic agent for treating cataract by regulating protein expression and may also be a strong candidate for future clinical research.
The alpha-fetoprotein (AFP) third domain: a search for AFP interaction sites of cell cycle proteins.
Mizejewski, G J
2016-09-01
The carboxy-terminal third domain of alpha-fetoprotein (AFP-3D) is known to harbor binding and/or interaction sites for hydrophobic ligands, receptors, and binding proteins. Such reports have established that AFP-3D consists of amino acid (AA) sequence stretches on the AFP polypeptide that engage in protein-to-protein interactions with various ligands and receptors. Using a computer software program specifically designed for such interactions, the present report identified AA sequence fragments on AFP-3D that could potentially interact with a variety of cell cycle proteins. The cell cycle proteins identified were (1) cyclins, (2) cyclin-dependent kinases, (3) cell cycle-associated proteins (inhibitors, checkpoints, initiators), and (4) ubiquitin ligases. Following detection of the AFP-3D to cell cycle protein interaction sites, the computer-derived AFP localization AA sequences were compared and aligned with previously reported hydrophobic ligand and receptor interaction sites on AFP-3D. A literature survey of the association of cell cycle proteins with AFP showed both positive relationships and correlations. Previous reports of the effects of experimental AFP-derived peptides on various cell cycle proteins served to confirm and verify the present computer-based cell cycle protein identifications. Cell cycle protein interactions with AFP-CD peptides have been reported in cultured MCF-7 breast cancer cells subjected to mRNA microarray analysis. After 7 days in culture with MCF-7 cells, the AFP-derived peptides were shown to downregulate cyclin E, SKP2, checkpoint suppressors, cyclin-dependent kinases, and ubiquitin ligases that modulate the cyclin E/CdK2 transition from the G1 to the S-phase of the cell cycle. Thus, the experimental data on AFP-CD interaction with cell cycle proteins were consistent with the "in silico" findings.
Saha, Tanumoy; Rathmann, Isabel; Galic, Milos
2017-07-11
Filopodia are dynamic, finger-like cellular protrusions associated with migration and cell-cell communication. In order to better understand the complex signaling mechanisms underlying filopodial initiation, elongation and subsequent stabilization or retraction, it is crucial to determine the spatio-temporal protein activity in these dynamic structures. To analyze protein function in filopodia, we recently developed a semi-automated tracking algorithm that adapts to filopodial shape-changes, thus allowing parallel analysis of protrusion dynamics and relative protein concentration along the whole filopodial length. Here, we present a detailed step-by-step protocol for optimized cell handling, image acquisition and software analysis. We further provide instructions for the use of optional features during image analysis and data representation, as well as troubleshooting guidelines for all critical steps along the way. Finally, we also include a comparison of the described image analysis software with other programs available for filopodia quantification. Together, the presented protocol provides a framework for accurate analysis of protein dynamics in filopodial protrusions using image analysis software.
NASA Astrophysics Data System (ADS)
Zima, W.
2008-12-01
FAMIAS (Frequency Analysis and Mode Identification for AsteroSeismology) is a collection of state-of-the-art software tools for the analysis of photometric and spectroscopic time series data. It is one of the deliverables of Work Package NA5: Asteroseismology of the European Coordination Action in Helio- and Asteroseismology (HELAS). Two main sets of tools are incorporated in FAMIAS. The first set searches for periodicities in the data using Fourier and non-linear least-squares fitting algorithms. The second set carries out mode identification for the detected pulsation frequencies to determine their pulsational quantum numbers, the harmonic degree, ℓ, and the azimuthal order, m. For spectroscopic mode identification, the Fourier parameter fit method and the moment method are available. Photometric mode identification is based on pre-computed grids of atmospheric parameters and non-adiabatic observables, and uses the method of amplitude ratios and phase differences in different filters. FAMIAS is applicable to main-sequence pulsators hotter than the Sun. This includes the Gamma Dor stars, Delta Sct stars, slowly pulsating B stars and Beta Cep stars: basically all pulsating main-sequence stars for which empirical mode identification is required to carry out asteroseismology successfully. The complete manual for FAMIAS is published in a special issue of Communications in Asteroseismology, Vol. 155. The FAMIAS homepage provides the software for download and the on-line documentation.
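The first FAMIAS tool set starts from a Fourier search for periodicities. As a rough illustration only (not FAMIAS code), the sketch below evaluates a discrete Fourier amplitude spectrum on a frequency grid for unevenly sampled data and returns the strongest peak. The function names and the grid parameters `f_min`, `f_max` and `step` are hypothetical choices for this example; a real analysis would follow the peak with least-squares refinement and prewhitening.

```python
import math

def dft_amplitude(times, values, freq):
    """Semi-amplitude of the discrete Fourier transform at one trial frequency.

    Each term uses its own timestamp, so uneven sampling is handled directly.
    """
    n = len(values)
    mean = sum(values) / n
    re = im = 0.0
    for t, x in zip(times, values):
        phase = 2.0 * math.pi * freq * t
        re += (x - mean) * math.cos(phase)
        im -= (x - mean) * math.sin(phase)
    # Scaling by 2/n makes a pure sinusoid of amplitude A peak near A.
    return 2.0 / n * math.hypot(re, im)

def strongest_frequency(times, values, f_min, f_max, step):
    """Scan an evenly spaced frequency grid; return (frequency, amplitude) of the highest peak."""
    best = (f_min, 0.0)
    n_steps = int(round((f_max - f_min) / step))
    for i in range(n_steps + 1):
        f = f_min + i * step
        amp = dft_amplitude(times, values, f)
        if amp > best[1]:
            best = (f, amp)
    return best
```

For a synthetic light curve `2.0 * sin(2π · 1.5 · t)` sampled at slightly jittered times, the scan recovers a peak near 1.5 cycles per time unit with an amplitude close to 2.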
NASA Astrophysics Data System (ADS)
Susnea, Iuliana; Bunk, Sebastian; Wendel, Albrecht; Hermann, Corinna; Przybylski, Michael
2011-04-01
We report here an affinity-proteomics approach that combines 2D-gel electrophoresis and immunoblotting with high-performance mass spectrometry for the identification of both full-length protein antigens and antigenic fragments of Chlamydophila pneumoniae (C. pneumoniae). The present affinity-mass spectrometry approach effectively utilized high-resolution FTICR mass spectrometry and LC-tandem-MS for protein identification, and enabled the identification of several new, highly antigenic C. pneumoniae proteins that had not been reported before or had previously been detected only in other Chlamydia species, such as Chlamydia trachomatis. Moreover, high-resolution affinity-MS provided the identification of several neo-antigenic protein fragments containing N-terminal, C-terminal and central domains, such as fragments of the membrane protein Pmp21 and the secreted chlamydial proteasome-like factor (Cpaf), representing specific biomarker candidates.
Martin, Laetitia B B; Sherwood, Robert W; Nicklay, Joshua J; Yang, Yong; Muratore-Schroeder, Tara L; Anderson, Elizabeth T; Thannhauser, Theodore W; Rose, Jocelyn K C; Zhang, Sheng
2016-08-01
We describe here the use of label-free wide selected-ion monitoring data-independent acquisition (WiSIM-DIA) to identify proteins that are involved in the formation of tomato (Solanum lycopersicum) fruit cuticles and that are regulated by the transcription factor CUTIN DEFICIENT2 (CD2). A spectral library consisting of 11 753 unique peptides, corresponding to 2338 tomato protein groups, was used, and the DIA analysis was performed at the MS1 level using narrow mass windows for extraction with Skyline 2.6 software. We identified a total of 1140 proteins, 67 of which had expression levels that differed significantly between the cd2 tomato mutant and the wild-type cultivar M82. Differentially expressed proteins, including a key protein involved in cutin biosynthesis, were selected for validation by targeted SRM/MRM and by Western blot analysis. In addition to confirming a role for CD2 in regulating cuticle formation, the results also revealed that CD2 influences pathways associated with cell wall biology, anthocyanin biosynthesis, plant development, and responses to stress, which complements findings of earlier RNA-Seq experiments. Our results provide new insights into molecular processes and aspects of fruit biology associated with CD2 function, and demonstrate that WiSIM-DIA is an effective quantitative approach for global protein identification. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Takakusagi, Yoichi; Kuramochi, Kouji; Takagi, Manami; Kusayanagi, Tomoe; Manita, Daisuke; Ozawa, Hiroko; Iwakiri, Kanako; Takakusagi, Kaori; Miyano, Yuka; Nakazaki, Atsuo; Kobayashi, Susumu; Sugawara, Fumio; Sakaguchi, Kengo
2008-11-15
Here, we report an efficient one-cycle affinity selection using a natural-protein or random-peptide T7 phage pool for the identification of binding proteins or peptides specific for small molecules. The screening procedure involved a cuvette-type 27-MHz quartz-crystal microbalance (QCM) apparatus with a self-assembled monolayer (SAM) for immobilizing a specific small molecule on the gold electrode surface of a sensor chip. Using this apparatus, we attempted an affinity selection of proteins or peptides against a synthetic ligand for FK506-binding protein (SLF) or irinotecan (Iri, CPT-11). An affinity selection using SLF-SAM and a natural-protein T7 phage pool successfully detected FK506-binding protein 12 (FKBP12)-displaying T7 phage after an interaction time of only 10 min. Extensive exploration of time-consuming wash and/or elution conditions together with several rounds of selection was not required. Furthermore, in the selection using a 15-mer random-peptide T7 phage pool and subsequent analysis utilizing receptor ligand contact (RELIC) software, a subset of SLF-selected peptides clearly pinpointed several amino-acid residues within the binding site of FKBP12. Likewise, a subset of Iri-selected peptides pinpointed part of the positive amino-acid region of residues from the Iri-binding site of the well-known direct targets, acetylcholinesterase (AChE) and carboxylesterase (CE). Our findings demonstrate the effectiveness of this method and its general applicability to a wide range of small molecules.
MPFit: Computational Tool for Predicting Moonlighting Proteins.
Khan, Ishita; McGraw, Joshua; Kihara, Daisuke
2017-01-01
An increasing number of proteins have been found which are capable of performing two or more distinct functions. These proteins, known as moonlighting proteins, have drawn much attention recently as they may play critical roles in disease pathways and development. However, because moonlighting proteins are often found serendipitously, our understanding of moonlighting proteins is still quite limited. In order to lay the foundation for systematic moonlighting proteins studies, we developed MPFit, a software package for predicting moonlighting proteins from their omics features including protein-protein and gene interaction networks. Here, we describe and demonstrate the algorithm of MPFit, the idea behind it, and provide instruction for using the software.
Identification of secreted bacterial proteins by noncanonical amino acid tagging
Mahdavi, Alborz; Szychowski, Janek; Ngo, John T.; Sweredoski, Michael J.; Graham, Robert L. J.; Hess, Sonja; Schneewind, Olaf; Mazmanian, Sarkis K.; Tirrell, David A.
2014-01-01
Pathogenic microbes have evolved complex secretion systems to deliver virulence factors into host cells. Identification of these factors is critical for understanding the infection process. We report a powerful and versatile approach to the selective labeling and identification of secreted pathogen proteins. Selective labeling of microbial proteins is accomplished via translational incorporation of azidonorleucine (Anl), a methionine surrogate that requires a mutant form of the methionyl-tRNA synthetase for activation. Secreted pathogen proteins containing Anl can be tagged by azide-alkyne cycloaddition and enriched by affinity purification. Application of the method to analysis of the type III secretion system of the human pathogen Yersinia enterocolitica enabled efficient identification of secreted proteins, identification of distinct secretion profiles for intracellular and extracellular bacteria, and determination of the order of substrate injection into host cells. This approach should be widely useful for the identification of virulence factors in microbial pathogens and the development of potential new targets for antimicrobial therapy. PMID:24347637
Paramanik, Vijay; Thakur, Mahendra Kumar
2012-01-01
The localization of estrogen receptor (ER)β in mitochondria suggests ERβ-dependent regulation of genes, which is poorly understood. Here, we analyzed the ERβ-interacting mitochondrial as well as nuclear proteins in mouse brain using a pull-down assay and matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS). In mitochondria, ERβ interacted with six proteins of 35–152 kDa, its transactivation domain (TAD) interacted with four proteins of 37–172 kDa, and its ligand binding domain (LBD) interacted with six proteins of 37–161 kDa. In nuclei, ERβ interacted with seven proteins of 30–203 kDa, TAD with ten proteins of 31–160 kDa, and LBD with fourteen proteins of 42–179 kDa. For further identification, these proteins were cleaved by trypsin into peptides and analyzed by MALDI-MS with the Mascot search engine, immunoprecipitation, immunoblotting, and far-Western blotting. To find consensus binding motifs in the interacting proteins, their unique tryptic peptides were analyzed with motif scan software. All the interacting proteins were found to contain casein kinase (CK) 2 and protein kinase C (PKC) phosphorylation sites and N-myristoylation sites. These were further confirmed by peptide pull-down assays using specific mutations in the interacting sites. Thus, the present findings provide evidence for the interaction of ERβ with specific mitochondrial and nuclear proteins through consensus CK2 and PKC phosphorylation and N-myristoylation sites, and may represent an essential step toward designing selective ER modulators for regulating estrogen-mediated signaling. PMID:22566700
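The abstract does not say which motif definitions the scan software used. As a sketch under that uncertainty, the example below assumes the standard PROSITE consensus patterns for CK2 phosphorylation ([ST]-x(2)-[DE]), PKC phosphorylation ([ST]-x-[RK]) and N-myristoylation (G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}), rewrites them as regular expressions, and reports 1-based start positions. The function name `scan_motifs` is hypothetical; a zero-width lookahead is used so overlapping sites are all reported.

```python
import re

# Assumed PROSITE consensus patterns translated to Python regex syntax.
MOTIFS = {
    "CK2": r"[ST]..[DE]",                            # [ST]-x(2)-[DE]
    "PKC": r"[ST].[RK]",                             # [ST]-x-[RK]
    "N-myristoylation": r"G[^EDRKHPFYW]..[STAGCN][^P]",  # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}

def scan_motifs(sequence):
    """Return {motif name: [1-based start positions]} for one peptide or protein."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # Wrapping the pattern in (?=...) lets overlapping matches be found.
        matches = re.finditer(f"(?={pattern})", sequence)
        hits[name] = [m.start() + 1 for m in matches]
    return hits
```

For the made-up sequence `MGAASSAAEK`, this reports a CK2 site at position 6, no PKC site, and an N-myristoylation site at position 2. Pattern matches are only candidate sites; the study itself confirmed them experimentally by peptide pull-down with mutated sites.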
SELF-BLM: Prediction of drug-target interactions via self-training SVM.
Keum, Jongsoo; Nam, Hojung
2017-01-01
Predicting drug-target interactions is important for the development of novel drugs and the repositioning of drugs. To predict such interactions, there are a number of methods based on drug and target protein similarity. Although these methods, such as the bipartite local model (BLM), show promise, they often categorize unknown interactions as negative interactions. Therefore, these methods are not ideal for finding potential drug-target interactions that have not yet been validated as positive interactions. Thus, here we propose a method that integrates machine learning techniques, such as self-training support vector machine (SVM) and BLM, to develop a self-training bipartite local model (SELF-BLM) that facilitates the identification of potential interactions. The method first separates unlabeled interactions from negative interactions among the unknown interactions using a clustering method. Then, using the BLM method and self-training SVM, the unlabeled interactions are self-trained and final local classification models are constructed. When applied to four classes of proteins that include enzymes, G-protein coupled receptors (GPCRs), ion channels, and nuclear receptors, SELF-BLM showed the best performance for predicting not only known interactions but also potential interactions in three protein classes compared with other related studies. The implemented software and supporting data are available at https://github.com/GIST-CSBL/SELF-BLM.
Combining Rosetta with molecular dynamics (MD): A benchmark of the MD-based ensemble protein design.
Ludwiczak, Jan; Jarmula, Adam; Dunin-Horkawicz, Stanislaw
2018-07-01
Computational protein design is a set of procedures for computing amino acid sequences that will fold into a specified structure. Rosetta Design, a commonly used protein design software package, allows for the effective identification of sequences compatible with a given backbone structure, while molecular dynamics (MD) simulations can thoroughly sample near-native conformations. We benchmarked a procedure in which Rosetta Design is started on MD-derived structural ensembles and showed that such a combined approach generates 20-30% more diverse sequences than currently available methods with only a slight increase in computation time. Importantly, the increase in diversity is achieved without a loss in the quality of the designed sequences, as assessed by their resemblance to natural sequences. We demonstrate that the MD-based procedure is also applicable to de novo design tasks started from backbone structures without any sequence information. In addition, we implemented a protocol that can be used to assess the stability of designed models and to select the best candidates for experimental validation. In sum, our results demonstrate that MD ensemble-based flexible backbone design can be a viable method for protein design, especially for tasks that require a large pool of diverse sequences. Copyright © 2018 Elsevier Inc. All rights reserved.
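The abstract reports sequence diversity without defining the metric. One common way to quantify diversity among designs for a single backbone (so all sequences have equal length) is the mean pairwise normalized Hamming distance; the sketch below assumes that metric purely for illustration, and the function name is hypothetical.

```python
from itertools import combinations

def mean_pairwise_distance(sequences):
    """Average fraction of differing positions over all pairs of equal-length sequences.

    0.0 means all designs are identical; 1.0 means every pair differs at every position.
    """
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        if len(a) != len(b):
            raise ValueError("sequences designed on one backbone should have equal length")
        total += sum(x != y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)
```

For the toy set `["AAAA", "AAAT", "TTTT"]`, the pairwise distances are 0.25, 1.0 and 0.75, giving a mean of about 0.667. A "20-30% more diverse" comparison would then be a ratio of such scores between two design protocols, if this metric were the one used.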
Cheon, M S; Kim, S H; Fountoulakis, M; Lubec, G
2003-01-01
Fatty acid binding proteins (FABPs) are thought to play a role in the binding, targeting and transport of long-chain fatty acids, and at least three types of FABPs are found in human brain: heart type (H)-FABP, brain type (B)-FABP and epidermal type (E)-FABP. Although all three FABPs could be involved in normal brain function in prenatal and postnatal life, a neurobiological role of FABPs in neurodegenerative diseases has not yet been reported. This prompted us to evaluate the protein levels of FABPs in brains from patients with Down syndrome (DS) and Alzheimer's disease (AD), and in fetal cerebral cortex with DS, using two-dimensional (2-D) gel electrophoresis with subsequent matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS) identification and specific software for protein quantification. In adult brain, B-FABP was significantly increased in occipital cortex in DS, and H-FABP was significantly decreased in DS (frontal, occipital and parietal cortices) and AD (frontal, temporal, occipital and parietal cortices). In fetal brain, B-FABP and E-FABP levels were comparable in controls and DS. We conclude that aberrant expression of FABPs, especially H-FABP, may alter membrane fluidity and signal transduction and consequently could be involved in cellular dysfunction in neurodegenerative disorders.
48 CFR 209.571-6 - Identification of organizational conflicts of interest.
Code of Federal Regulations, 2014 CFR
2014-10-01
... business units performing systems engineering and technical assistance, professional services, or... parent corporate entity, particularly the award of a subcontract for software integration or the development of a proprietary software system architecture; and (c) The performance by, or assistance of...
48 CFR 209.571-6 - Identification of organizational conflicts of interest.
Code of Federal Regulations, 2013 CFR
2013-10-01
... business units performing systems engineering and technical assistance, professional services, or... parent corporate entity, particularly the award of a subcontract for software integration or the development of a proprietary software system architecture; and (c) The performance by, or assistance of...
48 CFR 209.571-6 - Identification of organizational conflicts of interest.
Code of Federal Regulations, 2012 CFR
2012-10-01
... business units performing systems engineering and technical assistance, professional services, or... parent corporate entity, particularly the award of a subcontract for software integration or the development of a proprietary software system architecture; and (c) The performance by, or assistance of...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baumgardt, D.R.; Carter, S.; Maxson, M.
The objective of this project is to design and develop an Intelligent Event Identification System, or ISEIS, which will serve as a prototype for routine event identification of small explosions and earthquakes and as a tool for discrimination research. The first part of this study gives an overview of the system design and the results of a preliminary evaluation of the system on events in Scandinavia and the Soviet Union. The system was designed to be highly modular to allow the easy incorporation of new discriminants and/or discrimination processes. Because the main objective of the system is the identification of small events, most of the initial ISEIS prototype discriminants utilize regional seismic data recorded by the regional arrays NORESS and ARCESS. However, ISEIS can easily process other regional array data (e.g., from GERESS and FINESA), as well as data from three-component single stations, as more of these data become available. The second part of this study is entitled Intelligent Event Identification System: User's Manual, and gives a detailed description of all the processing interfaces of ISEIS. The third part of this study is entitled Intelligent Event Identification System: Software Maintenance Manual, which describes the ISEIS software from the programmer's perspective and provides information for maintenance and modification of the software modules in the system.
Ban, Tomohiro; Ohue, Masahito; Akiyama, Yutaka
2018-04-01
The identification of comprehensive drug-target interactions is important in drug discovery. Although numerous computational methods have been developed over the years, a gold standard technique has not been established. Computational ligand docking and structure-based drug design allow researchers to predict the binding affinity between a compound and a target protein, and thus, they are often used to virtually screen compound libraries. In addition, docking techniques have also been applied to the virtual screening of target proteins (inverse docking) to predict target proteins of a drug candidate. Nevertheless, a more accurate docking method is currently required. In this study, we proposed a method in which a predicted ligand-binding site is covered by multiple grids, termed multiple grid arrangement. Notably, multiple grid arrangement facilitates the conformational search for a grid-based ligand docking software and can be applied to the state-of-the-art commercial docking software Glide (Schrödinger, LLC). We validated the proposed method by re-docking with the Astex diverse benchmark dataset and blind binding site situations, which improved the correct prediction rate of the top scoring docking pose from 27.1% to 34.1%; however, only a slight improvement in target prediction accuracy was observed with inverse docking scenarios. These findings highlight the limitations and challenges of current scoring functions and the need for more accurate docking methods. The proposed multiple grid arrangement method was implemented in Glide by modifying a cross-docking script for Glide, xglide.py. The script of our method is freely available online at http://www.bi.cs.titech.ac.jp/mga_glide/. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
López-Fernández, Hugo; Araújo, José E; Jorge, Susana; Glez-Peña, Daniel; Reboiro-Jato, Miguel; Santos, Hugo M; Fdez-Riverola, Florentino; Capelo, José L
2018-03-01
2D-gel electrophoresis is widely used in combination with MALDI-TOF mass spectrometry in order to analyze the proteome of biological samples. For instance, it can be used to discover proteins that are differentially expressed between two groups (e.g. two disease conditions, case vs. control, etc.), thus obtaining a set of potential biomarkers. This procedure requires a great deal of data processing in order to prepare data for analysis or to merge and integrate data from different sources. This kind of work is usually done manually (e.g. copying and pasting data into spreadsheet files), which is highly time consuming and distracts the researcher from other important, core tasks. Moreover, engaging in a repetitive process in a non-automated, handling-based manner is prone to error, thus threatening reliability and reproducibility. The objective of this paper is to present S2P, an open-source software tool to overcome these drawbacks. S2P is implemented in Java on top of the AIBench framework, and relies on well-established open-source libraries to accomplish different tasks. S2P is an AIBench-based, multiplatform desktop application, specifically aimed at processing 2D-gel and MALDI-mass spectrometry protein identification-based data in a computer-aided, reproducible manner. Different case studies are presented in order to show the usefulness of S2P. S2P is open source and free to all users at http://www.sing-group.org/s2p. Through its user-friendly GUI, S2P dramatically reduces the time that researchers need to invest in order to prepare data for analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
PROTICdb: a web-based application to store, track, query, and compare plant proteome data.
Ferry-Dumazet, Hélène; Houel, Gwenn; Montalent, Pierre; Moreau, Luc; Langella, Olivier; Negroni, Luc; Vincent, Delphine; Lalanne, Céline; de Daruvar, Antoine; Plomion, Christophe; Zivy, Michel; Joets, Johann
2005-05-01
PROTICdb is a web-based application, mainly designed to store and analyze plant proteome data obtained by two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) and mass spectrometry (MS). The purposes of PROTICdb are (i) to store, track, and query information related to proteomic experiments, i.e., from tissue sampling to protein identification and quantitative measurements, and (ii) to integrate information from the user's own expertise and other sources into a knowledge base, used to support data interpretation (e.g., for the determination of allelic variants or products of post-translational modifications). Data insertion into the relational database of PROTICdb is achieved either by uploading outputs of image analysis and MS identification software, or by filling web forms. 2-D PAGE annotated maps can be displayed, queried, and compared through a graphical interface. Links to external databases are also available. Quantitative data can be easily exported in a tabulated format for statistical analyses. PROTICdb is based on the Oracle or the PostgreSQL Database Management System and is freely available upon request at the following URL: http://moulon.inra.fr/bioinfo/PROTICdb.
ERIC Educational Resources Information Center
Wilson, Karl A.; Tan-Wilson, Anna
2013-01-01
Mass spectrometry (MS) has become an important tool in studying biological systems. One application is the identification of proteins and peptides by the matching of peptide and peptide fragment masses to the sequences of proteins in protein sequence databases. Often prior protein separation of complex protein mixtures by 2D-PAGE is needed,…
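The matching of peptide masses to sequences in a protein database can be sketched in a few lines. The example below is a simplified peptide mass fingerprinting illustration, not a description of any particular search engine: it digests each sequence in silico with a trypsin rule (cleave after K or R unless followed by P, no missed cleavages), computes monoisotopic peptide masses from standard residue masses plus water, and counts how many observed masses each protein explains within a tolerance. Fragment-ion matching is not covered, and the function names and the 0.01 Da default tolerance are assumptions.

```python
# Standard monoisotopic residue masses (Da); I and L are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini

def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def match_masses(observed, database, tol=0.01):
    """For each protein, count observed masses that match one of its tryptic peptides."""
    scores = {}
    for name, seq in database.items():
        masses = [peptide_mass(p) for p in tryptic_digest(seq)]
        scores[name] = sum(any(abs(o - m) <= tol for m in masses) for o in observed)
    return scores
```

For example, `tryptic_digest("AKPGRAK")` keeps the K-P bond intact and returns `["AKPGR", "AK"]`. Real searches also handle missed cleavages, modifications and charge states, and score matches statistically rather than by simple counts.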
Top Down Implementation Plan for system performance test software
NASA Technical Reports Server (NTRS)
Jacobson, G. N.; Spinak, A.
1982-01-01
The top down implementation plan used for the development of system performance test software during the Mark IV-A era is described. The plan is based upon the identification of the hierarchical relationship of the individual elements of the software design, the development of a sequence of functionally oriented demonstrable steps, the allocation of subroutines to the specific step where they are first required, and objective status reporting. The results are: determination of milestones, improved managerial visibility, better project control, and a successful software development.
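One concrete rule in this plan is that each subroutine is allocated to the step where it is first required. A minimal sketch of that rule, assuming each step is described simply as a list of the subroutine names it needs (the data layout and function names are hypothetical), could look like this:

```python
def allocate_subroutines(steps):
    """Map each subroutine name to the index of the first step that requires it."""
    first_needed = {}
    for index, required in enumerate(steps):
        for name in required:
            # setdefault keeps the earliest step and ignores later reuse.
            first_needed.setdefault(name, index)
    return first_needed

def subroutines_per_step(steps):
    """List, for each step, the subroutines that must be delivered at that step."""
    first_needed = allocate_subroutines(steps)
    plan = [[] for _ in steps]
    for name, index in first_needed.items():
        plan[index].append(name)
    return plan
```

With steps `[["init", "read"], ["read", "plot"], ["plot", "report"]]`, `read` is allocated to step 0 and `plot` to step 1, so the per-step deliverables are `[["init", "read"], ["plot"], ["report"]]`. Each non-empty list then corresponds to a demonstrable milestone.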
Choosing an Optimal Database for Protein Identification from Tandem Mass Spectrometry Data.
Kumar, Dhirendra; Yadav, Amit Kumar; Dash, Debasis
2017-01-01
Database searching is the preferred method for protein identification from digital spectra of mass-to-charge ratios (m/z) detected for protein samples by mass spectrometers. The search database is one of the major factors influencing which proteins are discovered in the sample, and thus which biological conclusions are derived. In most cases the choice of search database is arbitrary. Here we describe common search databases used in proteomic studies and their impact on the final list of identified proteins. We also elaborate on factors such as the composition and size of the search database that can influence the protein identification process. In conclusion, we suggest that the choice of database depends on the type of inferences to be derived from proteomics data. However, making additional efforts to build a compact and concise database for a targeted question should generally be rewarding in achieving confident protein identifications.
Bieri, Michael; d'Auvergne, Edward J; Gooley, Paul R
2011-06-01
Investigation of protein dynamics on the ps-ns and μs-ms timeframes provides detailed insight into the mechanisms of enzymes and the binding properties of proteins. Nuclear magnetic resonance (NMR) is an excellent tool for studying protein dynamics at atomic resolution. Analysis of relaxation data using model-free analysis can be a tedious and time consuming process, which requires good knowledge of scripting procedures. The software relaxGUI was developed for fast and simple model-free analysis and is fully integrated into the software package relax. It is written in Python and uses wxPython to build the graphical user interface (GUI) for maximum performance and multi-platform use. This software allows the analysis of NMR relaxation data with ease and the generation of publication quality graphs as well as color coded images of molecular structures. The interface is designed for simple data analysis and management. The software was tested and validated against the command line version of relax.
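For context on what model-free analysis fits, the original Lipari-Szabo formalism expresses the spectral density of a bond vector in a molecule tumbling isotropically as shown here. $S^2$ is the generalized order parameter, $\tau_m$ the global rotational correlation time and $\tau_e$ the effective internal correlation time. The measured relaxation rates ($R_1$, $R_2$ and the heteronuclear NOE) are linear combinations of $J(\omega)$ at a few frequencies. relax also supports extended model-free models, and the abstract does not state which of them relaxGUI exposes.

$$
J(\omega) = \frac{2}{5}\left[\frac{S^2\,\tau_m}{1+(\omega\tau_m)^2} + \frac{(1-S^2)\,\tau}{1+(\omega\tau)^2}\right],
\qquad
\frac{1}{\tau} = \frac{1}{\tau_m} + \frac{1}{\tau_e}
$$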
A combinatorial perspective of the protein inference problem.
Yang, Chao; He, Zengyou; Yu, Weichuan
2013-01-01
In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to accurately and efficiently identify proteins. Many methods have been proposed to facilitate the identification of proteins from peptide identification results. However, the relationship between protein identification and peptide identification has not been thoroughly explained before. In this paper, we examine the protein inference problem from a combinatorial perspective. We employ combinatorial mathematics to calculate the conditional protein probabilities (protein probability means the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound, and an empirical estimation of protein probabilities, respectively. The combinatorial perspective enables us to obtain an analytical expression for protein inference. In experiments on two data sets of standard protein mixtures and two data sets of real samples, our method achieves results comparable to ProteinProphet more efficiently. Based on our model, we study the impact of unique peptides and degenerate peptides (degenerate peptides are peptides shared by at least two proteins) on protein probabilities. We also study the relationship between our model and ProteinProphet. We name our program ProteinInfer. Its Java source code, our supplementary document and experimental results are available at: http://bioinformatics.ust.hk/proteininfer.
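The abstract defines degenerate peptides as peptides shared by at least two proteins. A minimal sketch, assuming identifications are held as a simple protein-to-peptides mapping, splits peptides into unique and degenerate sets; the helper names are hypothetical. The naive_protein_probability function is a common independence-based (noisy-OR) simplification included only for contrast: it is not the combinatorial lower bound, upper bound or empirical estimate derived for ProteinInfer.

```python
from collections import defaultdict


def classify_peptides(protein_to_peptides: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (unique, degenerate) peptides: mapped to one protein vs. to two or more."""
    owners: dict[str, set[str]] = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            owners[peptide].add(protein)
    unique = {pep for pep, prots in owners.items() if len(prots) == 1}
    degenerate = {pep for pep, prots in owners.items() if len(prots) >= 2}
    return unique, degenerate


def naive_protein_probability(peptide_probs: list[float]) -> float:
    """Noisy-OR: P(protein) = 1 - prod(1 - p_i), assuming independent peptide evidence."""
    prob_all_wrong = 1.0
    for p in peptide_probs:
        prob_all_wrong *= 1.0 - p
    return 1.0 - prob_all_wrong


unique, degenerate = classify_peptides({"P1": {"AAK", "CCR"}, "P2": {"CCR", "DDK"}})
print(sorted(unique), sorted(degenerate))
```

Here CCR is degenerate because both P1 and P2 contain it, while AAK and DDK are unique; how much credit a degenerate peptide should lend to each of its proteins is exactly the ambiguity that protein inference methods must resolve.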
Tolić, Nikola; Liu, Yina; Liyu, Andrey; Shen, Yufeng; Tfaily, Malak M; Kujawinski, Elizabeth B; Longnecker, Krista; Kuo, Li-Jung; Robinson, Errol W; Paša-Tolić, Ljiljana; Hess, Nancy J
2017-12-05
Ultrahigh resolution mass spectrometry, such as Fourier transform ion cyclotron resonance mass spectrometry (FT ICR MS), can resolve thousands of molecular ions in complex organic matrices. A Compound Identification Algorithm (CIA) was previously developed for automated elemental formula assignment for natural organic matter (NOM). In this work, we describe the software Formularity, which provides a user-friendly interface to the CIA function and to a newly developed search function, the Isotopic Pattern Algorithm (IPA). While CIA assigns elemental formulas for compounds containing C, H, O, N, S, and P, IPA is capable of assigning formulas for compounds containing other elements. We used halogenated organic compounds (HOC), a chemical class that is ubiquitous in natural as well as anthropogenic systems, as an example to demonstrate the capability of Formularity with IPA. An HOC standard mix was used to evaluate the identification confidence of IPA. Tap water and HOC-spiked Suwannee River NOM were used to assess HOC identification in complex environmental samples. Strategies for reconciling CIA and IPA assignments are discussed. Software and sample databases with documentation are freely available.
LIQUID: an open-source software for identifying lipids in LC-MS/MS-based lipidomics data.
Kyle, Jennifer E; Crowell, Kevin L; Casey, Cameron P; Fujimoto, Grant M; Kim, Sangtae; Dautel, Sydney E; Smith, Richard D; Payne, Samuel H; Metz, Thomas O
2017-06-01
We introduce an open-source software tool, LIQUID, for semi-automated processing and visualization of LC-MS/MS-based lipidomics data. LIQUID provides users with the capability to process high-throughput data and contains a target library and scoring model that can be customized to project needs. The graphical user interface provides visualization of multiple lines of spectral evidence for each lipid identification, allowing rapid examination of data for making confident identifications of lipid molecular species. LIQUID was compared to other freely available software commonly used to identify lipids and other small molecules (e.g. CFM-ID, MetFrag, GNPS, LipidBlast and MS-DIAL) and was found to arrive at a higher number of validated lipid identifications with a faster processing time. LIQUID is available at http://github.com/PNNL-Comp-Mass-Spec/LIQUID. Contact: jennifer.kyle@pnnl.gov or thomas.metz@pnnl.gov. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Hiller, Karsten; Grote, Andreas; Maneck, Matthias; Münch, Richard; Jahn, Dieter
2006-10-01
After the publication of JVirGel 1.0 in 2003, we received many requests and suggestions from the proteomics community to further improve the performance of the software and to add useful new features. The integration of the PrediSi algorithm for the prediction of signal peptides for Sec-dependent protein export into JVirGel 2.0 allows the exclusion of most exported preproteins from calculated proteomic maps and provides the basis for the calculation of Sec-based secretomes. A tool for the identification of proteins carrying transmembrane helices (JCaMelix) and for the prediction of the corresponding membrane proteome was added. Finally, in order to directly compare experimental and calculated proteome data, a function to overlay and evaluate predicted and experimental two-dimensional gels was included. JVirGel 2.0 is freely available as a precompiled package for installation on Windows or Linux operating systems. Furthermore, a completely platform-independent Java version is available for download. Additionally, we provide a JavaServer Pages-based version of JVirGel 2.0 that can be operated in nearly all web browsers. All versions are accessible at http://www.jvirgel.de
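A calculated proteomic map of the kind described here places each protein on a virtual two-dimensional gel by isoelectric point (pI) and molecular weight (MW). The sketch below shows one simple way to compute both from a sequence. The average residue masses are standard values, but the pKa set is an assumption (published scales differ, so computed pI values vary between tools), and nothing here reflects JVirGel's own implementation; the function names are hypothetical.

```python
# Average residue masses (Da); a linear peptide's MW adds one water molecule.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01528

# Assumed pKa values (one common textbook set; other scales shift the pI).
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def molecular_weight(seq: str) -> float:
    """Average molecular weight of an unmodified linear sequence."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_AVG


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1.0 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1.0 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH; the net charge decreases monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0.0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


seq = "ACDEK"
print(round(isoelectric_point(seq), 2), round(molecular_weight(seq), 2))
```

Each protein's (pI, MW) pair would become one spot on the virtual gel; excluding predicted exported preproteins, as the abstract describes, amounts to filtering the protein list before plotting.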
Stable isotope, site-specific mass tagging for protein identification
Chen, Xian
2006-10-24
Proteolytic peptide mass mapping as measured by mass spectrometry provides an important method for the identification of proteins, which are usually identified by matching the measured and calculated m/z values of the proteolytic peptides. A unique identification is, however, heavily dependent upon the mass accuracy and sequence coverage of the fragment ions generated by peptide ionization. The present invention describes a method for increasing the specificity, accuracy and efficiency of the assignments of particular proteolytic peptides and consequent protein identification, by the incorporation of selected amino acid residue(s) enriched with stable isotope(s) into the protein sequence without the need for ultrahigh instrumental accuracy. Selected amino acid(s) are labeled with ¹³C/¹⁵N/²H and incorporated into proteins in a sequence-specific manner during cell culturing. Each of these labeled amino acids carries a defined mass change encoded in its monoisotopic distribution pattern. Through their characteristic patterns, the peptides with mass tag(s) can then be readily distinguished from other peptides in mass spectra. The present method of identifying unique proteins can also be extended to protein complexes and will significantly increase data search specificity, efficiency and accuracy for protein identifications.
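Because each labeled residue carries a defined mass change, the expected shift of a tagged peptide follows from the isotope mass differences and the number of labeled residues it contains. A minimal sketch, assuming a hypothetical labeling scheme in which leucine carries three ²H atoms (Leu-d3); the function names are illustrative, and only the isotope masses are standard values.

```python
# Monoisotopic masses (Da) of the light and heavy isotopes used for labeling.
LIGHT = {"13C": 12.0, "15N": 14.0030740048, "2H": 1.00782503207}
HEAVY = {"13C": 13.0033548378, "15N": 15.0001088982, "2H": 2.0141017778}


def residue_label_shift(label: dict[str, int]) -> float:
    """Mass increase of one labeled residue, e.g. {"2H": 3} for a d3 label."""
    return sum((HEAVY[iso] - LIGHT[iso]) * n for iso, n in label.items())


def peptide_mass_shift(peptide: str, labeled_residue: str, label: dict[str, int]) -> float:
    """Total shift: one residue shift for each occurrence of the labeled amino acid."""
    return peptide.count(labeled_residue) * residue_label_shift(label)


def mz_shift(mass_shift: float, charge: int) -> float:
    """Spacing between unlabeled and labeled peaks at a given charge state."""
    return mass_shift / charge


# Hypothetical Leu-d3 labeling of the peptide LGLLK (three leucines)
shift = peptide_mass_shift("LGLLK", "L", {"2H": 3})
print(round(shift, 4), round(mz_shift(shift, 2), 4))
```

For this example the shift is about 9.06 Da, or about 4.53 m/z units at charge 2+; the number of labeled residues is thus readable from the peak spacing, which is what makes tagged peptides distinguishable.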
RAPTR-SV: a hybrid method for the detection of structural variants
USDA-ARS's Scientific Manuscript database
Motivation: Identification of Structural Variants (SV) in sequence data results in a large number of false positive calls using existing software, which overburdens subsequent validation. Results: Simulations using RAPTR-SV and another software package that uses a similar algorithm for SV detection...
Generic comparison of protein inference engines.
Claassen, Manfred; Reiter, Lukas; Hengartner, Michael O; Buhmann, Joachim M; Aebersold, Ruedi
2012-04-01
Protein identifications, instead of peptide-spectrum matches, constitute the biologically relevant result of shotgun proteomics studies. How to appropriately infer and report protein identifications has triggered a still ongoing debate. This debate has so far suffered from the lack of appropriate performance measures that allow us to objectively assess protein inference approaches. This study describes an intuitive, generic and yet formal performance measure and demonstrates how it enables experimentalists to select an optimal protein inference strategy for a given collection of fragment ion spectra. We applied the performance measure to systematically explore the benefit of excluding possibly unreliable protein identifications, such as single-hit wonders. To this end, we defined a family of protein inference engines by extending a simple inference engine with thousands of pruning variants, each excluding a different specified set of possibly unreliable identifications. We benchmarked these protein inference engines on several data sets representing different proteomes and mass spectrometry platforms. Optimally performing inference engines retained all high-confidence spectral evidence, without posterior exclusion of any type of protein identification. Although the diversity of the studied data sets consistently supports this rule, other data sets might behave differently. To ensure maximal reliable proteome coverage for data sets arising in other studies, we advocate abstaining from rigid protein inference rules, such as exclusion of single-hit wonders, and instead considering several protein inference approaches and assessing them with respect to the presented performance measure in the specific application context.
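The family of pruning variants can be pictured as subsets of exclusion rules applied to a base set of protein identifications. A minimal sketch with two hypothetical rules (single-hit wonders, and proteins supported only by short peptides) enumerates every rule subset; the empty subset corresponds to retaining all evidence, the setting the study found to perform best on its data sets. Rule names, the length threshold and the data layout are illustrative assumptions, not the authors' definitions, and the performance measure used to rank variants is not reproduced.

```python
from itertools import combinations
from typing import Callable, Iterator

Evidence = dict[str, set[str]]          # protein -> distinct supporting peptides
Rule = Callable[[str, set[str]], bool]  # returns True to exclude the protein

# Hypothetical exclusion rules, for illustration only.
RULES: dict[str, Rule] = {
    "single_hit_wonder": lambda protein, peptides: len(peptides) < 2,
    "short_peptides_only": lambda protein, peptides: all(len(p) < 8 for p in peptides),
}


def pruning_variants(rule_names: list[str]) -> Iterator[tuple[str, ...]]:
    """Yield every subset of rules; the empty tuple is the unpruned base engine."""
    for k in range(len(rule_names) + 1):
        yield from combinations(rule_names, k)


def apply_variant(evidence: Evidence, variant: tuple[str, ...]) -> Evidence:
    """Keep the proteins that no rule in this variant excludes."""
    return {
        protein: peptides
        for protein, peptides in evidence.items()
        if not any(RULES[name](protein, peptides) for name in variant)
    }


evidence = {"P1": {"PEPTIDEK"}, "P2": {"AK", "CCK"}, "P3": {"LONGPEPTIDEK", "SHORTK"}}
for variant in pruning_variants(list(RULES)):
    print(variant or "(no pruning)", sorted(apply_variant(evidence, variant)))
```

With n rules this yields 2ⁿ engines, which is how a handful of exclusion criteria can grow into thousands of variants.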
CALCOM: a software for calculating the center of mass of proteins.
Costantini, Susan; Paladino, Antonella; Facchiano, Angelo M
2008-02-09
The center of mass of a protein is an artificial point useful for detecting important and simple features of protein structure, shape and association. CALCOM is a software tool that calculates the center of mass of a protein starting from PDB protein structure files. For protein complexes and protein-small ligand complexes, the position of protein residues or ligand atoms with respect to each protein subunit can be evaluated, as well as the distances between the centers of mass of the protein subunits, in order to compare different conformations and evaluate the relative motion of subunits. The service is available at http://bioinformatica.isa.cnr.it/CALCOM/.
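The center of mass is the mass-weighted mean of atomic coordinates, and the subunit comparison described above reduces to the distance between per-chain centers. A minimal sketch, assuming fixed-column PDB ATOM/HETATM records and a small table of approximate atomic weights (elements outside the table raise KeyError); the function names are hypothetical and this is not CALCOM's code.

```python
import math
from typing import Optional

# Approximate standard atomic weights (assumption: only common protein elements).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "P": 30.974, "S": 32.06}

Atom = tuple[str, str, float, float, float]  # (chain, element, x, y, z)


def parse_pdb_atoms(pdb_text: str) -> list[Atom]:
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        chain = line[21]
        x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
        # Element symbol is in columns 77-78; rough fallback: first letter of the atom name.
        element = line[76:78].strip() or line[12:16].strip()[0]
        atoms.append((chain, element.capitalize(), x, y, z))
    return atoms


def center_of_mass(atoms: list[Atom], chain: Optional[str] = None) -> tuple[float, float, float]:
    """Mass-weighted mean position of all atoms, or of one chain only."""
    total = cx = cy = cz = 0.0
    for ch, element, x, y, z in atoms:
        if chain is not None and ch != chain:
            continue
        m = ATOMIC_MASS[element]
        total += m
        cx += m * x
        cy += m * y
        cz += m * z
    if total == 0.0:
        raise ValueError("no atoms selected")
    return (cx / total, cy / total, cz / total)


def subunit_distance(atoms: list[Atom], chain_a: str, chain_b: str) -> float:
    """Distance between the centers of mass of two chains (same units as the file)."""
    return math.dist(center_of_mass(atoms, chain_a), center_of_mass(atoms, chain_b))


atoms = [("A", "C", 0.0, 0.0, 0.0), ("A", "C", 2.0, 0.0, 0.0), ("B", "O", 0.0, 0.0, 10.0)]
print(center_of_mass(atoms, "A"), round(subunit_distance(atoms, "A", "B"), 3))
```

Comparing subunit_distance across two conformations of the same complex gives a simple measure of relative subunit motion.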
Ojima-Kato, Teruyo; Yamamoto, Naomi; Takahashi, Hajime; Tamura, Hiroto
2016-01-01
The genetic lineages of Listeria monocytogenes and other species of the genus Listeria are correlated with pathogenesis in humans. Although matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) has become a prevailing tool for rapid and reliable microbial identification, the precise discrimination of Listeria species and lineages remains a crucial issue in clinical settings and for food safety. In this study, we constructed an accurate and reliable MS database to discriminate the lineages of L. monocytogenes and the species of Listeria (L. monocytogenes, L. innocua, L. welshimeri, L. seeligeri, L. ivanovii, L. grayi, and L. rocourtiae) based on the S10-spc-alpha operon gene-encoded ribosomal protein mass spectrum (S10-GERMS) proteotyping method, which relies on both genetic information (genomics) and observed MS peaks in MALDI-TOF MS (proteomics). The specific set of eight biomarkers (ribosomal proteins L24, L6, L18, L15, S11, S9, L31 type B, and S16) yielded characteristic MS patterns for the lineages of L. monocytogenes and the different species of Listeria, and led to the construction of an MS database that successfully discriminated between these organisms by MALDI-TOF MS fingerprinting followed by analysis with the proteotyping software Strain Solution. We also validated the constructed database in Strain Solution using 23 Listeria strains collected from natural sources. PMID:27442502
Evidence of Molecular Adaptation to Extreme Environments and Applicability to Space Environments
NASA Astrophysics Data System (ADS)
Filipovic, M. D.; Ognjanovic, S.; Ognjanovic, M.
2008-06-01
This is an initial investigation of gene signatures responsible for adapting microscopic life to extreme Earth environments. We present preliminary results showing that identifying the clusters of orthologous groups (COGs) common to several hyperthermophiles, and excluding those shared with a mesophile (non-hyperthermophile), Escherichia coli K12, yields a group of proteins possibly involved in adaptation to life at extreme temperatures. Comparative genome analyses represent a powerful tool for discovering novel genes responsible for adaptation to specific extreme environments. Methanogens stand out as the only group of organisms with species capable of growth at 0 °C (Methanogenium frigidum (M. frigidum) and Methanococcoides burtonii (M. burtonii)) and at 110 °C (Methanopyrus kandleri (M. kandleri)). Although not all components of heat adaptation can be attributed to novel genes, the chaperones known as heat shock proteins stabilize enzymes at elevated temperatures. However, highly conserved chaperones found in bacteria and eukaryotes are not present in hyperthermophilic Archaea, which instead have a unique chaperone, TF55. Our aim was to use software that we developed specifically for comparative analyses of extremophile genomes in order to search for additional novel genes involved in hyperthermophile adaptation. The following hyperthermophile genomes incorporated in this software were used for these studies: Methanocaldococcus jannaschii (M. jannaschii), M. kandleri, Archaeoglobus fulgidus (A. fulgidus) and three species of Pyrococcus. Common genes were annotated and grouped according to their roles in cellular processes where such information was available, and proteins not previously implicated in the heat adaptation of hyperthermophiles were identified. Additional experimental data are needed to learn more about these proteins.
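The COG filtering step described above is a set intersection followed by a set difference. A minimal sketch, assuming each genome is represented as a set of COG identifiers (the identifiers in the example are placeholders, not real assignments): by default a COG must occur in every hyperthermophile genome, and the optional min_genomes threshold, a hypothetical parameter, relaxes "common to several" to a chosen count.

```python
from collections import Counter
from typing import Optional


def candidate_heat_adaptation_cogs(
    hyperthermophiles: dict[str, set[str]],
    mesophile: set[str],
    min_genomes: Optional[int] = None,
) -> set[str]:
    """COGs found in at least min_genomes hyperthermophiles (default: all) but absent from the mesophile."""
    required = len(hyperthermophiles) if min_genomes is None else min_genomes
    counts = Counter(cog for cogs in hyperthermophiles.values() for cog in cogs)
    shared = {cog for cog, n in counts.items() if n >= required}
    return shared - mesophile


# Placeholder COG sets for three hyperthermophiles and E. coli K12
genomes = {
    "M_jannaschii": {"COG1", "COG2", "COG3"},
    "M_kandleri": {"COG1", "COG2", "COG3"},
    "A_fulgidus": {"COG1", "COG2"},
}
e_coli = {"COG2"}
print(sorted(candidate_heat_adaptation_cogs(genomes, e_coli)))
print(sorted(candidate_heat_adaptation_cogs(genomes, e_coli, min_genomes=2)))
```

The strict version keeps only COGs conserved across all selected hyperthermophiles; the relaxed threshold trades specificity for a longer candidate list.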
To address non-gene-based components of thermal adaptation, all sequenced extremophiles were analysed for their GC content and amino acid hydrophobicity. Finally, we developed a prediction model for optimal growth temperature.
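GC content is the fraction of G and C among the A, C, G and T bases of a genome. The sketch below computes it, together with a mean hydrophobicity per protein. The abstract does not name a hydrophobicity scale, so the Kyte-Doolittle scale (averaged as a GRAVY score) is an assumption here, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gc_content(dna: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = dna.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if not values:
        raise ValueError("empty protein sequence")
    return sum(values) / len(values)


print(gc_content("ATGCGGNN"), round(gravy("MKVLA"), 3))
```

Per-genome values of this kind (GC content, average hydrophobicity over the proteome) are the sort of features a growth-temperature prediction model could take as input.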
Greber, Boris; Siatkowski, Marcin; Paudel, Yogesh; Warsow, Gregor; Cap, Clemens; Schöler, Hans; Fuellen, Georg
2010-01-01
Background: Analysis of the mechanisms underlying pluripotency and reprogramming would benefit substantially from easy access to an electronic network of genes, proteins and mechanisms. Moreover, interpreting gene expression data needs to move beyond just the identification of the up-/downregulation of key genes and of overrepresented processes and pathways, towards clarifying the essential effects of the experiment in molecular terms. Methodology/Principal Findings: We have assembled a network of 574 molecular interactions, stimulations and inhibitions, based on a collection of research data from 177 publications up to June 2010, involving 274 mouse genes/proteins, all in a standard electronic format, enabling analyses by readily available software such as Cytoscape and its plugins. The network includes the core circuit of Oct4 (Pou5f1), Sox2 and Nanog, its periphery (such as Stat3, Klf4, Esrrb, and c-Myc), connections to upstream signaling pathways (such as Activin, WNT, FGF, BMP, Insulin, Notch and LIF), and epigenetic regulators as well as some other relevant genes/proteins, such as proteins involved in nuclear import/export. We describe the general properties of the network, as well as a Gene Ontology analysis of the genes included. We use several expression data sets to condense the network to a set of network links that are affected in the course of an experiment, yielding hypotheses about the underlying mechanisms. Conclusions/Significance: We have initiated an electronic data repository that will be useful to understand pluripotency and to facilitate the interpretation of high-throughput data. To keep up with the growth of knowledge on the fundamental processes of pluripotency and reprogramming, we suggest combining Wiki and social networking software into a community curation system that is easy to use and flexible, tailored to benefit scientists, and able to improve communication and exchange of research results.
A PluriNetWork tutorial is available at http://www.ibima.med.uni-rostock.de/IBIMA/PluriNetWork/. PMID:21179244
VirtualPlant: A Software Platform to Support Systems Biology Research
Katari, Manpreet S.; Nowicki, Steve D.; Aceituno, Felipe F.; Nero, Damion; Kelfer, Jonathan; Thompson, Lee Parnell; Cabello, Juan M.; Davidson, Rebecca S.; Goldberg, Arthur P.; Shasha, Dennis E.; Coruzzi, Gloria M.; Gutiérrez, Rodrigo A.
2010-01-01
Data generation is no longer the limiting factor in advancing biological research. Instead, data integration, analysis, and interpretation have become key bottlenecks and challenges that biologists conducting genomic research face daily. To enable biologists to derive testable hypotheses from the increasing amount of genomic data, we have developed the VirtualPlant software platform. VirtualPlant enables scientists to visualize, integrate, and analyze genomic data from a systems biology perspective. VirtualPlant integrates genome-wide data concerning the known and predicted relationships among genes, proteins, and molecules, as well as genome-scale experimental measurements. VirtualPlant also provides visualization techniques that render multivariate information in visual formats that facilitate the extraction of biological concepts. Importantly, VirtualPlant helps biologists who are not trained in computer science to mine lists of genes, microarray experiments, and gene networks to address questions in plant biology, such as: What are the molecular mechanisms by which internal or external perturbations affect processes controlling growth and development? We illustrate the use of VirtualPlant with three case studies, ranging from querying a gene of interest to the identification of gene networks and regulatory hubs that control seed development. Whereas the VirtualPlant software was developed to mine Arabidopsis (Arabidopsis thaliana) genomic data, its data structures, algorithms, and visualization tools are designed in a species-independent way. VirtualPlant is freely available at www.virtualplant.org. PMID:20007449
A bioinformatic survey of RNA-binding proteins in Plasmodium.
Reddy, B P Niranjan; Shrestha, Sony; Hart, Kevin J; Liang, Xiaoying; Kemirembe, Karen; Cui, Liwang; Lindner, Scott E
2015-11-02
The malaria parasites in the genus Plasmodium have a very complicated life cycle involving an invertebrate vector and a vertebrate host. RNA-binding proteins (RBPs) are critical factors involved in every aspect of the development of these parasites. However, very few RBPs have been functionally characterized to date in the human parasite Plasmodium falciparum. Using different bioinformatic methods and tools, we searched the P. falciparum genome to list and annotate RBPs. A representative 3D model for each RNA-binding domain (RBD) identified in P. falciparum was created using I-TASSER and SWISS-MODEL. Microarray and RNA-seq data pertaining to PfRBPs were analyzed using MeV software. Finally, Cytoscape was used to create protein-protein interaction networks for the CITH-Dozi and Caf1-CCR4-Not complexes. We report the identification of 189 putative RBP genes belonging to 13 different families in Plasmodium, which comprise 3.5% of all annotated genes. Almost 90% (169/189) of these genes belong to six prominent RBP classes, namely RNA recognition motifs, DEAD/H-box RNA helicases, K homology, Zinc finger, Puf and Alba gene families. Interestingly, almost all of the identified RNA-binding helicases and KH genes have cognate homologs in model species, suggesting their evolutionary conservation. Exploration of the existing P. falciparum blood-stage transcriptomes revealed that most RBPs have peak mRNA expression levels early during the intraerythrocytic development cycle, which taper off in later stages. Nearly 27% of RBPs have elevated expression in gametocytes, while 47% and 24% have elevated mRNA expression in ookinete and asexual stages, respectively. Comparative interactome analyses using human and Plasmodium protein-protein interaction datasets suggest extensive conservation of the PfCITH/PfDOZI and PfCaf1-CCR4-NOT complexes.
The Plasmodium parasites possess a large number of putative RBPs belonging to most of the RBP families identified so far, suggesting the presence of extensive post-transcriptional regulation in these parasites. Taken together, the in silico identification of these putative RBPs provides a foundation for future functional studies aimed at defining a unique network of post-transcriptional regulation in P. falciparum.
Zhang, Zhongqi; Zhang, Aming; Xiao, Gang
2012-06-05
Protein hydrogen/deuterium exchange (HDX) followed by protease digestion and mass spectrometric (MS) analysis is accepted as a standard method for studying protein conformation and conformational dynamics. In this article, an improved HDX MS platform with fully automated data processing is described. The platform significantly reduces systematic and random errors in the measurement by introducing two types of corrections in HDX data analysis. First, a mixture of short peptides with fast HDX rates is introduced as internal standards to adjust for run-to-run variations in the extent of back exchange. Second, a designed unique peptide (PPPI) with a slow intrinsic HDX rate is employed as another internal standard to reflect possible differences in protein intrinsic HDX rates when protein conformations under different solution conditions are compared. HDX data processing is achieved with a comprehensive HDX model that simulates the deuterium labeling and back exchange process. The HDX model is implemented in the in-house developed software MassAnalyzer and enables fully unattended analysis of the entire protein HDX MS data set, from ion detection and peptide identification to the final processed HDX output, typically within 1 day. The final output of the automated data processing is a set (or the average) of the most probable protection factors for each backbone amide hydrogen. The utility of the HDX MS platform is demonstrated by exploring the conformational transition of a monoclonal antibody induced by increasing concentrations of guanidine.
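Protection factors can be related to deuterium uptake through a standard rate model: in the EX2 regime, a backbone amide with intrinsic exchange rate k_int and protection factor P exchanges at k_int/P, so a peptide's expected deuterium content after labeling time t is D(t) = Σᵢ [1 − exp(−k_int,i · t / Pᵢ)]. The sketch below evaluates that model and applies a simple uniform back-exchange correction derived from a fast-exchanging standard assumed to be fully deuterated. This is an illustrative simplification with hypothetical function names, not the HDX model or correction scheme implemented in MassAnalyzer.

```python
import math


def deuterium_uptake(t: float, k_int: list[float], protection: list[float]) -> float:
    """Expected deuterons on a peptide after labeling time t (EX2-style model).

    Each backbone amide i exchanges with observed rate k_int[i] / protection[i].
    """
    if len(k_int) != len(protection):
        raise ValueError("need one protection factor per amide")
    return sum(1.0 - math.exp(-(k / p) * t) for k, p in zip(k_int, protection))


def back_exchange_recovery(observed_standard: float, max_standard: float) -> float:
    """Fraction of label retained, from a fast-exchanging standard assumed fully deuterated."""
    return observed_standard / max_standard


def correct_for_back_exchange(observed: float, recovery: float) -> float:
    """Scale an observed uptake by the recovery (assumes uniform back exchange)."""
    return observed / recovery


# Hypothetical peptide: two amides, one unprotected and one strongly protected (rates in 1/s)
print(deuterium_uptake(60.0, [10.0, 10.0], [1.0, 1e4]))
recovery = back_exchange_recovery(6.0, 8.0)  # standard retained 6 of 8 possible deuterons
print(correct_for_back_exchange(3.0, recovery))
```

Fitting protection factors runs this model in reverse: the Pᵢ values are adjusted until the predicted D(t) matches the back-exchange-corrected uptake measured at several labeling times.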
Otto-Karg, Ines; Jandl, Stefanie; Müller, Tobias; Stirzel, Beate; Frosch, Matthias; Hebestreit, Helge; Abele-Horn, Marianne
2009-01-01
Accurate identification and antimicrobial susceptibility testing (AST) of nonfermenters from cystic fibrosis patients are essential for appropriate antimicrobial treatment. This study examined the ability of the newly designed Vitek 2 nonfermenting gram-negative card (NGNC) (new gram-negative identification card; bioMérieux, Marcy-l'Étoile, France) to identify nonfermenting gram-negative rods from cystic fibrosis patients in comparison with reference methods, and the accuracy of the new Vitek 2 version 4.02 software for AST in comparison with the broth microdilution method. Two hundred twenty-four strains were investigated for identification and 138 strains for AST. The Vitek 2 NGNC identified 211 (94.1%) of the nonfermenters correctly. Among morphologically atypical microorganisms, five strains were misidentified and eight strains were determined with low discrimination, requiring additional tests, which raised the correct identification rate to 97.8%. Regarding AST, the overall essential agreement of Vitek 2 was 97.6%, and the overall categorical agreement was 92.9%. Minor errors were found in 5.1% of strains, and major and very major errors were found in 1.6% and 0.3% of strains, respectively. In conclusion, the Vitek NGNC appears to be a reliable method for identification of morphologically typical nonfermenters and is an improvement over the API NE system and the Vitek 2 GNC database version 4.01. However, classifications of morphologically atypical nonfermenters must be interpreted with care to avoid misidentification. Moreover, the new Vitek 2 version 4.02 software showed good results for AST and is suitable for routine clinical use. More work is needed for reliable testing of strains whose MICs are close to the breakpoints. PMID:19710272
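The agreement and error rates reported above follow common AST conventions: essential agreement means the test MIC lies within one doubling dilution of the reference MIC; categorical agreement means both methods give the same S/I/R category; a very major error is a false-susceptible result (test S, reference R); a major error is a false-resistant result (test R, reference S); and a minor error involves an intermediate result on one side. The sketch below tallies these as percentages of all isolate-drug pairs. Some guidelines instead use resistant or susceptible isolates as denominators for (very) major errors, so the denominators, like the function names, are assumptions.

```python
import math


def within_one_dilution(test_mic: float, ref_mic: float) -> bool:
    """Essential agreement: MICs differ by at most one two-fold dilution step."""
    return abs(math.log2(test_mic) - math.log2(ref_mic)) <= 1.0 + 1e-9


def ast_agreement(pairs: list[tuple[float, float, str, str]]) -> dict[str, float]:
    """Percentages over (test_mic, ref_mic, test_category, ref_category) pairs; categories S/I/R."""
    counts = {"essential": 0, "categorical": 0, "minor": 0, "major": 0, "very_major": 0}
    for test_mic, ref_mic, test_cat, ref_cat in pairs:
        if within_one_dilution(test_mic, ref_mic):
            counts["essential"] += 1
        if test_cat == ref_cat:
            counts["categorical"] += 1
        elif test_cat == "R" and ref_cat == "S":
            counts["major"] += 1        # false resistant
        elif test_cat == "S" and ref_cat == "R":
            counts["very_major"] += 1   # false susceptible
        else:
            counts["minor"] += 1        # one side is intermediate (I)
    return {name: 100.0 * n / len(pairs) for name, n in counts.items()}


# Hypothetical MIC pairs (mg/L) with interpreted categories
pairs = [(1, 1, "S", "S"), (4, 1, "R", "S"), (1, 8, "S", "R"), (2, 1, "I", "S")]
print(ast_agreement(pairs))
```

Very major errors are weighted most heavily in practice because a false-susceptible call can lead to ineffective therapy.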
Ontology-based specification, identification and analysis of perioperative risks.
Uciteli, Alexandr; Neumann, Juliane; Tahar, Kais; Saleh, Kutaiba; Stucke, Stephan; Faulbrück-Röhr, Sebastian; Kaeding, André; Specht, Martin; Schmidt, Tobias; Neumuth, Thomas; Besting, Andreas; Stegemann, Dominik; Portheine, Frank; Herre, Heinrich
2017-09-06
Medical personnel in hospitals often work under great physical and mental strain. In medical decision-making, errors can never be completely ruled out. Several studies have shown that between 50 and 60% of adverse events could have been avoided through better organization, more attention or more effective security procedures. Critical situations arise especially during interdisciplinary collaboration and the use of complex medical technology, for example during surgical interventions and in perioperative settings (the period of time before, during and after surgical intervention). In this paper, we present an ontology and an ontology-based software system that can identify risks across medical processes and supports the avoidance of errors, in particular in the perioperative setting. We developed a practicable definition of the risk notion that is easily understandable by medical staff and usable by the software tools. Based on this definition, we developed a Risk Identification Ontology (RIO) and used it for the specification and identification of perioperative risks. An agent system was developed that gathers risk-relevant data from various sources throughout the perioperative treatment process and provides it in a centralized fashion for risk identification and analysis. The results of such an analysis are provided to the medical personnel in the form of context-sensitive hints and alerts. For the identification of the ontologically specified risks, we developed an ontology-based software module called Ontology-based Risk Detector (OntoRiDe). About 20 risks relating to cochlear implantation (CI) have already been implemented. Comprehensive testing has indicated the correctness of the data acquisition, risk identification and analysis components, as well as the web-based visualization of results.
Using Malware Analysis to Tailor SQUARE for Mobile Platforms
2014-11-01
identification data (SIM card and International Mobile Station Equipment Identity Number [IMEI]) to duplicate the phone in another device so that it can...applications. Key logging software can be used to steal passwords for financial websites and credit card information [Sophos 2014]. Data theft...for consumption. Apple provides a limited set of APIs and provides the iTunes store as the only avenue to install new software. All software
For operation of the Computer Software Management and Information Center (COSMIC)
NASA Technical Reports Server (NTRS)
Carmon, J. L.
1983-01-01
During the month of June, the Survey Research Center (SRC) at the University of Georgia designed new benefits questionnaires for computer software management and information center (COSMIC). As a test of their utility, these questionnaires are now used in the benefits identification process.
Verheggen, Kenneth; Raeder, Helge; Berven, Frode S; Martens, Lennart; Barsnes, Harald; Vaudel, Marc
2017-09-13
Sequence database search engines are bioinformatics algorithms that identify peptides from tandem mass spectra using a reference protein sequence database. Two decades of development, notably driven by advances in mass spectrometry, have provided scientists with more than 30 published search engines, each with its own properties. In this review, we present the common paradigm behind the different implementations, and its limitations for modern mass spectrometry datasets. We also detail how the search engines attempt to alleviate these limitations, and provide an overview of the different software frameworks available to the researcher. Finally, we highlight alternative approaches for the identification of proteomic mass spectrometry datasets, either as a replacement for, or as a complement to, sequence database search engines. © 2017 Wiley Periodicals, Inc.
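The common paradigm behind sequence database search engines starts by digesting database proteins in silico and selecting peptides whose theoretical mass matches the measured precursor. A minimal sketch of that candidate-selection step, assuming fully tryptic cleavage (after K or R, not before P), no missed cleavages or modifications, and standard monoisotopic residue masses; the function names are hypothetical. Real engines then score fragment ions against each candidate, which is omitted here.

```python
import re

# Monoisotopic residue masses (Da)
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in peptide) + WATER


def precursor_candidates(precursor_mz: float, charge: int, database: dict[str, str],
                         tol_ppm: float = 10.0) -> list[tuple[str, str]]:
    """(accession, peptide) pairs whose mass matches the precursor within tol_ppm."""
    neutral_mass = (precursor_mz - PROTON) * charge
    hits = []
    for accession, sequence in database.items():
        for peptide in tryptic_peptides(sequence):
            mass = peptide_mass(peptide)
            if abs(mass - neutral_mass) / mass * 1e6 <= tol_ppm:
                hits.append((accession, peptide))
    return hits


database = {"P1": "MKPLRAGK", "P2": "AGKR"}
mz = (peptide_mass("AGK") + 2 * PROTON) / 2  # simulated doubly charged precursor
print(precursor_candidates(mz, 2, database))
```

In this toy database AGK occurs in both proteins, so a single spectrum matched to it already raises the shared-peptide ambiguity that protein inference must later resolve.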
DOE Office of Scientific and Technical Information (OSTI.GOV)
Phisherman is an online software tool that was created to help experimenters study phishing. It can potentially be repurposed to run other human studies. Phisherman enables studies to be run online, so that users can participate from their own computers. This means that experimenters can collect data from subjects in their natural settings. Alternatively, an experimenter can run the app online in a lab-based setting, if desired. The software enables the online deployment of a study composed of three main parts: (1) a consent page, (2) a survey, and (3) an identification task, with instruction/transition screens between each part, allowing the experimenter to provide the user with instructions and messages. Upon logging in, the subject is taken to the permission page, where they agree or decline to take part in the study. If the subject agrees to participate, the software randomly chooses between doing the survey first (and the identification task second) or the identification task first (and the survey second). This is to balance possible order effects in the data. Procedurally, in the identification task, the software shows the stimuli to the subject and asks whether they think each one is a phish (yes/no) and how confident they are in the answer. The subject is given 5 levels of certainty to select from, labeled "low" (1), "medium" (3), and "high" (5), with the option of picking a level between low and medium (2) and between medium and high (4). After the subject selects a confidence level, the "Next" button activates, allowing them to move to the next email. The software saves a subject's progress in the identification task, so that they may log in and out of the site. The consent page is a space for the experimenter to provide the subject with human studies board/institutional review board information, and for the subject to formally consent to participate in the study.
The survey is a space for the experimenter to provide questions and spaces for the users to input answers (allowing both multiple-choice and free-answer options). Phisherman includes administrative pages for managing the stimuli and users. This includes a tool for the experimenter to create, preview, edit, delete (if desired), and manage stimuli (emails). The stimuli may include pictures (uploaded to an appropriate folder) and links, for realism. The software includes a safety feature that prevents the user from going to any link location or opening a file/image. Instead of re-directing the subject's browser, the software provides a pop-up box with the URL location of where the user would have gone. Another administrative page may be used to create fake subject accounts for testing the software prior to deployment, as well as to delete subject accounts when necessary. Data from the experiment can be downloaded from another administrative page.
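The counterbalancing and resumable progress described for Phisherman could be implemented in many ways. The sketch below is one hypothetical illustration: assign_order, Session and its methods are invented names, not Phisherman's code or API. It randomly assigns survey-first or identification-first order, accepts a yes/no judgment plus a 1-5 confidence level, and reports the first unanswered email so a returning subject can resume.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

# Only levels 1, 3 and 5 carry text labels; 2 and 4 are in-between choices.
CONFIDENCE_LABELS = {1: "low", 3: "medium", 5: "high"}
PARTS = ("survey", "identification")


def assign_order(rng: random.Random) -> tuple[str, str]:
    """Counterbalance: pick survey-first or identification-first at random."""
    first = rng.choice(PARTS)
    second = PARTS[1] if first == PARTS[0] else PARTS[0]
    return (first, second)


@dataclass
class Session:
    order: tuple[str, str]
    # email id -> (judged to be phishing?, confidence 1-5); stored so subjects can resume
    responses: dict[str, tuple[bool, int]] = field(default_factory=dict)

    def record(self, email_id: str, is_phish: bool, confidence: int) -> None:
        """Save one answer; both a judgment and a confidence level are required."""
        if not 1 <= confidence <= 5:
            raise ValueError("confidence must be between 1 and 5")
        self.responses[email_id] = (is_phish, confidence)

    def next_unanswered(self, stimuli: list[str]) -> Optional[str]:
        """First email the subject has not yet answered, or None when finished."""
        return next((e for e in stimuli if e not in self.responses), None)


session = Session(assign_order(random.Random()))
session.record("email-1", True, 4)
print(session.order, session.next_unanswered(["email-1", "email-2"]))
```

Random assignment per subject balances order effects only on average; a stricter design could alternate or block-randomize orders so both groups end up the same size.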
Santos, Hugo M; Reboiro-Jato, Miguel; Glez-Peña, Daniel; Nunes-Miranda, J D; Fdez-Riverola, Florentino; Carvallo, R; Capelo, J L
2010-09-15
The decision peptide-driven (DPD) tool is a software application that assists the user in a protocol for accurate protein quantification based on the following steps: (1) protein separation through gel electrophoresis; (2) in-gel protein digestion; (3) direct and inverse (18)O-labeling; and (4) matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI) analysis. The DPD software compares the MALDI results of the direct and inverse (18)O-labeling experiments and quickly identifies those peptides with paralleled losses in different sets of a typical proteomic workflow. Those peptides are used for subsequent accurate protein quantification. Interpreting the MALDI data from direct and inverse labeling experiments is time-consuming, because all comparisons must otherwise be made manually. The DPD software shortens and simplifies the search for the peptides to be used for quantification from a week to a few minutes. To do so, it takes several MALDI spectra as input and automatically helps the researcher (i) to compare data from direct and inverse (18)O-labeling experiments, calculating the corresponding ratios to identify those peptides with paralleled losses throughout different sets of experiments; and (ii) to use those peptides as internal standards for subsequent accurate protein quantification using (18)O-labeling. In this work the DPD software is presented and illustrated with the quantification of the protein carbonic anhydrase. Copyright (c) 2010 Elsevier B.V. All rights reserved.
System IDentification Programs for AirCraft (SIDPAC)
NASA Technical Reports Server (NTRS)
Morelli, Eugene A.
2002-01-01
A collection of computer programs for aircraft system identification is described and demonstrated. The programs, collectively called System IDentification Programs for AirCraft, or SIDPAC, were developed in MATLAB as m-file functions. SIDPAC has been used successfully at NASA Langley Research Center with data from many different flight test programs and wind tunnel experiments. SIDPAC includes routines for experiment design, data conditioning, data compatibility analysis, model structure determination, equation-error and output-error parameter estimation in both the time and frequency domains, real-time and recursive parameter estimation, low order equivalent system identification, estimated parameter error calculation, linear and nonlinear simulation, plotting, and 3-D visualization. An overview of SIDPAC capabilities is provided, along with a demonstration of the use of SIDPAC with real flight test data from the NASA Glenn Twin Otter aircraft. The SIDPAC software is available without charge to U.S. citizens by request to the author, contingent on the requestor completing a NASA software usage agreement.
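Equation-error parameter estimation, one of the SIDPAC capabilities listed above, reduces in its simplest form to ordinary least-squares regression of a measured quantity on known regressors, with parameter standard errors from the residual variance. The sketch below is in Python rather than MATLAB and does not reproduce any SIDPAC function; the model form (for example a force coefficient regressed on angle of attack and pitch rate) is an assumed illustration, and the standard errors assume white residuals.

```python
import numpy as np

def equation_error_estimate(X: np.ndarray, z: np.ndarray):
    """Least-squares fit z ~ X @ theta; returns (theta, standard errors).

    X: (n, p) regressor matrix, e.g. columns [1, alpha, q] (hypothetical model).
    z: (n,) measured dependent variable, e.g. a non-dimensional force coefficient.
    """
    theta, *_ = np.linalg.lstsq(X, z, rcond=None)
    residuals = z - X @ theta
    n, p = X.shape
    sigma2 = residuals @ residuals / (n - p)     # unbiased fit-error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # parameter covariance (white-noise assumption)
    return theta, np.sqrt(np.diag(cov))
```

With colored residuals, which are common in flight data, these standard errors tend to be optimistic, so a corrected error estimate would be needed in practice.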
Derrick, Sharon M; Raxter, Michelle H; Hipp, John A; Goel, Priya; Chan, Elaine F; Love, Jennifer C; Wiersema, Jason M; Akella, N Shastry
2015-01-01
Medical examiners and coroners (ME/C) in the United States hold statutory responsibility to identify deceased individuals who fall under their jurisdiction. The computer-assisted decedent identification (CADI) project was designed to modify software used in diagnosis and treatment of spinal injuries into a mathematically validated tool for ME/C identification of fleshed decedents. CADI software analyzes the shapes of targeted vertebral bodies imaged in an array of standard radiographs and quantifies the likelihood that any two of the radiographs contain matching vertebral bodies. Six validation tests measured the repeatability, reliability, and sensitivity of the method, and the effects of age, sex, and number of radiographs in array composition. CADI returned a 92-100% success rate in identifying the true matching pair of vertebrae within arrays of five to 30 radiographs. Further development of CADI is expected to produce a novel identification method for use in ME/C offices that is reliable, timely, and cost-effective. © 2014 American Academy of Forensic Sciences.
USDA-ARS's Scientific Manuscript database
Immunogenic, pathogen-specific proteins have excellent potential for development of novel management modalities. Here, we describe an innovative application of proteomics called Microbial protein-Antigenome Determination (MAD) Technology for rapid identification of native microbial proteins that el...
Titulaer, Mark K; Siccama, Ivar; Dekker, Lennard J; van Rijswijk, Angelique LCT; Heeren, Ron MA; Sillevis Smitt, Peter A; Luider, Theo M
2006-01-01
Background Statistical comparison of peptide profiles in biomarker discovery requires fast, user-friendly software for high-throughput data analysis. Important features are flexibility in changing input variables and statistical analysis of peptides that are differentially expressed between patient and control groups. In addition, integrating the mass spectrometry data with the results of other experiments, such as microarray analysis, and with information from other databases requires central storage of the profile matrix, where protein IDs can be added to peptide masses of interest. Results A new database application is presented to detect and identify significantly differentially expressed peptides in peptide profiles obtained from body fluids of patient and control groups. The modular software stores mass spectra and results centrally and enables fast analysis. The software architecture rests on four pillars: 1) a graphical user interface written in Java, 2) a MySQL database that contains all metadata, such as experiment numbers and sample codes, 3) an FTP (File Transfer Protocol) server that stores all raw mass spectrometry files and processed data, and 4) the software package R, which is used for modular statistical calculations such as the Wilcoxon-Mann-Whitney rank sum test. Statistical analysis with the Wilcoxon-Mann-Whitney test in R shows that the peptide profiles of two patient groups, 1) breast cancer patients with leptomeningeal metastases and 2) prostate cancer patients with end-stage disease, can be distinguished from those of control groups. Conclusion The database application can distinguish Matrix Assisted Laser Desorption Ionization (MALDI-TOF) peptide profiles of patients from those of control groups in large datasets.
The modular architecture makes it possible to adapt the application to also handle large datasets from MS/MS and Fourier Transform Ion Cyclotron Resonance (FT-ICR) mass spectrometry experiments. The higher resolution and mass accuracy of FT-ICR mass spectrometry are expected to prevent the clustering of peaks from different peptides and to allow the identification of differentially expressed proteins from the peptide profiles. PMID:16953879
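The Wilcoxon-Mann-Whitney rank sum test named above compares, for one peptide mass, the intensities observed in patient samples against those in control samples. The sketch below is a plain-Python illustration, not the application's R code: it uses average ranks for ties and a two-sided normal approximation without tie correction in the variance. R's `wilcox.test` may compute exact p-values for small samples without ties, so the numbers can differ from R's output.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Wilcoxon-Mann-Whitney test (normal approximation).

    x, y: intensities of one peptide in two groups (e.g. patients vs controls).
    Returns (U statistic for x, approximate two-sided p-value).
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # average 1-based rank for a run of ties
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2             # U statistic for the first group
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))     # equals 2 * (1 - Phi(|z|))
    return u1, p
```

Applied peptide by peptide across a profile matrix, such a test produces one p-value per mass, which would normally then be corrected for multiple testing.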
Titulaer, Mark K; Siccama, Ivar; Dekker, Lennard J; van Rijswijk, Angelique L C T; Heeren, Ron M A; Sillevis Smitt, Peter A; Luider, Theo M
2006-09-05
Statistical comparison of peptide profiles in biomarker discovery requires fast, user-friendly software for high-throughput data analysis. Important features are flexibility in changing input variables and statistical analysis of peptides that are differentially expressed between patient and control groups. In addition, integrating the mass spectrometry data with the results of other experiments, such as microarray analysis, and with information from other databases requires central storage of the profile matrix, where protein IDs can be added to peptide masses of interest. A new database application is presented to detect and identify significantly differentially expressed peptides in peptide profiles obtained from body fluids of patient and control groups. The modular software stores mass spectra and results centrally and enables fast analysis. The software architecture rests on four pillars: 1) a graphical user interface written in Java, 2) a MySQL database that contains all metadata, such as experiment numbers and sample codes, 3) an FTP (File Transfer Protocol) server that stores all raw mass spectrometry files and processed data, and 4) the software package R, which is used for modular statistical calculations such as the Wilcoxon-Mann-Whitney rank sum test. Statistical analysis with the Wilcoxon-Mann-Whitney test in R shows that the peptide profiles of two patient groups, 1) breast cancer patients with leptomeningeal metastases and 2) prostate cancer patients with end-stage disease, can be distinguished from those of control groups. The database application can distinguish Matrix Assisted Laser Desorption Ionization (MALDI-TOF) peptide profiles of patients from those of control groups in large datasets. The modular architecture makes it possible to adapt the application to also handle large datasets from MS/MS and Fourier Transform Ion Cyclotron Resonance (FT-ICR) mass spectrometry experiments.
The higher resolution and mass accuracy of FT-ICR mass spectrometry are expected to prevent the clustering of peaks from different peptides and to allow the identification of differentially expressed proteins from the peptide profiles.
HIGH-THROUGHPUT IDENTIFICATION OF CATALYTIC REDOX-ACTIVE CYSTEINE RESIDUES
Cysteine (Cys) residues often play critical roles in proteins; however, identification of their specific functions has been limited to case-by-case experimental approaches. We developed a procedure for high-throughput identification of catalytic redox-active Cys in proteins by se...
48 CFR 252.227-7017 - Identification and assertion of use, release, or disclosure restrictions.
Code of Federal Regulations, 2011 CFR
2011-10-01
... and Computer Software—Small Business Innovation Research (SBIR) Program clause. (2) If a successful offeror will not be required to deliver technical data, the Rights in Noncommercial Computer Software and Noncommercial Computer Software Documentation clause, or, if this solicitation contemplates a contract under the...
48 CFR 252.227-7017 - Identification and assertion of use, release, or disclosure restrictions.
Code of Federal Regulations, 2014 CFR
2014-10-01
... and Computer Software—Small Business Innovation Research (SBIR) Program clause. (2) If a successful offeror will not be required to deliver technical data, the Rights in Noncommercial Computer Software and Noncommercial Computer Software Documentation clause, or, if this solicitation contemplates a contract under the...
48 CFR 252.227-7017 - Identification and assertion of use, release, or disclosure restrictions.
Code of Federal Regulations, 2010 CFR
2010-10-01
... and Computer Software—Small Business Innovative Research (SBIR) Program clause. (2) If a successful offeror will not be required to deliver technical data, the Rights in Noncommercial Computer Software and Noncommercial Computer Software Documentation clause, or, if this solicitation contemplates a contract under the...
48 CFR 252.227-7017 - Identification and assertion of use, release, or disclosure restrictions.
Code of Federal Regulations, 2013 CFR
2013-10-01
... and Computer Software—Small Business Innovation Research (SBIR) Program clause. (2) If a successful offeror will not be required to deliver technical data, the Rights in Noncommercial Computer Software and Noncommercial Computer Software Documentation clause, or, if this solicitation contemplates a contract under the...
48 CFR 252.227-7017 - Identification and assertion of use, release, or disclosure restrictions.
Code of Federal Regulations, 2012 CFR
2012-10-01
... and Computer Software—Small Business Innovation Research (SBIR) Program clause. (2) If a successful offeror will not be required to deliver technical data, the Rights in Noncommercial Computer Software and Noncommercial Computer Software Documentation clause, or, if this solicitation contemplates a contract under the...
Crux: Rapid Open Source Protein Tandem Mass Spectrometry Analysis
2015-01-01
Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data. PMID:25182276
Ortseifen, Vera; Stolze, Yvonne; Maus, Irena; Sczyrba, Alexander; Bremges, Andreas; Albaum, Stefan P; Jaenicke, Sebastian; Fracowiak, Jochen; Pühler, Alfred; Schlüter, Andreas
2016-08-10
To study the metaproteome of a biogas-producing microbial community, fermentation samples were taken from an agricultural biogas plant for microbial cell and protein extraction and corresponding metagenome analyses. Based on metagenome sequence data, taxonomic community profiling was performed to elucidate the composition of bacterial and archaeal sub-communities. The community's cytosolic metaproteome was represented using a 2D-PAGE approach. Metaproteome databases for protein identification were compiled from the assembled metagenome sequence dataset of the analyzed biogas plant and from non-corresponding biogas metagenomes. Protein identification results revealed that the corresponding biogas protein database yielded the highest identification rate, followed by other biogas-specific databases, whereas common public databases yielded insufficient identification rates. Proteins of the biogas microbiome identified as highly abundant were assigned to the pathways involved in methanogenesis, transport and carbon metabolism. Moreover, the integrated metagenome/-proteome approach enabled the examination of genetic-context information for genes encoding identified proteins by studying neighboring genes on the corresponding contig. For example, this approach led to the identification of a Methanoculleus sp. contig encoding 16 methanogenesis-related gene products, three of which were also detected as abundant proteins within the community's metaproteome. Thus, metagenome contigs provide additional information on the genetic environment of identified abundant proteins. Copyright © 2016 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dreyer, Jonathan G.; Wang, Tzu-Fang; Vo, Duc T.
Under a 2006 agreement between the Department of Energy (DOE) of the United States of America and the Institut de Radioprotection et de Sûreté Nucléaire (IRSN) of France, the National Nuclear Security Administration (NNSA) within DOE and IRSN initiated a collaboration to improve isotopic identification and analysis of nuclear material [i.e., plutonium (Pu) and uranium (U)]. The specific aim of the collaborative project was to develop new versions of two types of isotopic identification and analysis software: (1) the fixed-energy response-function analysis for multiple energies (FRAM) codes and (2) multi-group analysis (MGA) codes. The project is entitled Action Sheet 4 – Cooperation on Improved Isotopic Identification and Analysis Software for Portable, Electrically Cooled, High-Resolution Gamma Spectrometry Systems (Action Sheet 4). FRAM and MGA/U235HI are software codes used to analyze isotopic ratios of U and Pu. FRAM is an application that uses parameter sets for the analysis of U or Pu. MGA and U235HI are two separate applications that analyze Pu or U, respectively. They have traditionally been used by safeguards practitioners to analyze gamma spectra acquired with high-resolution gamma spectrometry (HRGS) systems that are cooled by liquid nitrogen. However, these analysis programs were found to be less accurate when used on spectra acquired with a newer generation of more portable, electrically cooled HRGS (ECHRGS) systems. In response, DOE/NNSA and IRSN collaborated to update the FRAM and U235HI codes to improve their performance with the newer ECHRGS systems. Lawrence Livermore National Laboratory (LLNL) and Los Alamos National Laboratory (LANL) performed this work for DOE/NNSA.
AU-FREDI - AUTONOMOUS FREQUENCY DOMAIN IDENTIFICATION
NASA Technical Reports Server (NTRS)
Yam, Y.
1994-01-01
The Autonomous Frequency Domain Identification program, AU-FREDI, is a system of methods, algorithms and software that was developed for the identification of structural dynamic parameters and system transfer function characterization for control of large space platforms and flexible spacecraft. It was validated in the CALTECH/Jet Propulsion Laboratory's Large Spacecraft Control Laboratory. Due to the unique characteristics of this laboratory environment, and the environment-specific nature of many of the software's routines, AU-FREDI should be considered a collection of routines that can be modified and reassembled to suit system identification and control experiments on large flexible structures. The AU-FREDI software was originally designed to command plant excitation and handle subsequent input/output data transfer, and to conduct system identification based on the I/O data. Key features of the AU-FREDI methodology are as follows:
1. AU-FREDI has on-line digital filter design to support on-orbit optimal input design and data composition.
2. Data composition of experimental data in overlapping frequency bands overcomes finite actuator power constraints.
3. Recursive least squares sine-dwell estimation accurately handles digitized sinusoids and low frequency modes.
4. The system also includes automated estimation of model order using a product moment matrix.
5. A sample-data transfer function parametrization supports digital control design.
6. Minimum variance estimation is assured with a curve fitting algorithm with iterative reweighting.
7. Robust root solvers accurately factorize high order polynomials to determine frequency and damping estimates.
8. Output error characterization of model additive uncertainty supports robustness analysis.
The research objectives associated with AU-FREDI were particularly useful in focusing the identification methodology for realistic on-orbit testing conditions.
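Feature 3 above, recursive least squares (RLS) sine-dwell estimation, can be illustrated with a generic two-parameter RLS filter: at a known dwell frequency ω, the response is modeled as y(t) = a cos(ωt) + b sin(ωt), which gives amplitude √(a² + b²) and phase atan2(−b, a) for y = A cos(ωt + φ). This is a minimal sketch under that assumed model, not AU-FREDI's FORTRAN routine; the forgetting factor `lam` and the initial covariance scale `delta` are illustrative choices.

```python
import numpy as np

def rls_sine_dwell(t, y, omega, lam=1.0, delta=1e3):
    """Estimate amplitude and phase of y ~ A*cos(omega*t + phi) by recursive least squares.

    t, y  : sample times and measured response at the dwell frequency
    omega : known excitation frequency in rad/s (assumed exactly known)
    lam   : forgetting factor (1.0 = ordinary least squares)
    delta : initial covariance scale (large = weak prior on the parameters)
    """
    theta = np.zeros(2)                # [a, b] in y = a*cos + b*sin
    P = delta * np.eye(2)
    for tk, yk in zip(t, y):
        phi = np.array([np.cos(omega * tk), np.sin(omega * tk)])
        gain = P @ phi / (lam + phi @ P @ phi)
        theta = theta + gain * (yk - phi @ theta)       # correct with prediction error
        P = (P - np.outer(gain, phi @ P)) / lam         # covariance update
    a, b = theta
    return np.hypot(a, b), np.arctan2(-b, a)
```

Because each update costs a fixed amount of work, an estimator of this kind can run sample by sample during the dwell rather than waiting for a full data record.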
Rather than estimating the entire structure, as is typically done in ground structural testing, AU-FREDI identifies only the key transfer function parameters and uncertainty bounds that are necessary for on-line design and tuning of robust controllers. AU-FREDI's system identification algorithms are independent of the JPL-LSCL environment, and can easily be extracted and modified for use with input/output data files. The basic approach of AU-FREDI's system identification algorithms is to non-parametrically identify the sampled data in the frequency domain using either stochastic or sine-dwell input, and then to obtain a parametric model of the transfer function by curve-fitting techniques. A cross-spectral analysis of the output error is used to determine the additive uncertainty in the estimated transfer function. The nominal transfer function estimate and the estimate of the associated additive uncertainty can be used for robust control analysis and design. AU-FREDI's I/O data transfer routines are tailored to the environment of the CALTECH/JPL-LSCL, which included a special operating system to interface with the testbed. Input commands for a particular experiment (wideband, narrowband, or sine-dwell) were computed on-line and then issued to respective actuators by the operating system. The operating system also took measurements through displacement sensors and passed them back to the software for storage and off-line processing. In order to make use of AU-FREDI's I/O data transfer routines, a user would need to provide an operating system capable of overseeing such functions between the software and the experimental setup at hand. The program documentation contains information designed to support users in either providing such an operating system or modifying the system identification algorithms for use with input/output data files.
It provides a history of the theoretical, algorithmic and software development efforts, including operating system requirements and listings of some of the special-purpose subroutines that were developed and optimized for Lahey FORTRAN compilers on IBM PC-AT computers before the subroutines were integrated into the system software. Prospective users are encouraged to obtain and review the documentation before purchasing the AU-FREDI software. AU-FREDI is distributed in DEC VAX BACKUP format on a 1600 BPI 9-track magnetic tape (standard media) or a TK50 tape cartridge. AU-FREDI was developed in 1989 and is a copyrighted work with all copyright vested in NASA.
2010-01-01
Background Suppression subtractive hybridization is a popular technique for gene discovery from non-model organisms without an annotated genome sequence, such as cowpea (Vigna unguiculata (L.) Walp). We aimed to use this method to enrich for genes expressed during drought stress in a drought tolerant cowpea line. However, current methods were inefficient in screening libraries and managing the sequence data, and thus there was a need to develop software tools to facilitate the process. Results Forward and reverse cDNA libraries enriched for cowpea drought response genes were screened on microarrays, and the R software package SSHscreen 2.0.1 was developed (i) to normalize the data effectively using spike-in control spot normalization, and (ii) to select clones for sequencing based on the calculation of enrichment ratios with associated statistics. Enrichment ratio 3 values for each clone showed that 62% of the forward library and 34% of the reverse library clones were significantly differentially expressed by drought stress (adjusted p value < 0.05). Enrichment ratio 2 calculations showed that > 88% of the clones in both libraries were derived from rare transcripts in the original tester samples, thus supporting the notion that suppression subtractive hybridization enriches for rare transcripts. A set of 118 clones was chosen for sequencing, and drought-induced cowpea genes were identified, the most interesting encoding a late embryogenesis abundant Lea5 protein, a glutathione S-transferase, a thaumatin, a universal stress protein, and a wound induced protein. A lipid transfer protein and several components of photosynthesis were down-regulated by the drought stress. Reverse transcriptase quantitative PCR confirmed the enrichment ratio values for the selected cowpea genes. SSHdb, a web-accessible database, was developed to manage the clone sequences and combine the SSHscreen data with sequence annotations derived from BLAST and Blast2GO.
The self-BLAST function within SSHdb grouped redundant clones together and illustrated that the SSHscreen plots are a useful tool for choosing anonymous clones for sequencing, since redundant clones cluster together on the enrichment ratio plots. Conclusions We developed the SSHscreen-SSHdb software pipeline, which greatly facilitates gene discovery using suppression subtractive hybridization by improving the selection of clones for sequencing after screening the library on a small number of microarrays. Annotation of the sequence information and collaboration was further enhanced through a web-based SSHdb database, and we illustrated this through identification of drought responsive genes from cowpea, which can now be investigated in gene function studies. SSH is a popular and powerful gene discovery tool, and therefore this pipeline will have application for gene discovery in any biological system, particularly non-model organisms. SSHscreen 2.0.1 and a link to SSHdb are available from http://microarray.up.ac.za/SSHscreen. PMID:20359330
Toward improved peptide feature detection in quantitative proteomics using stable isotope labeling.
Nilse, Lars; Sigloch, Florian Christoph; Biniossek, Martin L; Schilling, Oliver
2015-08-01
Reliable detection of peptides in LC-MS data is a key algorithmic step in the analysis of quantitative proteomics experiments. While highly abundant peptides can be detected reliably by most modern software tools, there is much less agreement on medium and low-intensity peptides in a sample. The choice of software tools can have a big impact on the quantification of proteins, especially for proteins that appear in lower concentrations. However, in many experiments, it is precisely this region of less abundant but substantially regulated proteins that holds the biggest potential for discoveries. This is particularly true for discovery proteomics in the pharmacological sector with a specific interest in key regulatory proteins. In this viewpoint article, we discuss how the development of novel software algorithms allows us to study this region of the proteome with increased confidence. Reliable results are one of many aspects to be considered when deciding on a bioinformatics software platform. Deployment into existing IT infrastructures, compatibility with other software packages, scalability, automation, flexibility, and support need to be considered and are briefly addressed in this viewpoint article. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology
Grüning, Björn A.; Paszkiewicz, Konrad; Pritchard, Leighton
2013-01-01
The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of “effector” proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen’s predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu). PMID:24109552
Serafim, Vlad; Shah, Ajit; Puiu, Maria; Andreescu, Nicoleta; Coricovac, Dorina; Nosyrev, Alexander; Spandidos, Demetrios A; Tsatsakis, Aristides M; Dehelean, Cristina; Pinzaru, Iulia
2017-10-01
Over the past decade, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has been established as a valuable platform for microbial identification, and it is also frequently applied in biology and clinical studies to identify new markers expressed in pathological conditions. The aim of the present study was to assess the potential of using this approach for the classification of cancer cell lines as a quantifiable method for the proteomic profiling of cellular organelles. Intact protein extracts isolated from different tumor cell lines (human and murine) were analyzed using MALDI-TOF MS, and the obtained mass lists were processed using principal component analysis (PCA) within Bruker Biotyper® software. Furthermore, reference spectra were created for each cell line and were used for classification. Based on the intact protein profiles, we were able to differentiate and classify six cancer cell lines: two murine melanoma (B16-F0 and B164A5), one human melanoma (A375), two human breast carcinoma (MCF7 and MDA-MB-231) and one human liver carcinoma (HepG2). The cell lines were classified according to cancer type and the species they originated from, as well as by their metastatic potential, offering the possibility to differentiate non-invasive from invasive cells. The obtained results pave the way for developing a broad-based strategy for the identification and classification of cancer cells.
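The workflow above (mass lists, PCA, and classification against per-cell-line reference spectra) can be sketched generically. Bruker Biotyper's internal scoring is not described in the abstract, so this is only an assumed stand-in: peaks are summed into fixed-width m/z bins (the range and bin width are hypothetical), PCA is computed by SVD of the mean-centered matrix, and a query spectrum is assigned to the reference with the highest cosine similarity.

```python
import numpy as np

def bin_peaks(peaks, mz_min=2000.0, mz_max=20000.0, width=10.0):
    """Turn a list of (m/z, intensity) peaks into a normalized fixed-length vector.

    The m/z range and bin width are hypothetical choices for illustration.
    """
    n_bins = int(round((mz_max - mz_min) / width))
    v = np.zeros(n_bins)
    for mz, intensity in peaks:
        idx = int((mz - mz_min) // width)
        if 0 <= idx < n_bins:
            v[idx] += intensity
    total = v.sum()
    return v / total if total > 0 else v      # total-intensity normalization

def pca_scores(X, n_components=2):
    """Project mean-centered spectra (rows of X) onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def classify(query, references):
    """Return the label whose reference vector has the highest cosine similarity to query."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(references, key=lambda label: cosine(query, references[label]))
```

Averaging replicate vectors per cell line gives a simple reference spectrum; peak alignment and noise filtering, which real spectra need, are omitted here.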
Dormeyer, Wilma; van Hoof, Dennis; Mummery, Christine L; Krijgsveld, Jeroen; Heck, Albert J R
2008-10-01
The identification of (plasma) membrane proteins in cells can provide valuable insights into the regulation of their biological processes. Pluripotent cells such as human embryonic stem cells and embryonal carcinoma cells are capable of unlimited self-renewal and share many of the biological mechanisms that regulate proliferation and differentiation. The comparison of their membrane proteomes will help unravel the biological principles of pluripotency, and the identification of biomarker proteins in their plasma membranes is considered a crucial step to fully exploit pluripotent cells for therapeutic purposes. For these tasks, membrane proteomics is the method of choice, but as indicated by the scarce identification of membrane and plasma membrane proteins in global proteomic surveys it is not an easy task. In this minireview, we first describe the general challenges of membrane proteomics. We then review current sample preparation steps and discuss protocols that we found particularly beneficial for the identification of large numbers of (plasma) membrane proteins in human tumour- and embryo-derived stem cells. Our optimized assembled protocol led to the identification of a large number of membrane proteins. However, as the composition of cells and membranes is highly variable we still recommend adapting the sample preparation protocol for each individual system.
Spatial Identification of Passive Radio Frequency Identification Tags Using Software Defined Radios
2012-03-01
Table 4.1: Simulation Environmental Elements (excerpt; recoverable entries: tabletop, z_Reader = 20 cm; tag vertical offset from reader, z = 10 cm; 3 dB angle of sensor antenna, theta_3dB = 0.698 radians)
49 CFR Appendix D to Part 236 - Independent Review of Verification and Validation
Code of Federal Regulations, 2010 CFR
2010-10-01
... standards. (f) The reviewer shall analyze all Fault Tree Analyses (FTA), Failure Mode and Effects... for each product vulnerability cited by the reviewer; (4) Identification of any documentation or... not properly followed; (6) Identification of the software verification and validation procedures, as...
qPMS9: An Efficient Algorithm for Quorum Planted Motif Search
NASA Astrophysics Data System (ADS)
Nicolae, Marius; Rajasekaran, Sanguthevar
2015-01-01
Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (l, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers l and d. It returns all sequences M of length l that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (l, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/.
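The quorum PMS problem defined above can be stated directly as a brute-force search: enumerate every length-l string over the alphabet and keep those that occur, with at most d mismatches, in at least q% of the input strings (q = 100 gives classic PMS). This naive exponential-time sketch only makes the problem definition concrete; it is not the qPMS9 algorithm, which is far more efficient.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(motif: str, s: str, d: int) -> bool:
    """True if some length-l window of s differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(motif, s[i:i + l]) <= d for i in range(len(s) - l + 1))

def quorum_pms(strings, l, d, q_percent, alphabet="ACGT"):
    """Naive (l, d) quorum planted motif search; q_percent is an integer percentage."""
    needed = -(-q_percent * len(strings) // 100)   # ceil(q% of n) in integer arithmetic
    motifs = []
    for letters in product(alphabet, repeat=l):    # all |alphabet|**l candidates
        m = "".join(letters)
        if sum(occurs_within(m, s, d) for s in strings) >= needed:
            motifs.append(m)
    return motifs
```

The candidate space grows as 4^l for DNA, which is why instances such as (28, 12) require the pruning and parallelism that qPMS9 provides.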
Fagerquist, Clifton K
2017-01-01
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) is increasingly utilized as a rapid technique to identify microorganisms including pathogenic bacteria. However, little attention has been paid to the significant proteomic information encoded in the MS peaks that collectively constitute the MS 'fingerprint'. This review/perspective is intended to explore this topic in greater detail in the hopes that it may spur interest and further research in this area. Areas covered: This paper examines the recent literature on utilizing MALDI-TOF for bacterial identification. Critical works highlighting protein biomarker identification of bacteria, arguments for and against protein biomarker identification, proteomic approaches to biomarker identification, emergence of MALDI-TOF-TOF platforms and their use for top-down proteomic identification of bacterial proteins, protein denaturation and its effect on protein ion fragmentation, collision cross-sections and energy deposition during desorption/ionization are also explored. Expert commentary: MALDI-TOF and TOF-TOF mass spectrometry platforms will continue to provide chemical analyses that are rapid, cost-effective and high throughput. These instruments have proven their utility in the taxonomic identification of pathogenic bacteria at the genus and species level and are poised to more fully characterize these microorganisms to the benefit of clinical microbiology, food safety and other fields.
Characterization of a Protein Interactome by Co-Immunoprecipitation and Shotgun Mass Spectrometry.
Maccarrone, Giuseppina; Bonfiglio, Juan Jose; Silberstein, Susana; Turck, Christoph W; Martins-de-Souza, Daniel
2017-01-01
Identifying the partners of a given protein (the interactome) may provide leads about the protein's function and the molecular mechanisms in which it is involved. One of the alternative strategies used to characterize protein interactomes consists of co-immunoprecipitation (co-IP) followed by shotgun mass spectrometry. This enables the isolation and identification of a protein target in its native state and its interactome from cells or tissue lysates under physiological conditions. In this chapter, we describe a co-IP protocol for interactome studies that uses an antibody against a protein of interest bound to protein A/G plus agarose beads to isolate a protein complex. The interacting proteins may be further fractionated by SDS-PAGE, followed by in-gel tryptic digestion and nano liquid chromatography high-resolution tandem mass spectrometry (nLC ESI-MS/MS) for identification purposes. The computational tools, strategy for protein identification, and use of interactome databases will also be described.
Albaum, Stefan P; Neuweger, Heiko; Fränzel, Benjamin; Lange, Sita; Mertens, Dominik; Trötschel, Christian; Wolters, Dirk; Kalinowski, Jörn; Nattkemper, Tim W; Goesmann, Alexander
2009-12-01
The goal of current -omics sciences is to understand biological systems as a whole in terms of interactions of the individual cellular components. One of the main building blocks in this field of study is proteomics, where liquid chromatography-tandem mass spectrometry (LC-MS/MS) in combination with isotopic labelling techniques provides a common way to obtain direct insight into regulation at the protein level. Methods to identify and quantify the peptides contained in a sample are well established, and their output usually consists of lists of identified proteins and calculated relative abundance values. The next step is to move beyond these abstract lists and apply statistical inference methods to compare measurements, to identify genes that are significantly up- or down-regulated, or to detect clusters of proteins with similar expression profiles. We introduce the Rich Internet Application (RIA) Qupe, which provides comprehensive data management and analysis functions for LC-MS/MS experiments. Starting with the import of mass spectra data, the system guides the experimenter through the process of protein identification by database search, the calculation of protein abundance ratios, and in particular, the statistical evaluation of the quantification results, including multivariate analysis methods such as analysis of variance or hierarchical cluster analysis. A data model to store these results has been developed, and a well-defined programming interface facilitates the integration of novel approaches. A compute cluster is used to distribute computationally intensive calculations, and a web service allows information to be exchanged with other -omics software applications. To demonstrate that Qupe represents a step forward in quantitative proteomics analysis, an application study on Corynebacterium glutamicum has been carried out. Qupe is implemented in Java utilizing Hibernate, Echo2, R and the Spring framework. 
We encourage the usage of the RIA in the sense of the 'software as a service' concept, maintained on our servers and accessible at the following location: http://qupe.cebitec.uni-bielefeld.de. Supplementary data are available at Bioinformatics online.
Tani, Akio; Sahin, Nurettin; Matsuyama, Yumiko; Enomoto, Takashi; Nishimura, Naoki; Yokota, Akira; Kimbara, Kazuhide
2012-01-01
Methylobacterium species are ubiquitous α-proteobacteria that reside in the phyllosphere and feed on methanol emitted from plants. In this study, we applied whole-cell matrix-assisted laser desorption/ionization time-of-flight mass spectrometry analysis (WC-MS) to evaluate the diversity of Methylobacterium species collected from a variety of plants. The WC-MS spectrum was reproducible over two weeks of cultivation on different media. WC-MS spectrum peaks of M. extorquens strain AM1 cells were attributed to ribosomal proteins, but peaks not derived from ribosomal proteins were also found. We developed a simple method for rapid identification based on spectral similarity. Using all available type strains of Methylobacterium species, the method yielded a threshold similarity value for species-level discrimination, although the genus contains some type strains that could not be easily discriminated solely by 16S rRNA gene sequence similarity. Next, we evaluated the WC-MS data of approximately 200 methylotrophs isolated from various plants using MALDI Biotyper software (Bruker Daltonics). Isolates representing each cluster were further identified by 16S rRNA gene sequencing. In most cases, the identification by WC-MS matched that by sequencing, and isolates with unique spectra represented possible novel species. The strains belonging to M. extorquens, M. adhaesivum, M. marchantiae, M. komagatae, M. brachiatum, M. radiotolerans, and novel lineages close to M. adhaesivum, many of which were isolated from bryophytes, were found to be the most frequent phyllospheric colonizers. The WC-MS technique enables high-throughput identification of known and novel bacterial species, making it possible to select novel species from a library and to identify isolates without 16S rRNA gene sequencing. PMID:22808262
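The abstract does not specify the similarity measure behind the identification method. As an illustration only, the sketch below assumes that peak lists are binned on m/z and compared by cosine similarity against a threshold. The bin width, the 0.8 threshold, the peak values and the function names are all hypothetical, and this is not the MALDI Biotyper scoring algorithm.

```python
import math


def bin_spectrum(peaks, bin_width=5.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse binned spectra (dict bin -> intensity)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def same_species(peaks_a, peaks_b, threshold=0.8):  # hypothetical threshold
    """Call two spectra the same species if their similarity reaches the threshold."""
    return cosine_similarity(bin_spectrum(peaks_a), bin_spectrum(peaks_b)) >= threshold
```

Fixed bins are a simplification: two peaks that straddle a bin edge are treated as different even if they are only 0.2 m/z apart, so tolerance-based peak matching is usually preferable on real data.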
Software support environment design knowledge capture
NASA Technical Reports Server (NTRS)
Dollman, Tom
1990-01-01
The objective of this task is to assess the potential for using the software support environment (SSE) workstations and associated software for design knowledge capture (DKC) tasks. This assessment will include the identification of required capabilities for DKC and hardware/software modifications needed to support DKC. Several approaches to achieving this objective are discussed and interim results are provided: (1) research into the problem of knowledge engineering in a traditional computer-aided software engineering (CASE) environment, like the SSE; (2) research into the problem of applying SSE CASE tools to develop knowledge based systems; and (3) direct utilization of SSE workstations to support a DKC activity.
GPS-CCD: A Novel Computational Program for the Prediction of Calpain Cleavage Sites
Gao, Xinjiao; Ma, Qian; Ren, Jian; Xue, Yu
2011-01-01
As one of the most essential post-translational modifications (PTMs) of proteins, proteolysis, especially calpain-mediated cleavage, plays an important role in many biological processes, including cell death/apoptosis, cytoskeletal remodeling, and the cell cycle. Experimental identification of calpain targets with bona fide cleavage sites is fundamental for dissecting the molecular mechanisms and biological roles of calpain cleavage. In contrast to time-consuming and labor-intensive experimental approaches, computational prediction of calpain cleavage sites might more cheaply and readily provide useful information for further experimental investigation. In this work, we constructed a novel software package, GPS-CCD (Calpain Cleavage Detector), for the prediction of calpain cleavage sites, with an accuracy of 89.98%, sensitivity of 60.87% and specificity of 90.07%. With this software, we annotated potential calpain cleavage sites for hundreds of calpain substrates whose exact cleavage sites had not been previously determined. GPS-CCD 1.0 is therefore expected to be a useful tool for experimentalists. The online service and local packages of GPS-CCD 1.0 were implemented in Java and are freely available at: http://ccd.biocuckoo.org/. PMID:21533053
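The abstract does not spell out its metrics, so the sketch below assumes the usual confusion-matrix definitions. Under those definitions accuracy is a weighted average of sensitivity and specificity, weighted by the numbers of positive and negative sites. An accuracy (89.98%) this close to the specificity (90.07%) therefore suggests that cleavage sites made up only a small fraction of the evaluated sites. The counts used below are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # fraction of true sites recovered
    specificity = tn / (tn + fp)                 # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # fraction of all calls correct
    return sensitivity, specificity, accuracy


# Hypothetical imbalanced test set: 10 cleavage sites, 1000 non-sites.
sn, sp, ac = classification_metrics(tp=6, fp=100, tn=900, fn=4)
print(sn, sp, round(ac, 4))
```

With 10 positives and 1000 negatives, accuracy equals (10·Sn + 1000·Sp) / 1010, which is dominated by specificity.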
The PROTICdb database for 2-DE proteomics.
Langella, Olivier; Zivy, Michel; Joets, Johann
2007-01-01
PROTICdb is a web-based database mainly designed to store and analyze plant proteome data obtained by 2D polyacrylamide gel electrophoresis (2D PAGE) and mass spectrometry (MS). The goals of PROTICdb are (1) to store, track, and query information related to proteomic experiments, i.e., from tissue sampling to protein identification and quantitative measurements; and (2) to integrate information from the user's own expertise and other sources into a knowledge base, used to support data interpretation (e.g., for the determination of allelic variants or products of posttranslational modifications). Data insertion into the relational database of PROTICdb is achieved either by uploading outputs from Mélanie, PDQuest, IM2d, ImageMaster(tm) 2D Platinum v5.0, Progenesis, Sequest, MS-Fit, and Mascot software, or by filling in web forms (experimental design and methods). 2D PAGE-annotated maps can be displayed, queried, and compared through the GelBrowser. Quantitative data can be easily exported in a tabulated format for statistical analyses with any third-party software. PROTICdb is based on either the Oracle or the PostgreSQL database management system (DBMS) and is freely available upon request at http://cms.moulon.inra.fr/content/view/14/44/.
Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology.
Zhang, Jieru; Ju, Ying; Lu, Huijuan; Xuan, Ping; Zou, Quan
2016-01-01
Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning methods, such as support vector machines combined with basic sequence features (n-grams), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were used to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.
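The n-gram sequence features mentioned above are, in their simplest form, normalized counts of every length-n residue string in a protein. The sketch below computes dipeptide (n = 2) composition over the 20 standard amino acids as a 400-dimensional vector that a classifier such as an SVM could consume. The function name and the choice to skip n-grams containing non-standard residues are assumptions, not details taken from the study.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def ngram_composition(seq, n=2, alphabet=AMINO_ACIDS):
    """Relative frequency of every n-gram over `alphabet` in `seq`, in a fixed order."""
    keys = ["".join(p) for p in product(alphabet, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in counts:  # skip n-grams containing non-standard residues (e.g. X)
            counts[gram] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in keys]


features = ngram_composition("ACAC")  # 400 values; "AC" = 2/3, "CA" = 1/3
```

The fixed key order matters: every sequence must map to the same feature positions before vectors are passed to a classifier.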
Behavioral biometrics for verification and recognition of malicious software agents
NASA Astrophysics Data System (ADS)
Yampolskiy, Roman V.; Govindaraju, Venu
2008-04-01
Homeland security requires technologies capable of positive and reliable identification of humans for law enforcement, government, and commercial applications. As artificially intelligent agents improve in their abilities and become a part of our everyday life, the possibility of using such programs for undermining homeland security increases. Virtual assistants, shopping bots, and game playing programs are used daily by millions of people. We propose applying statistical behavior modeling techniques that we developed for recognition of humans to the identification and verification of intelligent and potentially malicious software agents. Our experimental results demonstrate the feasibility of such methods for artificial agent verification and even for recognition purposes.
Pattern identification in time-course gene expression data with the CoGAPS matrix factorization.
Fertig, Elana J; Stein-O'Brien, Genevieve; Jaffe, Andrew; Colantuoni, Carlo
2014-01-01
Patterns in time-course gene expression data can represent the biological processes that are active over the measured time period. However, the orthogonality constraint in standard pattern-finding algorithms, including notably principal components analysis (PCA), confounds expression changes resulting from simultaneous, non-orthogonal biological processes. Previously, we have shown that Markov chain Monte Carlo nonnegative matrix factorization algorithms are particularly adept at distinguishing such concurrent patterns. One such matrix factorization is implemented in the software package CoGAPS. We describe the application of this software and several technical considerations for identification of age-related patterns in a public, prefrontal cortex gene expression dataset.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fotion, Katherine A.
2016-08-18
The Radionuclide Analysis Kit (RNAK), my team’s most recent nuclide identification software, is entering the testing phase. A question arises: will removing rare nuclides from the software’s library improve its overall performance? An affirmative response would indicate fundamental errors in the software’s framework, while a negative response would confirm the effectiveness of the software’s key machine learning algorithms. After thorough testing, I found that the performance of RNAK cannot be improved by changing the choice of nuclide library, thus verifying the effectiveness of RNAK’s algorithms: multiple linear regression, a Bayesian network using the Viterbi algorithm, and branch and bound search.
NASA Tech Briefs, January 2004
NASA Technical Reports Server (NTRS)
2004-01-01
Topics covered include: Multisensor Instrument for Real-Time Biological Monitoring; Sensor for Monitoring Nanodevice-Fabrication Plasmas; Backed Bending Actuator; Compact Optoelectronic Compass; Micro Sun Sensor for Spacecraft; Passive IFF: Autonomous Nonintrusive Rapid Identification of Friendly Assets; Finned-Ladder Slow-Wave Circuit for a TWT; Directional Radio-Frequency Identification Tag Reader; Integrated Solar-Energy-Harvesting and -Storage Device; Event-Driven Random-Access-Windowing CCD Imaging System; Stroboscope Controller for Imaging Helicopter Rotors; Software for Checking State-charts; Program Predicts Broadband Noise from a Turbofan Engine; Protocol for a Delay-Tolerant Data-Communication Network; Software Implements a Space-Mission File-Transfer Protocol; Making Carbon-Nanotube Arrays Using Block Copolymers: Part 2; Modular Rake of Pitot Probes; Preloading To Accelerate Slow-Crack-Growth Testing; Miniature Blimps for Surveillance and Collection of Samples; Hybrid Automotive Engine Using Ethanol-Burning Miller Cycle; Fabricating Blazed Diffraction Gratings by X-Ray Lithography; Freeze-Tolerant Condensers; The StarLight Space Interferometer; Champagne Heat Pump; Controllable Sonar Lenses and Prisms Based on ERFs; Measuring Gravitation Using Polarization Spectroscopy; Serial-Turbo-Trellis-Coded Modulation with Rate-1 Inner Code; Enhanced Software for Scheduling Space-Shuttle Processing; Bayesian-Augmented Identification of Stars in a Narrow View; Spacecraft Orbits for Earth/Mars-Lander Radio Relay; and Self-Inflatable/Self-Rigidizable Reflectarray Antenna.
PDBStat: a universal restraint converter and restraint analysis software package for protein NMR.
Tejero, Roberto; Snyder, David; Mao, Binchen; Aramini, James M; Montelione, Gaetano T
2013-08-01
The heterogeneous array of software tools used in the process of protein NMR structure determination presents organizational challenges in the structure determination and validation processes, and creates a learning curve that limits the broader use of protein NMR in biology. These challenges, including accurate use of data in different data formats required by software carrying out similar tasks, continue to confound the efforts of novices and experts alike. These important issues need to be addressed robustly in order to standardize protein NMR structure determination and validation. PDBStat is a C/C++ computer program originally developed as a universal coordinate and protein NMR restraint converter. Its primary function is to provide a user-friendly tool for interconverting between protein coordinate and protein NMR restraint data formats. It also provides an integrated set of computational methods for protein NMR restraint analysis and structure quality assessment, relabeling of prochiral atoms with correct IUPAC names, as well as multiple methods for analysis of the consistency of atomic positions indicated by their convergence across a protein NMR ensemble. In this paper we provide a detailed description of the PDBStat software, and highlight some of its valuable computational capabilities. As an example, we demonstrate the use of the PDBStat restraint converter for restrained CS-Rosetta structure generation calculations, and compare the resulting protein NMR structure models with those generated from the same NMR restraint data using more traditional structure determination methods. These results demonstrate the value of a universal restraint converter in allowing the use of multiple structure generation methods with the same restraint data for consensus analysis of protein NMR structures and the underlying restraint data.
PDBStat: A Universal Restraint Converter and Restraint Analysis Software Package for Protein NMR
Tejero, Roberto; Snyder, David; Mao, Binchen; Aramini, James M.; Montelione, Gaetano T
2013-01-01
The heterogeneous array of software tools used in the process of protein NMR structure determination presents organizational challenges in the structure determination and validation processes, and creates a learning curve that limits the broader use of protein NMR in biology. These challenges, including accurate use of data in different data formats required by software carrying out similar tasks, continue to confound the efforts of novices and experts alike. These important issues need to be addressed robustly in order to standardize protein NMR structure determination and validation. PDBStat is a C/C++ computer program originally developed as a universal coordinate and protein NMR restraint converter. Its primary function is to provide a user-friendly tool for interconverting between protein coordinate and protein NMR restraint data formats. It also provides an integrated set of computational methods for protein NMR restraint analysis and structure quality assessment, relabeling of prochiral atoms with correct IUPAC names, as well as multiple methods for analysis of the consistency of atomic positions indicated by their convergence across a protein NMR ensemble. In this paper we provide a detailed description of the PDBStat software, and highlight some of its valuable computational capabilities. As an example, we demonstrate the use of the PDBStat restraint converter for restrained CS-Rosetta structure generation calculations, and compare the resulting protein NMR structure models with those generated from the same NMR restraint data using more traditional structure determination methods. These results demonstrate the value of a universal restraint converter in allowing the use of multiple structure generation methods with the same restraint data for consensus analysis of protein NMR structures and the underlying restraint data. PMID:23897031
NASA Technical Reports Server (NTRS)
Bergmann, E.
1976-01-01
The current baseline method and software implementation of the space shuttle reaction control subsystem failure detection and identification (RCS FDI) system are presented. This algorithm is recommended for inclusion in the redundancy management (RM) module of the space shuttle guidance, navigation, and control system. Supporting software is presented and recommended for inclusion in the system management (SM) and display and control (D&C) systems. RCS FDI uses data from sensors in the jets, in the manifold isolation valves, and in the RCS fuel and oxidizer storage tanks. A list of jet failures and fuel imbalance warnings is generated for use by the jet selection algorithm of the on-orbit and entry flight control systems, and to inform the crew and ground controllers of RCS failure status. Manifold isolation valve close commands are generated in the event of failed-on or leaking jets to prevent loss of large quantities of RCS fuel.
On the Reproducibility of Label-Free Quantitative Cross-Linking/Mass Spectrometry
NASA Astrophysics Data System (ADS)
Müller, Fränze; Fischer, Lutz; Chen, Zhuo Angel; Auchynnikava, Tania; Rappsilber, Juri
2018-02-01
Quantitative cross-linking/mass spectrometry (QCLMS) is an emerging approach to study conformational changes of proteins and multi-subunit complexes. Distinguishing protein conformations requires reproducibly identifying and quantifying cross-linked peptides. Here we analyzed the variation between multiple cross-linking reactions using bis[sulfosuccinimidyl] suberate (BS3)-cross-linked human serum albumin (HSA) and evaluated how reproducibly cross-linked peptides can be identified and quantified by LC-MS analysis. To make QCLMS accessible to a broader research community, we developed a workflow that integrates the established software tools MaxQuant for spectra preprocessing, Xi for cross-linked peptide identification, and finally Skyline for quantification (MS1 filtering). Of the 221 unique residue pairs identified in our sample, 124 were subsequently quantified across 10 analyses with coefficient of variation (CV) values of 14% (injection replicates) and 32% (reaction replicates). Thus, our results demonstrate that the reproducibility of QCLMS is in line with that of general quantitative proteomics, and we establish a robust workflow for MS1-based quantitation of cross-linked peptides.
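The CV values summarize how much a residue pair's quantity varies across replicate analyses. The sketch below assumes the usual definition (sample standard deviation divided by mean, in percent) and a median taken across residue pairs. The abstract states neither choice, and the residue-pair labels and intensities are made up.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def median_cv(intensities_by_pair):
    """Median CV over residue pairs; each value is a list of replicate intensities."""
    return statistics.median(coefficient_of_variation(v)
                             for v in intensities_by_pair.values())


# Hypothetical replicate intensities for three cross-linked residue pairs.
pairs = {"K12-K45": [90, 100, 110], "K7-K88": [50, 50, 50], "K3-K19": [80, 100, 120]}
print(median_cv(pairs))  # per-pair CVs are 10, 0 and 20 percent
```

Computing CVs separately for injection replicates and for reaction replicates, as in the study, would separate instrument variability from variability introduced by the cross-linking chemistry.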
Microarray analysis of retinal gene expression in Egr-1 knockout mice
Schippert, Ruth; Schaeffel, Frank
2009-01-01
Purpose We found earlier that 42-day-old Egr-1 knockout mice had longer eyes and a more myopic refractive error than their wild-type counterparts. To identify genes that could be responsible for the temporarily enhanced axial eye growth, a microarray analysis was performed in knockout and wild-type mice at the postnatal ages of 30 and 42 days. Methods The retinas of homozygous and wild-type Egr-1 knockout mice (Taconic, Ry, Denmark) were prepared for RNA isolation (RNeasy Mini Kit, Qiagen) at the age of 30 or 42 days, respectively (n=12 each). Three retinas were pooled and labeled cRNA was made. The samples were hybridized to Affymetrix GeneChip Mouse Genome 430 2.0 Arrays. Hybridization signals were calculated using GC-RMA normalization. Genes were identified as differentially expressed if they showed a fold-change (FC) of at least 1.5 and a p-value <0.05. A false-discovery rate of 5% was applied. Ten genes with potential biologic relevance were examined further with semiquantitative real-time RT–PCR. Results Comparing mRNA expression levels between wild-type and homozygous Egr-1 knockout mice, we found 73 differentially expressed genes at the age of 30 days and 135 genes at the age of 42 days. When gene expression was compared between the two ages (30 versus 42 days), 54 genes were differentially expressed in wild-type mice and 215 genes in homozygous animals. Based on three networks proposed by Ingenuity pathway analysis software, nine differentially expressed genes in the homozygous Egr-1 knockout mice were chosen for further validation by real-time RT–PCR, three genes in each network. In addition, the gene that was most prominently regulated in the knockout mice, compared to wild-type, at both 30 days and 42 days of age (protocadherin beta-9 [Pcdhb9]), was tested with real-time RT–PCR. 
Changes in four of the ten genes could be confirmed by real-time RT–PCR: nuclear prelamin A recognition factor (Narf), oxoglutarate dehydrogenase (Ogdh), selenium binding protein 1 (Selenbp1), and Pcdhb9. Except for Pcdhb9, the genes whose mRNA expression levels were validated were listed in one of the networks proposed by Ingenuity pathway analysis software. In addition to these genes, the software proposed several key regulators that did not change in our study: retinoic acid, vascular endothelial growth factor A (VEGF-A), FBJ murine osteosarcoma viral oncogene homolog (cFos), and others. Conclusions Identification of genes that are differentially regulated during the developmental period between postnatal day 30 (when both homozygous and wild-type mice still have the same axial length) and day 42 (when the difference in eye length is apparent) could improve the understanding of mechanisms for the control of axial eye growth and may lead to potential targets for pharmacological intervention. With the aid of pathway-analysis software, a coarse picture of possible biochemical pathways could be generated. Although the mRNA expression levels of proteins proposed by the software, like VEGF, FOS, retinoic acid (RA) receptors, or cellular RA binding protein, did not show any changes in our experiment, these molecules have previously been implicated in the signaling cascades controlling axial eye growth. According to the pathway-analysis software, they represent links between several proteins whose mRNA expression was changed in our study. PMID:20019881
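The selection criteria in the Methods (fold change of at least 1.5, p < 0.05, 5% false-discovery rate) can be sketched as a filter. The abstract does not say how the false-discovery rate was controlled, so the sketch assumes Benjamini-Hochberg adjustment. It also assumes that a fold change of 1/1.5 or below counts as down-regulation. Gene names and values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(genes, fc_cutoff=1.5, alpha=0.05):
    """genes: dict name -> (fold_change knockout/wild-type, p_value)."""
    names = list(genes)
    qvalues = benjamini_hochberg([genes[n][1] for n in names])
    hits = []
    for name, q in zip(names, qvalues):
        fc, p = genes[name]
        magnitude = fc if fc >= 1 else 1 / fc  # treat down-regulation symmetrically
        if magnitude >= fc_cutoff and p < alpha and q <= alpha:
            hits.append(name)
    return hits


genes = {"g1": (2.0, 0.001), "g2": (0.5, 0.004), "g3": (1.2, 0.0001),
         "g4": (3.0, 0.03), "g5": (1.6, 0.6)}
print(differentially_expressed(genes))  # g3 fails the fold change, g5 the p-value
```

Because BH-adjusted values are never smaller than the raw p-values, the separate `p < alpha` check only mirrors the wording of the Methods; the q-value condition is the stricter one.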
Microarray analysis of retinal gene expression in Egr-1 knockout mice.
Schippert, Ruth; Schaeffel, Frank; Feldkaemper, Marita Pauline
2009-12-10
We found earlier that 42-day-old Egr-1 knockout mice had longer eyes and a more myopic refractive error than their wild-type counterparts. To identify genes that could be responsible for the temporarily enhanced axial eye growth, a microarray analysis was performed in knockout and wild-type mice at the postnatal ages of 30 and 42 days. The retinas of homozygous and wild-type Egr-1 knockout mice (Taconic, Ry, Denmark) were prepared for RNA isolation (RNeasy Mini Kit, Qiagen) at the age of 30 or 42 days, respectively (n=12 each). Three retinas were pooled and labeled cRNA was made. The samples were hybridized to Affymetrix GeneChip Mouse Genome 430 2.0 Arrays. Hybridization signals were calculated using GC-RMA normalization. Genes were identified as differentially expressed if they showed a fold-change (FC) of at least 1.5 and a p-value <0.05. A false-discovery rate of 5% was applied. Ten genes with potential biologic relevance were examined further with semiquantitative real-time RT-PCR. Comparing mRNA expression levels between wild-type and homozygous Egr-1 knockout mice, we found 73 differentially expressed genes at the age of 30 days and 135 genes at the age of 42 days. When gene expression was compared between the two ages (30 versus 42 days), 54 genes were differentially expressed in wild-type mice and 215 genes in homozygous animals. Based on three networks proposed by Ingenuity pathway analysis software, nine differentially expressed genes in the homozygous Egr-1 knockout mice were chosen for further validation by real-time RT-PCR, three genes in each network. In addition, the gene that was most prominently regulated in the knockout mice, compared to wild-type, at both 30 days and 42 days of age (protocadherin beta-9 [Pcdhb9]), was tested with real-time RT-PCR. 
Changes in four of the ten genes could be confirmed by real-time RT-PCR: nuclear prelamin A recognition factor (Narf), oxoglutarate dehydrogenase (Ogdh), selenium binding protein 1 (Selenbp1), and Pcdhb9. Except for Pcdhb9, the genes whose mRNA expression levels were validated were listed in one of the networks proposed by Ingenuity pathway analysis software. In addition to these genes, the software proposed several key regulators that did not change in our study: retinoic acid, vascular endothelial growth factor A (VEGF-A), FBJ murine osteosarcoma viral oncogene homolog (cFos), and others. Identification of genes that are differentially regulated during the developmental period between postnatal day 30 (when both homozygous and wild-type mice still have the same axial length) and day 42 (when the difference in eye length is apparent) could improve the understanding of mechanisms for the control of axial eye growth and may lead to potential targets for pharmacological intervention. With the aid of pathway-analysis software, a coarse picture of possible biochemical pathways could be generated. Although the mRNA expression levels of proteins proposed by the software, like VEGF, FOS, retinoic acid (RA) receptors, or cellular RA binding protein, did not show any changes in our experiment, these molecules have previously been implicated in the signaling cascades controlling axial eye growth. According to the pathway-analysis software, they represent links between several proteins whose mRNA expression was changed in our study.
Automated designation of tie-points for image-to-image coregistration.
R.E. Kennedy; W.B. Cohen
2003-01-01
Image-to-image registration requires identification of common points in both images (image tie-points: ITPs). Here we describe software implementing an automated, area-based technique for identifying ITPs. The ITP software was designed to follow two strategies: (1) capitalize on human knowledge and pattern recognition strengths, and (2) favour robustness in many...
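Area-based tie-point matching generally slides a window taken from one image over a search area in the other and keeps the position with the highest similarity score. The excerpt does not name the score used by the ITP software. The sketch below assumes normalized cross-correlation (NCC), which is insensitive to linear brightness and contrast differences between images, and works on small images stored as lists of rows. All names are illustrative.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2-D windows (lists of rows)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    mean_a = sum(fa) / len(fa)
    mean_b = sum(fb) / len(fb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in fa) *
                    sum((y - mean_b) ** 2 for y in fb))
    return num / den if den else 0.0  # flat windows carry no match information


def best_match(template, search):
    """Slide template over search; return (row, col, score) of the highest NCC."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -2.0)  # NCC lies in [-1, 1], so any window beats this start
    for r in range(len(search) - th + 1):
        for c in range(len(search[0]) - tw + 1):
            window = [row[c:c + tw] for row in search[r:r + th]]
            score = ncc(template, window)
            if score > best[2]:
                best = (r, c, score)
    return best
```

In practice a tie-point candidate would also be rejected when its best score falls below a minimum correlation, which is one way to favour robustness over the number of points found.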
Identification of Factors That Affect Software Complexity.
ERIC Educational Resources Information Center
Kaiser, Javaid
A survey of computer scientists was conducted to identify factors that affect software complexity. A total of 160 items were selected from the literature to include in a questionnaire sent to 425 individuals who were employees of computer-related businesses in Lawrence and Kansas City. The items were grouped into nine categories called system…
Practical Issues in Implementing Software Reliability Measurement
NASA Technical Reports Server (NTRS)
Nikora, Allen P.; Schneidewind, Norman F.; Everett, William W.; Munson, John C.; Vouk, Mladen A.; Musa, John D.
1999-01-01
Many ways of estimating software systems' reliability, or reliability-related quantities, have been developed over the past several years. Of particular interest are methods that can be used to estimate a software system's fault content prior to test, or to discriminate between components that are fault-prone and those that are not. The results of these methods can be used to: 1) More accurately focus scarce fault identification resources on those portions of a software system most in need of it. 2) Estimate and forecast the risk of exposure to residual faults in a software system during operation, and develop risk and safety criteria to guide the release of a software system to fielded use. 3) Estimate the efficiency of test suites in detecting residual faults. 4) Estimate the stability of the software maintenance process.
Tie, Cai; Hu, Ting; Jia, Zhi-Xin; Zhang, Jin-Lan
2015-08-18
Fatty acids (FAs) are a group of lipid molecules that are essential to organisms. As potential biomarkers for different diseases, FAs have attracted increasing attention from both biological researchers and the pharmaceutical industry. A sensitive and accurate method for globally profiling and identifying FAs is required for biomarker discovery. The high selectivity and sensitivity of high-performance liquid chromatography-multiple reaction monitoring (HPLC-MRM) give it great potential to fulfill the need to identify FAs from complicated matrices. This paper presents a new approach to global FA profiling and identification by mining HPLC-MRM FA data. Mathematical models for identifying FAs were constructed using the isotope-induced retention time (RT) shift (IRS) and peak area ratios between parallel isotope peaks for a series of FA standards. The FA structures were predicted using another model based on the RT and molecular weight. Fully automated FA identification software based on these mathematical models was written using the Qt platform. Different samples were used to verify the software. A high identification efficiency (greater than 75%) was observed when 96 FA species were identified in plasma. This FA identification strategy promises to accelerate FA research and applications.
Expanding the cerebrospinal fluid endopeptidome.
Hansson, Karl T; Skillbäck, Tobias; Pernevik, Elin; Kern, Silke; Portelius, Erik; Höglund, Kina; Brinkmalm, Gunnar; Holmén-Larsson, Jessica; Blennow, Kaj; Zetterberg, Henrik; Gobom, Johan
2017-03-01
Biomarkers of neurodegenerative disorders are needed to assist in diagnosis, to monitor disease progression and therapeutic interventions, and to provide insight into disease mechanisms. One route to identifying such biomarkers is proteomic and peptidomic analysis of cerebrospinal fluid (CSF). In the current study, we performed an in-depth analysis of the human CSF endopeptidome to establish an inventory that may serve as a basis for future targeted biomarker studies. High-pH RP HPLC was employed for off-line sample prefractionation, followed by low-pH nano-LC-MS analysis. Different software programs and scoring algorithms for peptide identification were employed and compared. A total of 18 031 endogenous peptides were identified at an FDR of 1%, a 10-fold increase in the number of known endogenous CSF peptides compared to previous studies. The peptides were derived from 2 053 proteins, more than 60 of which have been linked to neurodegeneration. Notably, the findings included six peptides derived from microtubule-associated protein tau, three of which span the diagnostically interesting threonine-181 (Tau-F isoform). Also, 213 peptides from amyloid precursor protein were identified, 58 of which lay partially or completely within the sequence of amyloid β 1-40/42, as well as 109 peptides from apolipoprotein E spanning sequences that discriminate between the E2/E3/E4 isoforms of the protein. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
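The abstract reports identifications at a 1% FDR but does not say how the FDR was estimated. A widely used option is the target-decoy approach, in which spectra are also searched against reversed or shuffled (decoy) sequences. The sketch below assumes that approach and uses made-up scores; it is an illustration, not the authors' pipeline.

```python
def score_threshold_at_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    The FDR at cutoff t is estimated as (#decoys >= t) / (#targets >= t),
    the simple target-decoy estimate (an assumption, not taken from the study).
    Returns None if no cutoff satisfies the bound.
    """
    best = None
    # Walk from high to low scores; the last cutoff recorded is the lowest
    # one that still satisfies the FDR bound (equivalent to a q-value cut).
    for t in sorted(set(target_scores), reverse=True):
        n_target = sum(1 for s in target_scores if s >= t)
        n_decoy = sum(1 for s in decoy_scores if s >= t)
        if n_decoy / n_target <= max_fdr:
            best = t
    return best


targets = [10, 9, 8, 7, 6, 5]   # hypothetical target peptide scores
decoys = [7.5, 4]               # hypothetical decoy peptide scores
cutoff = score_threshold_at_fdr(targets, decoys, max_fdr=0.01)
accepted = [s for s in targets if s >= cutoff]
```

Peptides scoring at or above the returned cutoff would be reported; with real data the score would come from whichever search engine is being evaluated.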
NMRNet: A deep learning approach to automated peak picking of protein NMR spectra.
Klukowski, Piotr; Augoff, Michal; Zieba, Maciej; Drwal, Maciej; Gonczarek, Adam; Walczak, Michal J
2018-03-14
Automated selection of signals in protein NMR spectra, known as peak picking, has been studied for over 20 years; nevertheless, existing peak picking methods are still largely deficient. Accurate and precise automated peak picking would accelerate structure calculation and the analysis of the dynamics and interactions of macromolecules. Recent advances in handling big data, together with a surge of machine learning techniques, offer an opportunity to tackle the peak picking problem substantially faster than manual picking and on par with human accuracy. In particular, deep learning has been shown to achieve human-level performance consistently in various recognition tasks, and thus emerges as an ideal tool for the automated identification of NMR signals. We applied a convolutional neural network to the visual analysis of multidimensional NMR spectra. A comprehensive test on 31 manually annotated spectra demonstrated top-tier average precision (AP) of 0.9596, 0.9058 and 0.8271 for backbone, side-chain and NOESY spectra, respectively. Furthermore, combining the extracted peak lists with the automated assignment routine FLYA outperformed other methods, including manual peak picking, and led to correct resonance assignment at levels of 90.40%, 89.90% and 90.20% for three benchmark proteins. The proposed model is part of the Dumpling software (a platform for protein NMR data analysis) and is available at https://dumpling.bio/. Contact: michaljerzywalczak@gmail.com, piotr.klukowski@pwr.edu.pl. Supplementary data are available at Bioinformatics online.
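NMRNet is evaluated by average precision (AP). The abstract does not define how picked peaks are matched to annotated ones or whether precision is interpolated, so the sketch below assumes the matching has already been done and computes plain, non-interpolated AP over a confidence-ranked list of picked peaks.

```python
def average_precision(predictions, n_true_peaks):
    """Non-interpolated AP over a ranked list of picked peaks.

    predictions: list of (confidence, is_true_peak) pairs, where is_true_peak
    says whether the picked peak was matched to a manually annotated peak.
    n_true_peaks: number of annotated peaks in the spectrum.
    """
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for k, (_, is_true) in enumerate(ranked, start=1):
        if is_true:
            hits += 1
            precision_sum += hits / k  # precision at this recall step
    # Annotated peaks that were never picked contribute zero precision.
    return precision_sum / n_true_peaks if n_true_peaks else 0.0


picked = [(0.9, True), (0.8, False), (0.7, True)]  # hypothetical picks
ap = average_precision(picked, n_true_peaks=2)
```

Dividing by the number of annotated peaks, rather than by the number of hits, is what penalizes missed peaks.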
Targeted Feature Detection for Data-Dependent Shotgun Proteomics.
Weisser, Hendrik; Choudhary, Jyoti S
2017-08-04
Label-free quantification of shotgun LC–MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification (“FFId”), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between “internal” and “external” (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the “uncertain” feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. 
We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS (www.openms.org). PMID:28673088
Fan, Long; Hui, Jerome H L; Yu, Zu Guo; Chu, Ka Hou
2014-07-01
Species identification based on short sequences of DNA markers, that is, DNA barcoding, has emerged as an integral part of modern taxonomy. However, software for the analysis of large and multilocus barcoding data sets is scarce. The Basic Local Alignment Search Tool (BLAST) is currently the fastest tool capable of handling large databases (e.g. >5000 sequences), but its accuracy is a concern, and its local optimization has been criticized. Conversely, the more accurate software currently available requires sequence alignment or complex calculations, which are time-consuming for large data sets during data preprocessing or the search stage. It is therefore imperative to develop a practical program for species identification in DNA barcoding that is both accurate and scalable. In this context, we present VIP Barcoding: user-friendly software with a graphical user interface for rapid DNA barcoding. It adopts a hybrid, two-stage algorithm. First, an alignment-free composition vector (CV) method is used to reduce the search space by screening a reference database. The alignment-based K2P distance nearest-neighbour method is then employed to analyse the smaller data set generated in the first stage. In comparison with other software, we demonstrate that VIP Barcoding has (i) higher accuracy than Blastn and several alignment-free methods and (ii) higher scalability than alignment-based distance methods and character-based methods. These results suggest that this platform can handle both large-scale and multilocus barcoding data accurately and can contribute to DNA barcoding for modern taxonomy. VIP Barcoding is free and available at http://msl.sls.cuhk.edu.hk/vipbarcoding/. © 2014 John Wiley & Sons Ltd.
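The two-stage search can be sketched as follows. The second stage uses the Kimura 2-parameter (K2P) distance, d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. For the first stage this sketch substitutes a plain k-mer frequency cosine similarity, which is a simplification of the published composition vector method (the CV method also corrects for background k-mer frequencies). It further assumes that the reference sequences are already aligned to the query and of equal length; the species names and sequences are hypothetical.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned, equal-length sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def kmer_profile(seq, k=3):
    """Counts of overlapping k-mers (simplified stand-in for a composition vector)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(c1, c2):
    dot = sum(c1[key] * c2[key] for key in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2)


def identify(query, references, k=3, top_n=2):
    """Stage 1: shortlist references by k-mer similarity.
    Stage 2: return the shortlisted species with the smallest K2P distance.

    references: dict mapping species name -> sequence aligned to the query.
    """
    qp = kmer_profile(query, k)
    shortlist = sorted(
        references,
        key=lambda sp: cosine(qp, kmer_profile(references[sp], k)),
        reverse=True,
    )[:top_n]
    return min(shortlist, key=lambda sp: k2p_distance(query, references[sp]))


refs = {"sp1": "ACGTACGTAC", "sp2": "ACGAACGTCC"}  # hypothetical references
best = identify("ACGTACGTAA", refs)
```

The cheap first stage bounds how many expensive distance calculations are needed, which is the point of the hybrid design described in the abstract.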
Meinert, Christian; Gembardt, Florian; Böhme, Ilka; Tetzner, Anja; Wieland, Thomas; Greenberg, Barry; Walther, Thomas
2016-01-01
The study aimed to identify proteins regulated by the cardiovascular protective peptide angiotensin-(1-7) and to determine potential intracellular signaling cascades. Human endothelial cells were stimulated with Ang-(1-7) for 1 h, 3 h, 6 h, and 9 h. Peptide effects on intracellular signaling were assessed with an antibody microarray containing antibodies against 725 proteins. Bioinformatics software was used to identify affected intracellular signaling pathways. Microarray data were verified for selected proteins by Western blot, real-time RT-PCR, and immunohistochemical studies. The microarray identified 110 regulated proteins after 1 h, 119 after 3 h, 31 after 6 h, and 86 after 9 h of Ang-(1-7) stimulation. Regulated proteins were significantly associated, in a time-dependent manner, with several metabolic pathways such as “Molecular Mechanism of Cancer” and “p53 signaling”. For example, Western blots for the E3-type small ubiquitin-like modifier ligase PIAS2 confirmed the microarray data and showed a decrease of more than 50% after 1 h and 3 h of Ang-(1-7) stimulation without affecting its mRNA. Immunohistochemical studies of PIAS2 in human endothelial cells showed a decrease in cytoplasmic PIAS2 after Ang-(1-7) treatment. The Ang-(1-7)-mediated decrease of PIAS2 was reproduced in other endothelial cell types. The results suggest that angiotensin-(1-7) plays a role in metabolic pathways related to cell death and cell survival in human endothelial cells.
Wang, Hongbin; Zhang, Yongqian; Gui, Shuqi; Zhang, Yong; Lu, Fuping; Deng, Yulin
2017-08-15
Comparisons across large numbers of samples are frequently necessary in quantitative proteomics. Many quantitative methods used in proteomics are based on stable isotope labeling, but most of these are only useful for comparing two samples. For up to eight samples, the iTRAQ labeling technique can be used. For greater numbers of samples, label-free methods have been used, but they have been criticized for low reproducibility and accuracy. An ingenious strategy has been introduced that compares each sample against an ¹⁸O-labeled reference sample created by pooling equal amounts of all samples. However, protein mixtures of known proportions are needed to investigate and evaluate this new strategy. Another problem for comparative proteomics of multiple samples is the poor overlap and reproducibility of protein identification results across samples. In the present study, a method combining the ¹⁸O-reference strategy with a strategy that decouples quantitation from identification was investigated using protein mixtures of known proportions. The results clearly demonstrated that the ¹⁸O-reference strategy had greater accuracy and reliability than previously used comparison methods based on transfer comparisons or label-free strategies. In the decoupling strategy, quantification data acquired by LC-MS and identification data acquired by LC-MS/MS are matched and correlated according to retention time and accurate mass in order to identify differentially expressed proteins. This strategy made protein identification possible for all samples using a single pooled sample, thereby giving good reproducibility of protein identification across multiple samples, and allowed peptide identification to be optimized separately so as to identify more proteins. Copyright © 2017 Elsevier B.V. All rights reserved.
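Two steps of this strategy can be illustrated in a few lines. First, because every sample is measured against the same pooled ¹⁸O reference, the ratio between two samples follows from their ratios to the reference. Second, LC-MS features are matched to LC-MS/MS identifications by accurate mass and retention time. The tolerances (10 ppm, 0.5 min), the data layout and the closest-mass tie-breaking rule are assumptions made for this sketch, not parameters reported in the study.

```python
def sample_ratio(light_a, heavy_ref_a, light_b, heavy_ref_b):
    """Relative abundance of sample A vs. sample B via the shared 18O reference.

    light_*: intensity of the 16O (sample) peak; heavy_ref_*: intensity of the
    18O-labeled reference peak measured alongside it.
    """
    return (light_a / heavy_ref_a) / (light_b / heavy_ref_b)


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def match_features(features, identifications, ppm_tol=10.0, rt_tol=0.5):
    """Assign each LC-MS feature the closest-in-mass LC-MS/MS identification
    that lies within both the mass (ppm) and retention time (min) tolerances.

    features: list of (feature_id, mass, rt)
    identifications: list of (peptide, mass, rt)
    Returns a dict feature_id -> peptide; unmatched features are omitted.
    """
    matches = {}
    for fid, f_mass, f_rt in features:
        candidates = [
            (abs(ppm_error(f_mass, i_mass)), pep)
            for pep, i_mass, i_rt in identifications
            if abs(ppm_error(f_mass, i_mass)) <= ppm_tol
            and abs(f_rt - i_rt) <= rt_tol
        ]
        if candidates:
            matches[fid] = min(candidates)[1]  # smallest mass error wins
    return matches


ids = [("PEPTIDEA", 1000.0, 20.0), ("PEPTIDEB", 1000.02, 35.0)]  # hypothetical
feats = [("f1", 1000.005, 20.2), ("f2", 1000.019, 35.1), ("f3", 1500.0, 20.0)]
matched = match_features(feats, ids)
```

The retention time window is what separates PEPTIDEA and PEPTIDEB here even though their masses differ by only 20 ppm, which is why the abstract stresses matching on both criteria.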
Prediction of type III secretion signals in genomes of gram-negative bacteria.
Löwer, Martin; Schneider, Gisbert
2009-06-15
Pathogenic bacteria that infect animals as well as plants use various mechanisms to transport virulence factors across their cell membranes and channel these proteins into the infected host cell. The type III secretion system is one such mechanism. Proteins transported via this pathway ("effector proteins") have to be distinguished from all other proteins that are not exported from the bacterial cell. Although a special targeting signal at the N-terminal end of effector proteins has been proposed in the literature, its exact characteristics remain unknown. In this study, we demonstrate that the signals encoded in the sequences of type III secretion system effectors can be consistently recognized and predicted by machine learning techniques. Known protein effectors were compiled from the literature and sequence databases and served as training data for artificial neural network and support vector machine classifiers. Common sequence features were most pronounced in the first 30 amino acids of the effector sequences. The classifiers reached a cross-validated Matthews correlation coefficient of 0.63, which allowed genome-wide prediction of potential type III secretion system effectors in 705 proteobacterial genomes (12% of proteins predicted as candidates), their chromosomes (11%) and plasmids (13%), as well as in 213 Firmicute genomes (7%). We present a signal prediction method together with a comprehensive survey of potential type III secretion system effectors extracted from 918 published bacterial genomes. Our study demonstrates that the analyzed signal features are common across a wide range of species and provides a substantial basis for identifying exported pathogenic proteins as targets for future therapeutic intervention. The prediction software is publicly accessible from our web server (www.modlab.org).
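The abstract reports performance as a Matthews correlation coefficient (MCC) and locates the signal in the first 30 residues. The sketch below shows the MCC computation from a confusion matrix, MCC = (TP·TN - FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), together with a simple amino-acid composition of the N-terminal window as one possible input feature. The composition encoding is an assumption for illustration; the abstract does not state which sequence encoding the published classifiers used.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcc_from_labels(y_true, y_pred):
    """MCC for binary labels (1 = effector, 0 = non-effector)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return matthews_corrcoef(tp, tn, fp, fn)


def n_terminal_composition(seq, window=30):
    """Fraction of each standard amino acid in the first `window` residues
    (a hypothetical feature encoding, not the published one)."""
    segment = seq[:window].upper()
    n = len(segment)
    return {aa: (segment.count(aa) / n if n else 0.0) for aa in AMINO_ACIDS}
```

MCC is a sensible summary here because effectors are a small minority of all proteins, and MCC, unlike plain accuracy, is not inflated by correctly rejecting the many non-effectors.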
Identification of Conserved Water Sites in Protein Structures for Drug Design.
Jukič, Marko; Konc, Janez; Gobec, Stanislav; Janežič, Dušanka
2017-12-26
Identification of conserved waters in protein structures is a challenging task with applications in molecular docking and protein stability prediction. As an alternative to computationally demanding simulations of proteins in water, experimental cocrystallized waters in the Protein Data Bank (PDB), combined with a local structure alignment algorithm, can be used for reliable prediction of conserved water sites. We developed the ProBiS H2O approach, based on the previously developed ProBiS algorithm, which identifies conserved water sites in proteins using experimental protein structures from the PDB or a set of custom protein structures supplied by the user. Given a protein structure, a binding site, or an individual water molecule as a query, ProBiS H2O collects similar proteins from the PDB and performs local or binding site-specific superimpositions of the query structure with these proteins using the ProBiS algorithm. It collects the experimental water molecules from the similar proteins and transposes them to the query protein. Transposed waters are clustered by their mutual proximity, which enables identification of discrete sites in the query protein with high water conservation. ProBiS H2O is a robust and fast new approach that uses existing experimental structural data to identify conserved water sites at the interfaces of protein complexes, for example protein-small molecule interfaces, and elsewhere on protein structures. It has been successfully validated on several previously reported proteins in which conserved water molecules were found to play an important role in ligand binding, with applications in drug design.
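The clustering step can be sketched as single-linkage clustering of the transposed water oxygens: two waters fall into the same cluster if a chain of pairwise distances no larger than a cutoff connects them. The 1.0 Å cutoff, the union-find implementation and the conservation score (fraction of superimposed structures that contribute a water to the site) are assumptions for illustration; the abstract does not specify the clustering criterion ProBiS H2O uses.

```python
import math


def cluster_waters(waters, cutoff=1.0):
    """Single-linkage clustering of transposed water oxygens.

    waters: list of (source_structure_id, (x, y, z)) in the query frame.
    Returns a list of clusters, each a list of indices into `waters`.
    """
    parent = list(range(len(waters)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(waters)):
        for j in range(i + 1, len(waters)):
            if math.dist(waters[i][1], waters[j][1]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(waters)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def conservation(cluster, waters, n_structures):
    """Fraction of superimposed structures that contribute a water to the cluster."""
    return len({waters[i][0] for i in cluster}) / n_structures


# Hypothetical transposed waters from three superimposed structures.
transposed = [
    ("1abc", (0.0, 0.0, 0.0)),
    ("2def", (0.5, 0.0, 0.0)),
    ("3ghi", (0.9, 0.2, 0.0)),
    ("1abc", (10.0, 10.0, 10.0)),
]
sites = cluster_waters(transposed)
```

A site to which every superimposed structure contributes a water scores 1.0 and would be a strong candidate for a conserved water.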
Semi-automated De-identification of German Content Sensitive Reports for Big Data Analytics.
Seuss, Hannes; Dankerl, Peter; Ihle, Matthias; Grandjean, Andrea; Hammon, Rebecca; Kaestle, Nicola; Fasching, Peter A; Maier, Christian; Christoph, Jan; Sedlmayr, Martin; Uder, Michael; Cavallaro, Alexander; Hammon, Matthias
2017-07-01
Purpose: Projects involving collaborations between different institutions require data security via selective de-identification of words or phrases. A semi-automated de-identification tool was developed and evaluated on different types of medical reports, both natively and after adapting the algorithm to the text structure. Materials and Methods: A semi-automated de-identification tool was developed and evaluated for its sensitivity and specificity in detecting sensitive content in written reports. Data from 4671 pathology reports (4105 + 566 in two different formats), 2804 medical reports, 1008 operation reports, and 6223 radiology reports of 1167 patients suffering from breast cancer were de-identified. The content was itemized into four categories: direct identifiers (name, address), indirect identifiers (date of birth/operation, medical ID, etc.), medical terms, and filler words. The software was tested natively (without training) to establish a baseline. The reports were then manually edited and the model re-trained for the next test set; re-training was applied after manually editing 25, 50, 100, 250, 500 and, if applicable, 1000 reports of each type. Results: In the native test, 61.3 % of direct and 80.8 % of indirect identifiers were detected. The performance (P) increased to 91.4 % (P25), 96.7 % (P50), 99.5 % (P100), 99.6 % (P250), 99.7 % (P500) and 100 % (P1000) for direct identifiers and to 93.2 % (P25), 97.9 % (P50), 97.2 % (P100), 98.9 % (P250), 99.0 % (P500) and 99.3 % (P1000) for indirect identifiers. Without training, 5.3 % of medical terms were falsely flagged as critical data. After training, this false-flag rate was 4.0 % (P25), 3.6 % (P50), 4.0 % (P100), 3.7 % (P250), 4.3 % (P500), and 3.1 % (P1000). Roughly 0.1 % of filler words were falsely flagged. Conclusion: Training of the developed de-identification tool continuously improved its performance. 
Training with roughly 100 edited reports enables reliable detection and labeling of sensitive data in different types of medical reports. Key Points: · Collaborations between different institutions require de-identification of patients' data. · Software-based de-identification of content-sensitive reports grows in importance as a result of 'Big data'. · A de-identification software was developed and tested natively and after training. · The proposed de-identification software worked quite reliably following training with roughly 100 edited reports. · A final check of the texts by an authorized person remains necessary. Citation Format · Seuss H, Dankerl P, Ihle M et al. Semi-automated De-identification of German Content Sensitive Reports for Big Data Analytics. Fortschr Röntgenstr 2017; 189: 661 - 671. © Georg Thieme Verlag KG Stuttgart · New York.
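The results above report one number per content category, but the number means different things: for direct and indirect identifiers it is the fraction correctly flagged (a detection rate), while for medical terms and filler words it is the fraction wrongly flagged. A minimal sketch of computing both from manually annotated tokens follows; the category labels and token data are hypothetical.

```python
from collections import defaultdict


def flag_rates(tokens):
    """Fraction of tokens flagged as sensitive, per annotated category.

    tokens: iterable of (category, was_flagged) for manually annotated tokens.
    For identifier categories the result is the detection rate (sensitivity);
    for medical terms and filler words it is the false-flag rate.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for category, was_flagged in tokens:
        totals[category] += 1
        flagged[category] += bool(was_flagged)
    return {c: flagged[c] / totals[c] for c in totals}


annotated = [
    ("direct", True), ("direct", False),
    ("medical", False), ("medical", True), ("medical", False), ("medical", False),
]
rates = flag_rates(annotated)
```

Tracking both rates together shows the trade-off the study reports: detection of identifiers should rise with training while the false-flag rate for medical terms should stay low.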
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, Nicholas R.; Pointer, William David; Sieger, Matt
2016-04-01
The goal of this review is to enable application of codes or software packages for safety assessment of advanced sodium-cooled fast reactor (SFR) designs. To address near-term programmatic needs, the authors have focused on two objectives. First, the authors have focused on identification of requirements for software QA that must be satisfied to enable the application of software to future safety analyses. Second, the authors have collected best practices applied by other code development teams to minimize cost and time of initial code qualification activities and to recommend a path to the stated goal.
Mugshot Identification Database (MID)
National Institute of Standards and Technology Data Gateway
NIST Mugshot Identification Database (MID) (Web, free access) NIST Special Database 18 is being distributed for use in the development and testing of automated mugshot identification systems. The database consists of three CD-ROMs containing a total of 3248 images of variable size using lossless compression. A newer version of the compression/decompression software on the CD-ROM can be found at http://www.nist.gov/itl/iad/ig/nigos.cfm as part of the NBIS package.
Jing, Lan; Guo, Dandan; Hu, Wenjie; Niu, Xiaofan
2017-03-11
Many plant pathogen secretory proteins are known to be elicitors or pathogenic factors, which play an important role in the host-pathogen interaction process. Bioinformatics approaches make large-scale prediction and analysis of secretory proteins from the Puccinia helianthi transcriptome possible. The web-based tools SignalP v4.1, TargetP v1.01, Big-PI predictor, TMHMM v2.0 and ProtComp v9.0 were used to predict signal peptides and signal peptide-dependent secreted proteins among the 35,286 ORFs of the P. helianthi transcriptome. In total, 908 ORFs (2.6% of the total proteins) were identified as putative secretory proteins containing signal peptides. Most of these proteins were 51 to 300 amino acids (aa) long, while the signal peptides were 18 to 20 aa long. Signal peptidase I (SpI) cleavage sites were found in 463 of these putative secretory signal peptides, and 55 proteins contained the lipoprotein signal peptide recognition site of signal peptidase II (SpII). Of the 908 secretory proteins, 581 (63.8%) have functions related to signal recognition and transduction, metabolism, transport and catabolism. Additionally, 143 putative secretory proteins were categorized into 27 functional groups based on Gene Ontology terms, including 14 groups in biological process, seven in cellular component, and six in molecular function. Gene Ontology analysis of the secretory proteins revealed an enrichment of hydrolase activity. Pathway associations were established for 82 (9.0%) secretory proteins. A number of cell wall degrading enzymes and three homologous proteins specific to Phytophthora sojae effectors were also identified, which may be involved in the pathogenicity of the sunflower rust pathogen. This investigation proposes a new approach for identifying elicitors and pathogenic factors. 
The eventual identification and characterization of 908 extracellularly secreted proteins will advance our understanding of the molecular mechanisms of interactions between sunflower and rust pathogen and will enhance our ability to intervene in disease states.
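A secretome prediction of this kind typically combines the outputs of several predictors into one filter. The abstract names the tools but not the exact rule used to call a protein secreted, so the criteria below (signal peptide present, TargetP secretory-pathway call "S", no transmembrane helix outside the signal peptide, no GPI anchor, extracellular localization) and the record fields are assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass
class OrfPrediction:
    """Hypothetical per-ORF summary of predictor outputs (illustrative fields)."""
    orf_id: str
    has_signal_peptide: bool      # e.g. SignalP positive
    targetp_location: str         # e.g. "S" = secretory pathway
    tm_helices_mature: int        # TMHMM helices outside the signal peptide
    gpi_anchored: bool            # e.g. Big-PI positive
    protcomp_extracellular: bool  # e.g. ProtComp predicts extracellular


def is_putative_secreted(p: OrfPrediction) -> bool:
    """Assumed consensus rule: every criterion must agree."""
    return (
        p.has_signal_peptide
        and p.targetp_location == "S"
        and p.tm_helices_mature == 0
        and not p.gpi_anchored
        and p.protcomp_extracellular
    )


def select_secretome(predictions):
    """IDs of ORFs passing the consensus rule, in input order."""
    return [p.orf_id for p in predictions if is_putative_secreted(p)]


preds = [
    OrfPrediction("orf1", True, "S", 0, False, True),
    OrfPrediction("orf2", True, "S", 1, False, True),   # membrane-anchored
    OrfPrediction("orf3", False, "_", 0, False, False),  # no signal peptide
]
secreted = select_secretome(preds)
```

Requiring agreement from all tools trades sensitivity for specificity; a looser rule (for example, SignalP plus TMHMM only) would report more candidates.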
Evaluation of Software for Introducing Protein Structure: Visualization and Simulation
ERIC Educational Resources Information Center
White, Brian; Kahriman, Azmin; Luberice, Lois; Idleh, Farhia
2010-01-01
Communicating an understanding of the forces and factors that determine a protein's structure is an important goal of many biology and biochemistry courses at a variety of levels. Many educators use computer software that allows visualization of these complex molecules for this purpose. Although visualization is in wide use and has been associated…
Sun, Yongmei; Li, Xing; Wu, Di; Pan, Qi; Ji, Yuefeng; Ren, Hong; Ding, Keyue
2016-01-01
RNA editing is one of the post- or co-transcriptional processes that can lead to amino acid substitutions in protein sequences, alternative pre-mRNA splicing, and changes in gene expression levels. Although several methods have been suggested to identify RNA editing sites, challenges remain in distinguishing true RNA editing sites from their genomic counterparts and from technical artifacts. In addition, no software framework exists to identify and visualize potential RNA editing sites. Here, we present RED (RNA Editing sites Detector), software for identifying RNA editing sites by integrating multiple rule-based and statistical filters. Potential RNA editing sites can be visualized at the genome and site levels through a graphical user interface (GUI). To improve performance, we used the MySQL database management system (DBMS) for high-throughput data storage and querying. We demonstrated the validity and utility of RED by detecting the presence and absence of experimentally validated C→U RNA editing sites, in comparison with REDItools, a command-line tool for high-throughput investigation of RNA editing. In an analysis of a sample data set with 28 experimentally validated C→U RNA editing sites, RED had a sensitivity of 0.64 and a specificity of 0.5. In comparison, REDItools had a better sensitivity (0.75) but a similar specificity (0.5). RED is easy-to-use, platform-independent, Java-based software and can be applied to RNA-seq data with or without DNA sequencing data. The package is freely available under the GPLv3 license at http://github.com/REDetector/RED or https://sourceforge.net/projects/redetector. PMID:26930599
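RED combines several rule-based filters; the abstract does not list them. The sketch below shows the general shape of such a filter chain for C→U candidates with hypothetical filters and thresholds (minimum read depth, minimum edited-read fraction, exclusion of known SNP positions), followed by the sensitivity and specificity calculation used to compare tools. Without strand information, C→U editing on a minus-strand transcript appears as G>A against the reference, so both mismatch types are kept here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical mismatch site from aligned RNA-seq reads."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int      # total reads covering the site
    alt_reads: int  # reads showing the alternative base


def run_filters(candidates, known_snps, min_depth=10, min_edit_fraction=0.1):
    """Keep candidates passing every filter; min_depth must be >= 1.

    Filters are evaluated in order and all() stops at the first failure,
    so the depth check guards the division in the edit-fraction check.
    """
    filters = [
        lambda c: (c.ref, c.alt) in {("C", "T"), ("G", "A")},  # C->U on either strand
        lambda c: c.depth >= min_depth,
        lambda c: c.alt_reads / c.depth >= min_edit_fraction,
        lambda c: (c.chrom, c.pos) not in known_snps,
    ]
    return [c for c in candidates if all(f(c) for f in filters)]


def sensitivity_specificity(tp, fn, tn, fp):
    return tp / (tp + fn), tn / (tn + fp)


cands = [
    Candidate("chr1", 100, "C", "T", 20, 5),
    Candidate("chr1", 200, "A", "G", 20, 10),  # wrong editing type
    Candidate("chr1", 300, "C", "T", 5, 3),    # too shallow
    Candidate("chr1", 400, "G", "A", 30, 1),   # edit fraction too low
    Candidate("chr2", 500, "C", "T", 40, 20),  # known SNP
]
kept = run_filters(cands, known_snps={("chr2", 500)})
```

Each filter removes one class of false positive named in the abstract: genomic variants (the SNP filter) and technical artifacts (the depth and fraction filters).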
Statistical models for detecting differential chromatin interactions mediated by a protein.
Niu, Liang; Li, Guoliang; Lin, Shili
2014-01-01
Chromatin interactions mediated by a protein of interest are of great scientific interest. Recent studies show that protein-mediated chromatin interactions can have different intensities in different types of cells or at different developmental stages of a cell. Such differences can be associated with a disease or with the development of a cell. Thus, it is of great importance to detect protein-mediated chromatin interactions with different intensities in different cells. A recent molecular technique, Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET), which uses formaldehyde cross-linking and paired-end sequencing, is able to detect genome-wide chromatin interactions mediated by a protein of interest. Here we propose two models (One-Step Model and Two-Step Model) for two-sample ChIA-PET count data (one biological replicate in each sample) to identify differential chromatin interactions mediated by a protein of interest. Both models incorporate the data dependency and the extent to which a fragment pair is related to a pair of DNA loci of interest in order to make accurate identifications. The One-Step Model uses the data more efficiently but is more computationally intensive. An extensive simulation study showed that the models can detect differentially interacting chromatin and that each classification result agrees well with the truth. Application of the method to a two-sample ChIA-PET data set illustrates its utility. The two models are implemented as an R package MDM (available at http://www.stat.osu.edu/~statgen/SOFTWARE/MDM). PMID:24835279
PRIDE: new developments and new datasets.
Jones, Philip; Côté, Richard G; Cho, Sang Yun; Klie, Sebastian; Martens, Lennart; Quinn, Antony F; Thorneycroft, David; Hermjakob, Henning
2008-01-01
The PRIDE (http://www.ebi.ac.uk/pride) database of protein and peptide identifications was previously described in the NAR Database Special Edition in 2006. Since this publication, the volume of public data in the PRIDE relational database has increased by more than an order of magnitude. Several significant public datasets have been added, including identifications and processed mass spectra generated by the HUPO Brain Proteome Project and the HUPO Liver Proteome Project. The PRIDE software development team has made several significant changes and additions to the user interface and tool set associated with PRIDE. The focus of these changes has been to facilitate the submission process and to improve the mechanisms by which PRIDE can be queried. The PRIDE team has developed a Microsoft Excel workbook that allows the required data to be collated in a series of relatively simple spreadsheets, with automatic generation of PRIDE XML at the end of the process. The ability to query PRIDE has been augmented by the addition of a BioMart interface allowing complex queries to be constructed. Collaboration with groups outside the EBI has been fruitful in extending PRIDE, including an approach to encode iTRAQ quantitative data in PRIDE XML.
Metal species involved in long distance metal transport in plants
Álvarez-Fernández, Ana; Díaz-Benito, Pablo; Abadía, Anunciación; López-Millán, Ana-Flor; Abadía, Javier
2014-01-01
The mechanisms plants use to transport metals from roots to shoots are not completely understood. It has long been proposed that organic molecules participate in metal translocation within the plant. However, until recently the identity of the complexes involved in the long-distance transport of metals could only be inferred by using indirect methods, such as analyzing separately the concentrations of metals and putative ligands and then using in silico chemical speciation software to predict metal species. Molecular biology approaches have also provided a wealth of information about putative metal ligands and metal complexes occurring in plant fluids. New advances in analytical techniques based on mass spectrometry and the increased use of synchrotron X-ray spectroscopy have allowed the identification of some metal-ligand species in plant fluids such as the xylem and phloem saps. Also, some proteins present in plant fluids can bind metals, and a few studies have explored this possibility. This study reviews the analytical challenges researchers face in understanding long-distance metal transport in plants, as well as recent advances in the identification of ligands and metal-ligand complexes in plant fluids. PMID:24723928
Multiscale global identification of porous structures
NASA Astrophysics Data System (ADS)
Hatłas, Marcin; Beluch, Witold
2018-01-01
The paper is devoted to the evolutionary identification of the material constants of porous structures based on measurements conducted on a macro scale. Numerical homogenization with the representative volume element (RVE) concept is used to determine the equivalent properties of a macroscopically homogeneous material. Finite element method software is applied to solve the boundary-value problem at both scales. Global optimization methods in the form of an evolutionary algorithm are employed to solve the identification task. Modal analysis is performed to collect the data necessary for the identification. A numerical example demonstrating the effectiveness of the proposed approach is presented.
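The paper's forward model is a finite element analysis with RVE homogenization, which is far beyond a short example. The Python sketch below keeps only the identification loop: it assumes a hypothetical cantilever beam whose first natural frequency follows the closed-form Euler-Bernoulli formula, and it uses a simple (1+λ) evolution strategy with a success-based step-size rule to recover Young's modulus from one "measured" frequency. The beam dimensions, bounds and strategy settings are all illustrative.

```python
import math
import random

# Hypothetical aluminium-like cantilever (SI units); stands in for the FE model.
LENGTH, WIDTH, HEIGHT, DENSITY = 0.5, 0.02, 0.005, 2700.0
AREA = WIDTH * HEIGHT
INERTIA = WIDTH * HEIGHT**3 / 12.0
BETA1_L = 1.875104  # first root of the cantilever frequency equation


def first_frequency(young_modulus: float) -> float:
    """First natural frequency (Hz) of an Euler-Bernoulli cantilever."""
    omega = BETA1_L**2 * math.sqrt(
        young_modulus * INERTIA / (DENSITY * AREA * LENGTH**4)
    )
    return omega / (2.0 * math.pi)


def identify_modulus(measured_hz, lower=1e9, upper=400e9,
                     generations=200, offspring=10, seed=0):
    """(1+lambda) evolution strategy minimising the squared frequency error."""
    rng = random.Random(seed)

    def error(e):
        return (first_frequency(e) - measured_hz) ** 2

    parent = rng.uniform(lower, upper)
    sigma = 0.1 * (upper - lower)
    for _ in range(generations):
        # Mutate the parent and clip the children to the admissible range.
        children = [min(upper, max(lower, parent + rng.gauss(0.0, sigma)))
                    for _ in range(offspring)]
        best = min(children, key=error)
        if error(best) < error(parent):
            parent, sigma = best, sigma * 1.2  # success: keep it and widen the search
        else:
            sigma *= 0.8  # failure: narrow the search
    return parent


measured = first_frequency(70e9)  # synthetic "measurement"
print(f"identified E = {identify_modulus(measured) / 1e9:.2f} GPa")
```

In the paper's setting each call to the forward model would be a full FE solve, which is why population size and generation count matter far more there than in this toy.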
Genome-Wide Discovery of Long Non-Coding RNAs in Rainbow Trout.
Al-Tobasei, Rafet; Paneru, Bam; Salem, Mohamed
2016-01-01
The ENCODE project revealed that ~70% of the human genome is transcribed. While only 1-2% of the RNAs encode proteins, the rest are non-coding RNAs. Long non-coding RNAs (lncRNAs) form a diverse class of non-coding RNAs that are longer than 200 nt. Emerging evidence indicates that lncRNAs play critical roles in various cellular processes, including regulation of gene expression. LncRNAs show low levels of expression and sequence conservation, which makes their computational identification in genomes difficult. In this study, more than two billion Illumina sequence reads were mapped to the genome reference using the TopHat and Cufflinks software. Transcripts shorter than 200 nt, with an ORF longer than 83-100 amino acids, or with significant homology to the NCBI nr protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. Depending on the filtering stringency, between 31,195 and 54,503 lncRNAs were identified, with only 421 matching known lncRNAs in other species. A digital gene expression atlas revealed 2,935 tissue-specific and 3,269 ubiquitously expressed lncRNAs. This study annotates lncRNAs in the rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.
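The length and ORF filters can be sketched in Python as below. This is a minimal sketch under stated assumptions: only the forward strand is scanned, an ORF runs from ATG to the first in-frame stop codon, and the 100-amino-acid cutoff is one end of the 83-100 range quoted above. The homology search against NCBI nr and the protein-coding-score test are not reproduced, and the example sequences are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq: str) -> int:
    """Longest ATG-initiated ORF on the forward strand, in amino acids.

    ORFs that run off the end of the transcript without a stop codon are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)  # codons before the stop
                start = None
    return best


def passes_lncrna_filters(seq: str, min_length: int = 200, max_orf_aa: int = 100) -> bool:
    """Keep transcripts of at least min_length nt whose longest ORF is short."""
    return len(seq) >= min_length and longest_orf_aa(seq) <= max_orf_aa


coding_like = "ATG" + "GCT" * 150 + "TAA"  # 151-aa ORF, so it is filtered out
noncoding_like = "ACGT" * 60 + "TTAGGC" * 10  # 300 nt without any ATG
print(passes_lncrna_filters(coding_like), passes_lncrna_filters(noncoding_like))
```

Lowering `max_orf_aa` towards 83 makes the filter more stringent, which is one way the reported range of 31,195 to 54,503 candidates could arise from different settings.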
Velupillai, Sumithra; Dalianis, Hercules; Hassel, Martin; Nilsson, Gunnar H
2009-12-01
Electronic patient records (EPRs) contain a large amount of information written in free text. This information is considered very valuable for research but is also very sensitive, since the free-text parts may contain information that could reveal the identity of a patient. Therefore, methods for de-identifying EPRs are needed. The work presented here aims to perform a manual and automatic Protected Health Information (PHI) annotation trial for EPRs written in Swedish. This study consists of two main parts: the initial creation of a manually PHI-annotated gold standard, and the porting and evaluation of existing de-identification software written for American English to Swedish in a preliminary automatic de-identification trial. Results are measured with precision, recall and F-measure. This study reports fairly high Inter-Annotator Agreement (IAA) results on the manually created gold standard, especially for specific tags such as names. The average IAA over all tags was 0.65 F-measure (0.84 F-measure highest pairwise agreement). For name tags the average IAA was 0.80 F-measure (0.91 F-measure highest pairwise agreement). Directly porting de-identification software written for American English to Swedish was unfortunately non-trivial and yielded poor results. Developing gold standard sets as well as automatic systems for de-identification tasks in Swedish is feasible. However, discussions and definitions of identifiable information are needed, as well as further development of both the tag sets and the annotation guidelines, in order to obtain a reliable gold standard. Completely new de-identification software needs to be developed.
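Pairwise inter-annotator agreement reported as an F-measure can be computed by treating one annotator as the reference and the other as the system. The Python sketch below assumes exact matching of (start, end, tag) spans, which is only one possible matching criterion; the span tuples and annotator names are hypothetical. With exact matching, F = 2|A∩B|/(|A|+|B|), so the score does not depend on which annotator is taken as the reference.

```python
from itertools import combinations

Span = tuple[int, int, str]  # (start offset, end offset, PHI tag)


def f_measure(reference: set[Span], candidate: set[Span]) -> float:
    """Harmonic mean of precision and recall over exactly matching spans."""
    if not reference and not candidate:
        return 1.0
    matches = len(reference & candidate)
    precision = matches / len(candidate) if candidate else 0.0
    recall = matches / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_pairwise_iaa(annotations: dict[str, set[Span]]) -> float:
    """Mean F-measure over all annotator pairs."""
    pairs = list(combinations(sorted(annotations), 2))
    return sum(f_measure(annotations[a], annotations[b]) for a, b in pairs) / len(pairs)


ann = {
    "A": {(0, 5, "Name"), (10, 20, "Date")},
    "B": {(0, 5, "Name")},
    "C": {(0, 5, "Name"), (10, 20, "Date")},
}
print(round(average_pairwise_iaa(ann), 3))
```

Per-tag figures, such as the agreement on name tags, are obtained by filtering each annotator's set to one tag before scoring; the "highest pairwise agreement" is simply the maximum over the same pairs instead of the mean.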
Supporting Fourth-Grade Students' Word Identification Using Application Software
ERIC Educational Resources Information Center
Moser, Gary P.; Morrison, Timothy G.; Wilcox, Brad
2017-01-01
A quasi-experimental study examined effects of a 10-week word structure intervention with fourth-grade students. During daily 10-15-minute practice periods, students worked individually with mobile apps focused on specific aspects of word identification. Pre- and post-treatment assessments showed no differences in rate and accuracy of oral reading…
Exploring State-of-the-Art Software for Forensic Authorship Identification
ERIC Educational Resources Information Center
Guillén-Nieto, Victoria; Vargas-Sierra, Chelo; Pardiño-Juan, Maria; Martinez-Barco, Patricio; Suárez-Cueto, Armando
2008-01-01
Back in the 1990s Malcolm Coulthard announced the beginnings of an emerging discipline, "forensic linguistics", resulting from the interface of language, crime and the law. Today the courts are more than ever calling on language experts to help in certain types of cases, such as authorship identification, plagiarism, legal interpreting…
The Use of Computer-Assisted Identification of ARIMA Time-Series.
ERIC Educational Resources Information Center
Brown, Roger L.
This study was conducted to determine the effects of using various levels of tutorial statistical software for the tentative identification of nonseasonal ARIMA models, a statistical technique proposed by Box and Jenkins for the interpretation of time-series data. The Box-Jenkins approach is an iterative process encompassing several stages of…
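The identification stage of the Box-Jenkins approach rests largely on the sample autocorrelation function (ACF). The following Python sketch computes it together with the common rough ±1.96/√n significance bound; it illustrates the kind of calculation such tutorial software automates and is not taken from the study. A slowly decaying ACF usually suggests differencing, while an ACF that cuts off after lag q points towards an MA(q) component (the partial ACF plays the same role for the AR order).

```python
import math


def sample_acf(series: list[float], max_lag: int) -> list[float]:
    """Sample autocorrelations r_0..r_max_lag (r_0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def significant_lags(series: list[float], max_lag: int) -> list[int]:
    """Lags whose |r_k| exceeds the approximate 95% white-noise bound 1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(series))
    acf = sample_acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound]


alternating = [1.0, -1.0] * 5  # strongly negatively autocorrelated at lag 1
print(significant_lags(alternating, 4))
```

With only ten observations the bound is wide (about 0.62), which is one reason tentative identification on short series is error-prone.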
Federal Register 2010, 2011, 2012, 2013, 2014
2010-10-01
... exclude (or delist) a certain solid waste generated by its Beaumont, Texas, facility from the lists of hazardous wastes. EPA used the Delisting Risk Assessment Software (DRAS) Version 3.0 in the evaluation of... Waste Management System; Identification and Listing of Hazardous Waste; Proposed Rule AGENCY...
Ueno, Yutaka; Ito, Shuntaro; Konagaya, Akihiko
2014-12-01
To better understand the behaviors and structural dynamics of proteins within a cell, novel software tools are being developed that can create molecular animations based on the findings of structural biology. This study presents a method, developed from our prototypes, for detecting collisions and examining the soft-body dynamics of molecular models. The code was implemented with a software development toolkit for rigid-body dynamics simulation and a three-dimensional graphics library. The essential functions of the target software system included a basic molecular modeling environment, collision detection in the molecular models, and physical simulation of the models' movement. Taking advantage of recent software technologies such as physics simulation modules and interpreted scripting languages, the functions required for accurate and meaningful molecular animation were implemented efficiently.
Techniques for development of safety-related software for surgical robots.
Varley, P
1999-12-01
Regulatory bodies require evidence that software controlling potentially hazardous devices is developed to good manufacturing practices. Effective techniques used in other industries assume long timescales and high staffing levels and can be unsuitable for use without adaptation in developing electronic healthcare devices. This paper discusses a set of techniques used in practice to develop software for a particular innovative medical product, an endoscopic camera manipulator. These techniques include identification of potential hazards and tracing their mitigating factors through the project lifecycle.
Hu, Zhi-yu; Zhang, Lei; Ma, Wei-guang; Yan, Xiao-juan; Li, Zhi-xin; Zhang, Yong-zhi; Wang, Le; Dong, Lei; Yin, Wang-bao; Jia, Suo-tang
2012-03-01
In-house software for identifying LIBS spectral lines is introduced. Built with LabVIEW, the software can smooth spectra and pick peaks using the second-difference and threshold methods. The characteristic spectra of several elements are matched against the NIST database, enabling automatic spectral line identification and qualitative analysis of the basic composition of a sample. The software analyzes spectra conveniently and rapidly, and should be a useful tool for LIBS.
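The smoothing, second-difference and threshold steps can be sketched in Python as follows. This is a minimal sketch, not the authors' LabVIEW implementation: the moving-average window, the intensity threshold and the 0.1 nm matching tolerance are assumptions, and the small reference table (the sodium D lines) stands in for a query against the NIST database. The spectrum is synthetic.

```python
import math


def smooth(y: list[float], window: int = 3) -> list[float]:
    """Centred moving average (shorter window at the edges)."""
    half = window // 2
    out = []
    for i in range(len(y)):
        seg = y[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def pick_peaks(wavelengths, intensities, threshold):
    """Local maxima with a negative second difference above an intensity threshold."""
    y = smooth(intensities)
    peaks = []
    for i in range(1, len(y) - 1):
        second_diff = y[i - 1] - 2 * y[i] + y[i + 1]
        if second_diff < 0 and y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] > threshold:
            peaks.append(wavelengths[i])
    return peaks


def identify_lines(peaks, reference, tolerance=0.1):
    """Match each peak to reference lines within a wavelength tolerance (nm)."""
    return {label: peak for peak in peaks for label, ref in reference.items()
            if abs(peak - ref) <= tolerance}


REFERENCE_NM = {"Na I 588.995 nm": 588.995, "Na I 589.592 nm": 589.592}

# Synthetic spectrum: flat baseline plus two Gaussian lines (illustrative only).
wl = [585.0 + 0.05 * i for i in range(201)]
spec = [10 + 1000 * (math.exp(-0.5 * ((w - 588.995) / 0.05) ** 2)
                     + math.exp(-0.5 * ((w - 589.592) / 0.05) ** 2)) for w in wl]
print(identify_lines(pick_peaks(wl, spec, threshold=100), REFERENCE_NM))
```

Requiring a strict rise on the left (`y[i] > y[i - 1]`) keeps a flat-topped peak from being reported twice; the threshold keeps baseline noise from producing spurious lines.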
Selecting Advanced Software Technology in Two Small Manufacturing Enterprises
2004-05-01
improving workflow to further reduce delivery times, enhance customer service, and obtain a competitive advantage. The company wanted help... environment, stakeholders' needs, ecommerce, shop floor visualization, and collaboration capability. These statements are not significantly different... for the purpose of describing a software environment. This identification does not imply any recommendation or endorsement by NIST, the SEI, CMU, or
The Role of Dynamic Software in the Identification and Construction of Mathematical Relationships
ERIC Educational Resources Information Center
Santos-Trigo, Manuel
2004-01-01
What features of mathematical thinking do students exhibit when they use dynamic software in their problem solving approaches? To what extent does the systematic use of technology favour students' development of problem solving competences? What type of reasoning do students develop as a result of using a particular tool? This study documents…
ERIC Educational Resources Information Center
Yang, Yan
2012-01-01
Purpose: This paper aims to discuss the challenge for the classical idea of professionalism in understanding the Chinese software engineering industry after giving a close insight into the development of this industry as well as individual engineers with a psycho-societal perspective. Design/methodology/approach: The study starts with the general…
Soares, Renata; Franco, Catarina; Pires, Elisabete; Ventosa, Miguel; Palhinhas, Rui; Koci, Kamila; Martinho de Almeida, André; Varela Coelho, Ana
2012-07-19
Proteomic approaches are gaining increasing importance in all fields of animal and veterinary sciences, including physiology, productive characterization, and disease/parasite tolerance, among others. Proteomic studies mainly aim to characterize the proteome of a certain organ, tissue, cell type or organism, either in a specific condition or by comparing differential protein expression between two or more selected situations. Due to the high complexity of samples, usually total protein extracts, proteomics relies heavily on separation procedures, 2D electrophoresis and HPLC being the most common, as well as on protein identification using mass spectrometry (MS) based methodologies. Despite the increasing importance of MS in animal and veterinary science studies, the usefulness of such tools is still poorly perceived by the animal science community. This is primarily due to animal scientists' limited knowledge of mass spectrometry. Additionally, confidence and success in protein identification are hindered by the lack of information in public databases for most farm animal species and their pathogens, with the exception of cattle (Bos taurus), pig (Sus scrofa) and chicken (Gallus gallus). In this article, we briefly summarize the main methodologies available for protein identification using mass spectrometry, providing a case study of specific applications in the field of animal science. We also address the difficulties inherent to protein identification using MS, with particular reference to experiments using animal species poorly described in public databases. Additionally, we suggest strategies to increase the rate of successful identifications when working with farm animal species. Copyright © 2012 Elsevier B.V. All rights reserved.
Segura, María Mercedes; Garnier, Alain; Di Falco, Marcos Rafael; Whissell, Gavin; Meneses-Acosta, Angélica; Arcand, Normand; Kamen, Amine
2008-01-01
The Moloney murine leukemia virus (MMLV) belongs to the Retroviridae family of enveloped viruses, which are known to acquire minute amounts of host cellular proteins both on the surface and inside the virion. Despite the extensive use of retroviral vectors in experimental and clinical applications, the repertoire of host proteins incorporated into MMLV vector particles remains unexplored. We report here the identification of host proteins from highly purified retroviral vector preparations obtained by rate-zonal ultracentrifugation. Viral proteins were fractionated by one-dimensional sodium dodecyl sulfate-polyacrylamide gel electrophoresis, digested in-gel with trypsin, and subjected to liquid chromatography/tandem mass spectrometry analysis. Immunogold electron microscopy studies confirmed the presence of several host membrane proteins exposed at the vector surface. These studies led to the identification of 27 host proteins on MMLV vector particles derived from 293 HEK cells, including 5 proteins previously described as part of wild-type MMLV. Nineteen of the identified host proteins were intracellular proteins. A total of eight host membrane proteins were identified, including the cell adhesion proteins integrin β1 (fibronectin receptor subunit beta) and HMFG-E8, the tetraspanins CD81 and CD9, and the late endosomal markers CD63 and Lamp-2. Identification of membrane proteins on the retroviral surface is particularly attractive, since they can serve as anchoring sites for the insertion of tags for targeting or purification purposes. The implications of our findings for retrovirus-mediated gene therapy are discussed. PMID:18032515
Tsuchiya, Megumi; Karim, M Rezaul; Matsumoto, Taro; Ogawa, Hidesato; Taniguchi, Hiroaki
2017-01-24
Transcriptional coregulators are vital to the efficient transcriptional regulation of nuclear chromatin structure. Coregulators play a variety of roles in regulating transcription. These include direct interaction with transcription factors, covalent modification of histones and other proteins, and, occasionally, alteration of chromatin conformation. Accordingly, establishing relatively quick methods for identifying proteins that interact within this network is crucial to enhancing our understanding of the underlying regulatory mechanisms. LC-MS/MS-mediated protein binding partner identification is a validated technique used to analyze protein-protein interactions. By immunoprecipitating a previously identified member of a protein complex with an antibody (occasionally with an antibody for a tagged protein), it is possible to identify its unknown interaction partners via mass spectrometry analysis. Here, we present a method of protein preparation for the LC-MS/MS-mediated high-throughput identification of protein interactions involving nuclear cofactors and their binding partners. This method allows for a better understanding of the transcriptional regulatory mechanisms of the targeted nuclear factors.
Meckes, David G
2014-01-01
The identification and characterization of herpes simplex virus protein interaction complexes are fundamental to understanding the molecular mechanisms governing the replication and pathogenesis of the virus. Recent advances in affinity-based methods, mass spectrometry configurations, and bioinformatics tools have greatly increased the quantity and quality of protein-protein interaction datasets. In this chapter, detailed and reliable methods that can easily be implemented are presented for the identification of protein-protein interactions using cryogenic cell lysis, affinity purification, trypsin digestion, and mass spectrometry.
Coble, M D; Buckleton, J; Butler, J M; Egeland, T; Fimmers, R; Gill, P; Gusmão, L; Guttman, B; Krawczak, M; Morling, N; Parson, W; Pinto, N; Schneider, P M; Sherry, S T; Willuweit, S; Prinz, M
2016-11-01
The use of biostatistical software programs to assist in data interpretation and calculate likelihood ratios is essential to forensic geneticists and part of the daily casework flow for both kinship and DNA identification laboratories. Previous recommendations issued by the DNA Commission of the International Society for Forensic Genetics (ISFG) covered the application of biostatistical evaluations for STR typing results in identification and kinship cases, and these are now being expanded to provide best practices regarding validation and verification of the software required for these calculations. With larger multiplexes, more complex mixtures, and increasing requests for extended family testing, laboratories are relying more than ever on specific software solutions, and sufficient validation, training and extensive documentation are of utmost importance. Here, we present recommendations for the minimum requirements to validate biostatistical software to be used in forensic genetics. We distinguish between developmental validation and the responsibilities of the software developer or provider, and the internal validation studies to be performed by the end user. Recommendations for the software provider address, for example, the documentation of the underlying models used by the software, validation data expectations, version control, implementation and training support, as well as continuity and user notifications. For the internal validations the recommendations include: creating a validation plan, requirements for the range of samples to be tested, Standard Operating Procedure development, and internal laboratory training and education. To ensure that all laboratories have access to a wide range of samples for validation and training purposes, the ISFG DNA commission encourages collaborative studies and public repositories of STR typing results. Published by Elsevier Ireland Ltd.
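Internal validation typically includes known-answer tests in which the software's output is checked against a hand calculation. As a hypothetical example of such a check, the Python sketch below computes the simplest single-source match likelihood ratio, assuming Hardy-Weinberg equilibrium, independent loci and no subpopulation (θ) correction; the locus names and allele frequencies are made up. Real casework software uses richer models, so this is a sanity check, not a reference implementation.

```python
from math import prod

Genotype = tuple[str, str]


def genotype_frequency(allele_freqs: dict[str, float], genotype: Genotype) -> float:
    """Hardy-Weinberg genotype frequency: p^2 for homozygotes, 2pq for heterozygotes."""
    a, b = genotype
    if a == b:
        return allele_freqs[a] ** 2
    return 2 * allele_freqs[a] * allele_freqs[b]


def match_likelihood_ratio(profile: dict[str, Genotype],
                           frequencies: dict[str, dict[str, float]]) -> float:
    """LR = 1 / random match probability, multiplied across independent loci."""
    return prod(1.0 / genotype_frequency(frequencies[locus], genotype)
                for locus, genotype in profile.items())


# Hypothetical loci and allele frequencies for a hand-checkable test case.
freqs = {"LOCUS_1": {"12": 0.1, "14": 0.2}, "LOCUS_2": {"8": 0.1}}
profile = {"LOCUS_1": ("12", "14"), "LOCUS_2": ("8", "8")}
print(round(match_likelihood_ratio(profile, freqs), 6))  # 2500.0, i.e. 25 x 100
```

A validation plan would compare the software's reported LR for the same inputs against this hand value within a stated tolerance, and repeat the check for each model option the laboratory actually uses.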
Acidic Ribosomal Proteins from the Extreme 'Halobacterium cutirubrum',
the extreme halophilic bacterium, Halobacterium cutirubrum. The identification of the protein moieties involved in these and other interactions in ... the halophile ribosome requires a rapid and reproducible screening method for the separation, enumeration and identification of these acidic ... polypeptides in the complex ribosomal protein mixtures. In this paper the authors present the results of analyses of the halophile ribosomal proteins using a
Systematic Errors in Peptide and Protein Identification and Quantification by Modified Peptides
Bogdanow, Boris; Zauber, Henrik; Selbach, Matthias
2016-01-01
The principle of shotgun proteomics is to use peptide mass spectra in order to identify corresponding sequences in a protein database. The quality of peptide and protein identification and quantification critically depends on the sensitivity and specificity of this assignment process. Many peptides in proteomic samples carry biochemical modifications, and a large fraction of unassigned spectra arise from modified peptides. Spectra derived from modified peptides can erroneously be assigned to wrong amino acid sequences. However, the impact of this problem on proteomic data has not yet been investigated systematically. Here we use combinations of different database searches to show that modified peptides can be responsible for 20–50% of false positive identifications in deep proteomic data sets. These false positive hits are particularly problematic as they have significantly higher scores and higher intensities than other false positive matches. Furthermore, these wrong peptide assignments lead to hundreds of false protein identifications and systematic biases in protein quantification. We devise a “cleaned search” strategy to address this problem and show that this considerably improves the sensitivity and specificity of proteomic data. In summary, we show that modified peptides cause systematic errors in peptide and protein identification and quantification and should therefore be considered to further improve the quality of proteomic data annotation. PMID:27215553
Ni, Mao-Wei; Wang, Lu; Chen, Wei; Mou, Han-Zhou; Zhou, Jie; Zheng, Zhi-Guo
2017-01-30
Mass spectrometry (MS)-based protein identification depends mainly on protein extraction and digestion. Although sodium dodecyl sulfate (SDS) can preclude enzymatic digestion and interfere with MS analysis, it is still the most widely used surfactant in these steps. To overcome these disadvantages, an SDS-compatible proteomic technique for removing SDS prior to MS-based analyses was developed, namely filter-aided sample preparation (FASP). Herein, based on the effectiveness of sodium deoxycholate and a detergent removal spin column, we developed a modified FASP (mFASP) method and compared its overall performance, including the total numbers of peptides and proteins identified in shotgun proteomic experiments, with that of the FASP method. The FASP and mFASP methods identified 4570 ± 392 and 9139 ± 317 peptides, respectively, and 862 ± 46 and 1377 ± 33 protein groups with two or more peptides, respectively, from the ovarian cancer cell line A2780. The mFASP method gave a higher average protein sequence coverage by peptides (21.2 ± 0.2%) than the FASP method (13.2 ± 0.5%). More hydrophobic peptides were identified by mFASP than by FASP, as indicated by the GRAVY score distribution. The reported method enables reliable and efficient identification of proteins and peptides in whole-cell extracts containing SDS. The new approach allows for higher throughput (the simultaneous identification of more proteins), a more comprehensive investigation of proteins, and potentially the discovery of new biomarkers. Copyright © 2016 John Wiley & Sons, Ltd.
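The GRAVY score mentioned above is the grand average of hydropathy: the mean Kyte-Doolittle hydropathy value over a sequence's residues, so more positive scores indicate more hydrophobic peptides. Sequence coverage is the fraction of a protein's residues covered by at least one identified peptide. Below is a minimal Python sketch of both with made-up sequences; it assumes exact substring matching of peptides to the protein and ignores modifications.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Grand average of hydropathy; non-standard residues are skipped."""
    values = [KYTE_DOOLITTLE[aa] for aa in peptide.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("no standard residues in sequence")
    return sum(values) / len(values)


def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Fraction of protein residues covered by at least one peptide occurrence."""
    covered: set[int] = set()
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            covered.update(range(start, start + len(pep)))
            start = protein.find(pep, start + 1)
    return len(covered) / len(protein)


print(round(gravy("LLK"), 3), sequence_coverage("MKLVAAGK", ["KLV", "GK"]))
```

Comparing the distribution of `gravy` values for the peptides identified by each preparation method is the kind of analysis the abstract refers to.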
Langó, Tamás; Róna, Gergely; Hunyadi-Gulyás, Éva; Turiák, Lilla; Varga, Julia; Dobson, László; Várady, György; Drahos, László; Vértessy, Beáta G; Medzihradszky, Katalin F; Szakács, Gergely; Tusnády, Gábor E
2017-02-13
Transmembrane proteins play crucial roles in signaling, ion transport and nutrient uptake, as well as in maintaining the dynamic equilibrium between the internal and external environment of cells. Despite their important biological functions and abundance, less than 2% of all determined structures are transmembrane proteins. Given the persisting technical difficulties associated with high-resolution structure determination of transmembrane proteins, additional methods, including computational and experimental techniques, remain vital in promoting our understanding of their topologies, 3D structures, functions and interactions. Here we report a method for the high-throughput determination of extracellular segments of transmembrane proteins based on the identification of surface-labeled and biotin-captured peptide fragments by LC/MS/MS. We show that reliable identification of extracellular protein segments increases the accuracy and reliability of existing topology prediction algorithms. Using the experimental topology data as constraints, our improved prediction tool provides accurate and reliable topology models for hundreds of human transmembrane proteins.
Time Series Proteome Profiling
Formolo, Catherine A.; Mintz, Michelle; Takanohashi, Asako; Brown, Kristy J.; Vanderver, Adeline; Halligan, Brian; Hathout, Yetrib
2014-01-01
This chapter provides a detailed description of a method used to study temporal changes in the endoplasmic reticulum (ER) proteome of fibroblast cells exposed to ER stress agents (tunicamycin and thapsigargin). Differential stable isotope labeling by amino acids in cell culture (SILAC) is used in combination with crude ER fractionation, SDS–PAGE and LC-MS/MS to define altered protein expression in tunicamycin- or thapsigargin-treated cells versus untreated cells. Treated and untreated cells are harvested at different time points, mixed at a 1:1 ratio and processed for ER fractionation. Samples containing labeled and unlabeled proteins are separated by SDS–PAGE, bands are digested with trypsin and the resulting peptides are analyzed by LC-MS/MS. Proteins are identified using Bioworks software and the Swiss-Prot database, whereas ratios of protein expression between treated and untreated cells are quantified using ZoomQuant software. Data visualization is facilitated by GeneSpring software. PMID:21082445
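Once heavy and light intensities have been extracted for each peptide (the chapter uses ZoomQuant for this step), a protein-level ratio is often summarized as the median of the peptide log ratios. The Python sketch below assumes that treated cells carry the heavy label, that the 1:1 mixing needs no further normalization, and that the time points and intensities are hypothetical; it is not ZoomQuant's algorithm.

```python
import math
from statistics import median


def protein_log2_ratio(peptide_pairs: list[tuple[float, float]]) -> float:
    """Median log2(heavy / light) over the quantified peptides of one protein."""
    logs = [math.log2(h / l) for h, l in peptide_pairs if h > 0 and l > 0]
    if not logs:
        raise ValueError("no quantifiable peptide pairs")
    return median(logs)


def time_profile(pairs_by_time: dict[str, list[tuple[float, float]]]) -> dict[str, float]:
    """Protein log2 ratio (treated vs untreated) at each harvest time point."""
    return {t: protein_log2_ratio(pairs) for t, pairs in pairs_by_time.items()}


# Hypothetical peptide intensities (heavy = treated, light = untreated) for one ER protein.
observations = {
    "2h": [(110.0, 100.0), (95.0, 100.0), (100.0, 100.0)],
    "8h": [(200.0, 100.0), (400.0, 100.0), (100.0, 100.0)],
}
print(time_profile(observations))
```

The median keeps a single outlier peptide from dominating the protein ratio; in practice ratios are usually also renormalized so that the bulk of unchanged proteins centre on log2 ratio 0, compensating for small mixing errors.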
Lim, Hansaim; Poleksic, Aleksandar; Yao, Yuan; Tong, Hanghang; He, Di; Zhuang, Luke; Meng, Patrick; Xie, Lei
2016-10-01
Target-based screening is one of the major approaches in drug discovery. Besides the intended target, unexpected drug off-target interactions often occur, and many of them have not been recognized and characterized. The off-target interactions can be responsible for either therapeutic or side effects. Thus, identifying the genome-wide off-targets of lead compounds or existing drugs will be critical for designing effective and safe drugs, and for providing new opportunities for drug repurposing. Although many computational methods have been developed to predict drug-target interactions, they are either less accurate than the one we propose here or computationally too intensive, thereby limiting their capability for large-scale off-target identification. In addition, the performance of most machine learning-based algorithms has mainly been evaluated for predicting off-target interactions in the same gene family for hundreds of chemicals. It is not clear how these algorithms perform in terms of detecting off-targets across gene families on a proteome scale. Here, we present a fast and accurate off-target prediction method, REMAP, which is based on a dual regularized one-class collaborative filtering algorithm, to explore continuous chemical space, protein space, and their interactome on a large scale. When tested in a reliable, extensive, and cross-gene family benchmark, REMAP outperforms the state-of-the-art methods. Furthermore, REMAP is highly scalable. It can screen a dataset of 200,000 chemicals against 20,000 proteins within 2 hours. Using the reconstructed genome-wide target profile as the fingerprint of a chemical compound, we predicted that seven FDA-approved drugs can be repurposed as novel anti-cancer therapies. The anti-cancer activity of six of them is supported by experimental evidence.
Thus, REMAP is a valuable addition to the existing in silico toolbox for drug target identification, drug repurposing, phenotypic screening, and side effect prediction. The software and benchmark are available at https://github.com/hansaimlim/REMAP.
Poleksic, Aleksandar; Yao, Yuan; Tong, Hanghang; Meng, Patrick; Xie, Lei
2016-01-01
Target-based screening is one of the major approaches in drug discovery. Besides the intended target, unexpected drug off-target interactions often occur, and many of them have not been recognized and characterized. The off-target interactions can be responsible for either therapeutic or side effects. Thus, identifying the genome-wide off-targets of lead compounds or existing drugs will be critical for designing effective and safe drugs, and for providing new opportunities for drug repurposing. Although many computational methods have been developed to predict drug-target interactions, they are either less accurate than the one we propose here or computationally too intensive, thereby limiting their capability for large-scale off-target identification. In addition, the performance of most machine learning-based algorithms has mainly been evaluated for predicting off-target interactions in the same gene family for hundreds of chemicals. It is not clear how these algorithms perform in terms of detecting off-targets across gene families on a proteome scale. Here, we present a fast and accurate off-target prediction method, REMAP, which is based on a dual regularized one-class collaborative filtering algorithm, to explore continuous chemical space, protein space, and their interactome on a large scale. When tested in a reliable, extensive, and cross-gene family benchmark, REMAP outperforms the state-of-the-art methods. Furthermore, REMAP is highly scalable. It can screen a dataset of 200,000 chemicals against 20,000 proteins within 2 hours. Using the reconstructed genome-wide target profile as the fingerprint of a chemical compound, we predicted that seven FDA-approved drugs can be repurposed as novel anti-cancer therapies. The anti-cancer activity of six of them is supported by experimental evidence.
Thus, REMAP is a valuable addition to the existing in silico toolbox for drug target identification, drug repurposing, phenotypic screening, and side effect prediction. The software and benchmark are available at https://github.com/hansaimlim/REMAP. PMID:27716836
GRAMM-X public web server for protein–protein docking
Tovchigrechko, Andrey; Vakser, Ilya A.
2006-01-01
Protein docking software GRAMM-X and its web interface () extend the original GRAMM Fast Fourier Transformation methodology by employing smoothed potentials, a refinement stage, and knowledge-based scoring. The web server frees users from the complex installation of database-dependent parallel software and from maintaining the large hardware resources needed for protein docking simulations. Docking problems submitted to the GRAMM-X server are processed by a 320-processor Linux cluster. The server was extensively tested by benchmarking, several months of public use, and participation in the CAPRI server track. PMID:16845016
Cereda, Carlo W; Christensen, Søren; Campbell, Bruce Cv; Mishra, Nishant K; Mlynash, Michael; Levi, Christopher; Straka, Matus; Wintermark, Max; Bammer, Roland; Albers, Gregory W; Parsons, Mark W; Lansberg, Maarten G
2016-10-01
Differences in research methodology have hampered the optimization of Computed Tomography Perfusion (CTP) for identification of the ischemic core. We aim to optimize CTP core identification using a novel benchmarking tool. The benchmarking tool consists of an imaging library and a statistical analysis algorithm to evaluate the performance of CTP. The tool was used to optimize and evaluate an in-house developed CTP software algorithm. Imaging data of 103 acute stroke patients were included in the benchmarking tool. Median time from stroke onset to CT was 185 min (IQR 180-238), and the median time between completion of CT and start of MRI was 36 min (IQR 25-79). Volumetric accuracy of the CTP-ROIs was optimal at an rCBF threshold of <38%; at this threshold, the mean difference was 0.3 ml (SD 19.8 ml), the mean absolute difference was 14.3 ml (SD 13.7 ml), and CTP was 67% sensitive and 87% specific for identification of DWI-positive tissue voxels. The benchmarking tool can play an important role in optimizing CTP software, as it provides investigators with a novel method to directly compare the performance of alternative CTP software packages. © The Author(s) 2015.
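The voxel-wise sensitivity and specificity quoted above come from thresholding rCBF and comparing the result against the DWI lesion. The sketch below makes three assumptions:
- rCBF values are expressed as a fraction of normal;
- the DWI lesion is given as a boolean mask;
- `voxel_ml` is a hypothetical per-voxel volume.

The six voxels are made-up inputs, not study data, and the sign convention for the volume difference (CTP minus DWI) is also an assumption.

```python
def core_metrics(rcbf, dwi_positive, threshold=0.38, voxel_ml=1.0):
    """Compare a thresholded rCBF core with DWI ground truth, voxel by voxel."""
    tp = fp = tn = fn = 0
    for value, truth in zip(rcbf, dwi_positive):
        predicted = value < threshold  # CTP core: rCBF below the threshold
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Positive when the CTP core is larger than the DWI lesion
        "volume_difference_ml": ((tp + fp) - (tp + fn)) * voxel_ml,
    }


# Hypothetical voxels: rCBF fraction of normal, and whether DWI marks the voxel as lesion.
rcbf = [0.20, 0.30, 0.50, 0.90, 0.35, 0.60]
dwi = [True, True, True, False, False, False]
metrics = core_metrics(rcbf, dwi)
print(metrics)
```

Here two of the three DWI-positive voxels fall below 38%, and one DWI-negative voxel does too. Sensitivity and specificity are therefore both 2/3, while the volumes happen to agree.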
EDRN-WHI Pre-Clinical Colon Ca Specimens — EDRN Public Portal
Specifically, it is proposed to assess plasma proteins from postmenopausal women diagnosed with colon cancer within 18 months after the year-3 OS blood draw, and from appropriate matched controls enrolled in the WHI OS study. The case-control differences sought in plasma include:
1. Detection and identification of proteins that may be derived from tumor cells through the classical secreted protein pathway, through non-classical pathways (e.g., protein cleavage and release), or through cell turnover.
2. Detection and identification of protein changes associated with the host response that occur during tumor development and that may be related to inflammation, angiogenesis, infiltration of the tumor by host cells, and other processes.
3. Identification of tumor-derived proteins that induce a humoral immune response in the form of autoantibodies detectable at the preclinical stage.
Automatic poisson peak harvesting for high throughput protein identification.
Breen, E J; Hopwood, F G; Williams, K L; Wilkins, M R
2000-06-01
High-throughput identification of proteins by peptide mass fingerprinting requires an efficient means of picking peaks from mass spectra. Here, we report the development of a peak harvester that automatically picks monoisotopic peaks from spectra generated on matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) mass spectrometers. The peak harvester first uses advanced mathematical morphology and watershed algorithms to reduce spectra to stick representations. Poisson modelling is then applied to determine which peak in an isotopically resolved group represents the monoisotopic mass of a peptide. We illustrate the features of the peak harvester with mass spectra of standard peptides, digests of gel-separated bovine serum albumin, and Escherichia coli proteins prepared by two-dimensional polyacrylamide gel electrophoresis. In all cases, the peak harvester picked monoisotopic peaks similar to those picked by an experienced human operator, and it was also effective in identifying monoisotopic masses where the isotopic distributions of peptides overlapped. The peak harvester can be operated in an interactive mode, or it can be completely automated and linked to peptide mass fingerprinting protein identification tools to achieve high-throughput automated protein identification.
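Below is a minimal sketch of the Poisson step. It assumes the isotope envelope of a peptide of mass m can be approximated by a Poisson distribution with mean m/1800, a common rule-of-thumb parameterization; the paper's exact model may differ. Each peak in a cluster is tried in turn as the monoisotopic candidate. The candidate whose predicted envelope best matches the observed relative intensities, by least squares, is kept. The example cluster is invented for illustration and includes a small noise peak 1 Da below the true monoisotopic peak.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def pick_monoisotopic(masses, intensities, da_per_unit_mean=1800.0):
    """Return the mass of the peak that best explains the cluster as a Poisson envelope.

    da_per_unit_mean is an assumed scaling: Poisson mean = candidate mass / 1800.
    """
    total = sum(intensities)
    observed = [x / total for x in intensities]
    best_index, best_error = None, float("inf")
    for c in range(len(masses)):
        lam = masses[c] / da_per_unit_mean
        # Peaks before the candidate are predicted to be absent.
        expected = [poisson_pmf(i - c, lam) if i >= c else 0.0 for i in range(len(masses))]
        norm = sum(expected)
        expected = [e / norm for e in expected]
        error = sum((o - e) ** 2 for o, e in zip(observed, expected))
        if error < best_error:
            best_index, best_error = c, error
    return masses[best_index]


# Invented isotope cluster with a small noise peak at 1999 Da.
masses = [1999.0, 2000.0, 2001.0, 2002.0, 2003.0]
intensities = [5, 33, 37, 20, 7]
print(pick_monoisotopic(masses, intensities))  # prints 2000.0
```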
Jain, Varsha; Patel, Brijesh; Umar, Farhat Paul; Ajithakumar, H. M.; Gurjar, Suraj K.; Gupta, I. D.; Verma, Archana
2017-01-01
Aim: This study was conducted to identify single nucleotide polymorphisms (SNPs) in the protein phosphatase 1 regulatory subunit 11 (PPP1R11) gene in Murrah bulls. Materials and Methods: Genomic DNA was isolated by the phenol–chloroform extraction method from frozen semen samples of 65 Murrah bulls maintained at the Artificial Breeding Research Centre, ICAR-National Dairy Research Institute, Karnal. The quality and concentration of DNA were checked by spectrophotometer readings and agarose gel electrophoresis. The target region of the PPP1R11 gene was amplified using four primer sets designed based on the Bos taurus reference sequence. The amplified products were sequenced and aligned using Clustal Omega to identify SNPs. Animals were genotyped by polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) using the EcoNI restriction enzyme. Results: The Bubalus bubalis sequence with NCBI accession number NW_005785016.1 was compared and aligned with the edited sequences of Murrah bulls using Clustal Omega. A total of 10 SNPs were found: 1 in the 5′ UTR, 3 in intron 1, and 6 in intron 2. PCR-RFLP using EcoNI revealed only the AA genotype, indicating monomorphism in the PPP1R11 gene of all Murrah animals included in the study. Conclusion: A total of 10 SNPs were found. PCR-RFLP revealed only the AA genotype, indicating monomorphism in the PPP1R11 gene of all Murrah animals included in the study; therefore, association analysis with conception rate was not feasible. PMID:28344410
Identification of fungal microorganisms by MALDI-TOF mass spectrometry.
Chalupová, Jana; Raus, Martin; Sedlářová, Michaela; Sebela, Marek
2014-01-01
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has emerged as a reliable tool for fast identification and classification of microorganisms. In this regard, it represents a strong challenge to microscopic and molecular biology methods. Nowadays, commercial MALDI systems are accessible for biological research as well as for diagnostic applications in clinical medicine, biotechnology and industry. They are employed mainly in bacterial biotyping, but numerous experimental strategies have also been developed for the analysis of fungi, which is the topic of the present review. Members of many fungal genera such as Aspergillus, Fusarium, Penicillium or Trichoderma, and also various yeasts from clinical samples (e.g. Candida albicans), have been successfully identified by MALDI-TOF MS. However, no versatile method for fungi is currently available, even though only a limited number of matrix compounds have been reported for this use. Either intact cells/spores are analyzed directly by MALDI-TOF MS, or surface proteins are extracted and the resulting extract is measured. Biotrophic fungal phytopathogens can be identified via direct acquisition of MALDI-TOF mass spectra, e.g. from infected plant organs contaminated by fungal spores. Mass spectrometric peptide/protein profiles of fungi display peaks in the m/z region of 1000-20000, where a unique set of biomarker ions may appear, facilitating differentiation of samples at the level of genus, species or strain. This is done with the help of processing software and a spectral database of reference strains, which should preferably be constructed under the same standardized experimental conditions. Copyright © 2013 Elsevier Inc. All rights reserved.
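Database matching of this kind can be sketched as a simple comparison of peak lists. Each reference strain is scored by the fraction of its biomarker peaks that are found in the sample, and the strains are ranked by that score. The ppm tolerance, the scoring rule, the strain names and the peak values below are all hypothetical. Commercial identification systems use their own, more elaborate scoring schemes.

```python
def match_fraction(sample_peaks, reference_peaks, tol_ppm=800.0):
    """Fraction of reference biomarker peaks found in the sample within tol_ppm (assumed)."""
    matched = 0
    for ref in reference_peaks:
        tol = ref * tol_ppm * 1e-6  # tolerance in m/z units at this peak
        if any(abs(peak - ref) <= tol for peak in sample_peaks):
            matched += 1
    return matched / len(reference_peaks)


def rank_references(sample_peaks, library, tol_ppm=800.0):
    """Rank reference strains by matched fraction, best first."""
    scores = {name: match_fraction(sample_peaks, peaks, tol_ppm)
              for name, peaks in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical reference library and sample peak list (m/z values within 1000-20000).
library = {
    "strain_A": [2000.0, 4500.0, 9000.0],
    "strain_B": [3000.0, 6000.0, 12000.0],
}
sample = [2001.0, 4502.0, 9005.0, 15000.0]
ranked = rank_references(sample, library)
print(ranked)
```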
NewProt - a protein engineering portal.
Schwarte, Andreas; Genz, Maika; Skalden, Lilly; Nobili, Alberto; Vickers, Clare; Melse, Okke; Kuipers, Remko; Joosten, Henk-Jan; Stourac, Jan; Bendl, Jaroslav; Black, Jon; Haase, Peter; Baakman, Coos; Damborsky, Jiri; Bornscheuer, Uwe; Vriend, Gert; Venselaar, Hanka
2017-06-01
The NewProt protein engineering portal is a one-stop-shop for in silico protein engineering. It gives access to a large number of servers that compute a wide variety of protein structure characteristics supporting work on the modification of proteins through the introduction of (multiple) point mutations. The results can be inspected through multiple visualizers. The HOPE software is included to indicate mutations with possible undesired side effects. The Hotspot Wizard software is embedded for the design of mutations that modify a protein's activity, specificity, or stability. The NewProt portal is freely accessible at http://newprot.cmbi.umcn.nl/ and http://newprot.fluidops.net/. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
C-mii: a tool for plant miRNA and target identification.
Numnark, Somrak; Mhuantong, Wuttichai; Ingsriswang, Supawadee; Wichadakul, Duangdao
2012-01-01
Background MicroRNAs (miRNAs) are known to play an important role in several biological processes in both animals and plants. Although several tools for miRNA and target identification are available, few are tailored to plants, and those that are available have narrow functionality, lack graphical user interfaces, and restrict the number of input sequences. Large-scale computational identifications of miRNAs and/or targets in several plants have also been reported. Their methods, however, are described only as flow diagrams, and reproducing them requires programming skills and an understanding of the inputs and outputs of the connected programs. Results To overcome these limitations and programming complexities, we propose C-mii, a ready-made software package for both plant miRNA and target identification. C-mii was designed and implemented based on established computational steps and criteria derived from previous literature, with the following distinguishing features. First, the software is easy to install, with all-in-one programs and packaged databases. Second, it comes with graphical user interfaces (GUIs) for ease of use. Users can identify plant miRNAs and targets via step-by-step execution, explore the detailed results from each step, filter the results according to proposed constraints in plant miRNA and target biogenesis, and export sequences and structures of interest. Third, it supplies bird's-eye views of the identification results with infographics and grouping information. Fourth, in terms of functionality, it extends the standard computational steps of miRNA target identification with miRNA-target folding and GO annotation. Fifth, it provides helper functions for updating the pre-installed databases and for automatic recovery. Finally, it supports multi-project and multi-thread management.
Conclusions C-mii constitutes the first complete software package with graphical user interfaces enabling computational identification of both plant miRNA genes and miRNA targets. With the provided functionalities, it can help accelerate the study of plant miRNAs and targets, especially for small and medium plant molecular labs without bioinformaticians. C-mii is freely available at http://www.biotec.or.th/isl/c-mii for both Windows and Ubuntu Linux platforms. PMID:23281648
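C-mii's exact filtering criteria are not given here. As a hedged illustration of the complementarity check that plant target prediction relies on, the sketch below works as follows:
- it pairs a miRNA (5'→3') antiparallel with each window of a transcript;
- it counts each mismatch as 1 and each G:U wobble as 0.5;
- it keeps windows whose total penalty is at or below a cutoff.

The penalty weights, the cutoff of 3 and the example sequence are assumptions, not C-mii's parameters.

```python
# Penalty per miRNA:target base pair (assumed weights); any other pair is a mismatch (1.0).
PAIR_PENALTY = {("A", "U"): 0.0, ("U", "A"): 0.0, ("G", "C"): 0.0, ("C", "G"): 0.0,
                ("G", "U"): 0.5, ("U", "G"): 0.5}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def penalty(mirna, site):
    """Antiparallel pairing: miRNA position i pairs with site position len(site) - 1 - i."""
    if len(mirna) != len(site):
        raise ValueError("miRNA and site must have the same length")
    return sum(PAIR_PENALTY.get((base, site[-1 - i]), 1.0) for i, base in enumerate(mirna))


def scan_targets(mirna, transcript, max_penalty=3.0):
    """Return (start, penalty) for every window at or below the assumed cutoff."""
    size = len(mirna)
    hits = []
    for start in range(len(transcript) - size + 1):
        p = penalty(mirna, transcript[start:start + size])
        if p <= max_penalty:
            hits.append((start, p))
    return hits


# Example 20-nt miRNA sequence, used only for illustration.
mirna = "UGACAGAAGAGAGUGAGCAC"
site = reverse_complement(mirna)
transcript = "AAAA" + site + "CCCC"
print(scan_targets(mirna, transcript))
```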
Code of Federal Regulations, 2010 CFR
2010-10-01
..., national, or international standards. (f) The reviewer shall analyze all Fault Tree Analyses (FTA), Failure... cited by the reviewer; (4) Identification of any documentation or information sought by the reviewer...) Identification of the hardware and software verification and validation procedures for the PTC system's safety...
USDA-ARS's Scientific Manuscript database
The use of Fourier Transform-Infrared Spectroscopy (FT-IR) in conjunction with Artificial Neural Network software, NeuroDeveloper™ was examined for the rapid identification and classification of Listeria species and serotyping of Listeria monocytogenes. A spectral library was created for 245 strains...
Current algorithmic solutions for peptide-based proteomics data generation and identification.
Hoopmann, Michael R; Moritz, Robert L
2013-02-01
Peptide-based proteomic data sets are ever increasing in size and complexity. These data sets provide computational challenges when attempting to quickly analyze spectra and obtain correct protein identifications. Database search and de novo algorithms must consider high-resolution MS/MS spectra and alternative fragmentation methods. Protein inference is a tricky problem when analyzing large data sets of degenerate peptide identifications. Combining multiple algorithms for improved peptide identification puts significant strain on computational systems when investigating large data sets. This review highlights some of the recent developments in peptide and protein identification algorithms for analyzing shotgun mass spectrometry data when encountering the aforementioned hurdles. Also explored are the roles that analytical pipelines, public spectral libraries, and cloud computing play in the evolution of peptide-based proteomics. Copyright © 2012 Elsevier Ltd. All rights reserved.
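The degenerate-peptide problem can be made concrete with a small sketch. Proteins with identical peptide sets are merged into indistinguishable groups. A greedy set cover then picks a short list of groups that explains every peptide. This is a generic parsimony heuristic chosen for illustration, not the algorithm of any specific tool covered by the review, and the protein and peptide identifiers are hypothetical.

```python
def infer_proteins(protein_peptides):
    """Group indistinguishable proteins, then greedily cover all observed peptides."""
    # 1. Proteins with identical peptide sets cannot be told apart: merge them.
    groups = {}
    for protein in sorted(protein_peptides):
        groups.setdefault(frozenset(protein_peptides[protein]), []).append(protein)
    # 2. Greedy set cover: keep the group that explains the most unexplained peptides.
    uncovered = set().union(*groups)
    selected = []
    while uncovered:
        best = max(groups, key=lambda peptides: len(peptides & uncovered))
        selected.append(groups[best])
        uncovered -= best
    return selected


# Hypothetical evidence: peptide "c" is shared, and P1/P2 have identical peptide sets.
evidence = {
    "P1": {"a", "b", "c"},
    "P2": {"a", "b", "c"},
    "P3": {"c"},
    "P4": {"d"},
    "P5": {"d", "e"},
}
print(infer_proteins(evidence))  # [['P1', 'P2'], ['P5']]
```

P3 is dropped because its only peptide is already explained by the P1/P2 group. P4 is dropped because its peptide set is contained in that of P5.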
Strategies for the enrichment and identification of basic proteins in proteome projects.
Bae, Soo-Han; Harris, Andrew G; Hains, Peter G; Chen, Hong; Garfin, David E; Hazell, Stuart L; Paik, Young-Ki; Walsh, Bradley J; Cordwell, Stuart J
2003-05-01
Two-dimensional gel electrophoresis (2-DE) is currently the method of choice for separating complex mixtures of proteins for visual comparison in proteome analysis. This technology, however, is biased against certain classes of proteins, including low-abundance and hydrophobic proteins. Proteins with extremely alkaline isoelectric points (pI) are often very poorly represented using 2-DE, even when complex mixtures are separated using commercially available pH 6-11 or pH 7-10 immobilized pH gradients. The genome of the human gut pathogen Helicobacter pylori is dominated by genes encoding basic proteins and is therefore a useful model for examining methodology suitable for separating such proteins. H. pylori proteins were separated on pH 6-11 and novel pH 9-12 immobilized pH gradients, and 65 protein spots were subjected to matrix-assisted laser desorption/ionization-time of flight mass spectrometry, leading to the identification of 49 unique proteins. No protein with a theoretical pI greater than 10.23 was identified. A second approach to examine extremely alkaline proteins (pI > 9.0) used prefractionation by isoelectric focusing. Proteins were separated into two fractions using Gradiflow technology, and the extremely basic fraction was subjected to both sodium dodecyl sulphate-polyacrylamide gel electrophoresis and liquid chromatography (LC)-tandem mass spectrometry after tryptic digestion, allowing the identification of 17 and 13 proteins, respectively. Gradiflow separations were highly specific for proteins with pI > 9.0; however, a single LC separation only allowed the identification of peptides from highly abundant proteins. These methods, and those encompassing multiple LC 'dimensions', may be a useful complement to 2-DE for 'near-to-total' proteome coverage in the alkaline pH range.
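Theoretical pI values, such as the 10.23 and 9.0 cutoffs above, are obtained in two steps. First, the Henderson–Hasselbalch charges of all ionizable groups are summed. Second, the pH at which this net charge is zero is found. The sketch below does the second step by bisection and uses one commonly used pKa table (EMBOSS-style values, stated here as an assumption). Different tools use different tables, so computed pI values can differ noticeably.

```python
# Assumed pKa table (EMBOSS-style values); other tools use different numbers.
PKA = {"N_term": 8.6, "C_term": 3.6, "K": 10.8, "R": 12.5, "H": 6.5,
       "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
POSITIVE = ("K", "R", "H")
NEGATIVE = ("D", "E", "C", "Y")


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a peptide/protein sequence at a given pH."""
    charge = 1 / (1 + 10 ** (ph - PKA["N_term"]))
    charge -= 1 / (1 + 10 ** (PKA["C_term"] - ph))
    for aa in seq:
        if aa in POSITIVE:
            charge += 1 / (1 + 10 ** (ph - PKA[aa]))
        elif aa in NEGATIVE:
            charge -= 1 / (1 + 10 ** (PKA[aa] - ph))
    return charge


def isoelectric_point(seq, lo=0.0, hi=14.0, iterations=60):
    """Bisection on pH: net charge decreases monotonically as pH rises."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for peptide in ("KKKKRR", "DDDEEE"):
    pi = isoelectric_point(peptide)
    print(peptide, round(pi, 2), "extremely basic" if pi > 9.0 else "not extremely basic")
```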
Fernandez-Caldas, Enrique; Cases, Barbara; Tudela, Jose Ignacio; Fernandez, Eva Abel; Casanovas, Miguel; Subiza, Jose Luis
2012-01-01
Background Allergoids have been successfully used in the treatment of respiratory allergic diseases. They are modified allergen extracts that allow the administration of high allergen doses, due to their reduced IgE binding capacity. They maintain allergen-specific T-cell recognition. Because they are native allergen extracts that have been polymerized with glutaraldehyde, identification of the allergenic molecules requires more complicated methods. The aim of the study was to determine the qualitative composition of different polymerized extracts and investigate the presence of defined allergenic molecules using mass spectrometry. Methods Proteomic analysis was carried out at the Proteomics Facility of the Hospital Nacional de Parapléjicos (Toledo, Spain). After reduction and alkylation, proteins were digested with trypsin and the resulting peptides were cleaned using C18 SpinTips Sample Prep Kit; peptides were separated on an Ultimate nano-LC system using a Monolithic C18 column in combination with a precolumn for salt removal. Fractionation of the peptides was performed with a Probot microfraction collector, and MS and MS/MS analyses of offline-spotted peptide samples were performed using the Applied Biosystems 4800 plus MALDI TOF/TOF Analyzer mass spectrometer. ProteinPilot Software V 2.0.1 and the Paragon algorithm were used for the identification of the proteins. Each MS/MS spectrum was searched against the SwissProt 2010_10 database, Uniprot-Viridiplantae database and Uniprot_Betula database. Results Analysis of the peptides revealed the presence of native allergens in the polymerized extracts: Der p 1, Der p 2, Der p 3, Der p 8 and Der p 11 in D. pteronyssinus; Bet v 2, Bet v 6, Bet v 7 and several Bet v 1 isoforms in B. verrucosa; and Phl p 1, Phl p 3, Phl p 5, Phl p 11 and Phl p 12 in P. pratense allergoids.
In all cases, potential allergenic proteins were also identified, including ubiquitin, actin, enolase, fructose-bisphosphate aldolase, luminal-binding protein (heat shock protein 70), and calmodulin, among others. Conclusions The characterization of the allergenic composition of allergoids is possible using MS/MS analysis. The analysis confirms the presence of native allergens in the allergoids. Major allergens are preserved during polymerization.
Miotto, Olivo; Heiny, AT; Tan, Tin Wee; August, J Thomas; Brusic, Vladimir
2008-01-01
Background The identification of mutations that confer unique properties to a pathogen, such as host range, is of fundamental importance in the fight against disease. This paper describes a novel method for identifying amino acid sites that distinguish specific sets of protein sequences, by comparative analysis of matched alignments. The use of mutual information to identify distinctive residues responsible for functional variants makes this approach highly suitable for analyzing large sets of sequences. To support mutual information analysis, we developed the AVANA software, which utilizes sequence annotations to select sets for comparison, according to user-specified criteria. The method presented was applied to an analysis of influenza A PB2 protein sequences, with the objective of identifying the components of adaptation to human-to-human transmission, and reconstructing the mutation history of these components. Results We compared over 3,000 PB2 protein sequences of human-transmissible and avian isolates, to produce a catalogue of sites involved in adaptation to human-to-human transmission. This analysis identified 17 characteristic sites, five of which have been present in human-transmissible strains since the 1918 Spanish flu pandemic. Sixteen of these sites are located in functional domains, suggesting they may play functional roles in host-range specificity. The catalogue of characteristic sites was used to derive sequence signatures from historical isolates. These signatures, arranged in chronological order, reveal an evolutionary timeline for the adaptation of the PB2 protein to human hosts. Conclusion By providing the most complete elucidation to date of the functional components participating in PB2 protein adaptation to humans, this study demonstrates that mutual information is a powerful tool for comparative characterization of sequence sets. 
In addition to confirming previously reported findings, several novel characteristic sites within PB2 are reported. Sequence signatures generated from the catalogue of characteristic sites concisely characterize the adaptation of individual isolates. Evolutionary timelines derived from signatures of early human influenza isolates suggest that characteristic variants emerged rapidly and remained remarkably stable through subsequent pandemics. In addition, the signatures of human-infecting H5N1 isolates suggest that this avian subtype has low pandemic potential at present, although it presents more human adaptation components than most avian subtypes. PMID:18315849
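The mutual-information criterion can be sketched as follows. For each alignment column, compute the mutual information, in bits, between the residue observed and the group label (e.g., human-transmissible vs avian). Then report the columns above a cutoff. The toy alignment, the labels and the 0.5-bit cutoff are illustrative assumptions, and AVANA's own selection criteria may differ.

```python
import math
from collections import Counter


def mutual_information(column, labels):
    """I(residue; group) in bits for one alignment column."""
    n = len(column)
    joint = Counter(zip(column, labels))
    p_res = Counter(column)
    p_grp = Counter(labels)
    mi = 0.0
    for (res, grp), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_res[res] / n) * (p_grp[grp] / n)))
    return mi


def characteristic_sites(alignment, labels, min_bits=0.5):
    """0-based columns whose residues are informative about the group label (assumed cutoff)."""
    return [pos for pos in range(len(alignment[0]))
            if mutual_information([seq[pos] for seq in alignment], labels) >= min_bits]


# Toy alignment: only column 1 separates the two groups.
alignment = ["MKE", "MKE", "MRE", "MRE"]
labels = ["human", "human", "avian", "avian"]
print(characteristic_sites(alignment, labels))  # [1]
```

A column that perfectly separates two equally sized groups carries 1 bit. A conserved column carries 0 bits, however important it may be functionally.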
ERIC Educational Resources Information Center
Kalender, Ilker
2012-01-01
catcher is a software program designed to compute the ω index, a common statistical index for the identification of collusion (cheating) among examinees taking an educational or psychological test. It requires (a) responses and (b) ability estimates of individuals, and (c) item parameters to make computations, and outputs the results of…
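For reference, the ω index as commonly defined compares two quantities for a suspected copier and a source: the observed number of answer matches, and the number expected from the copier's ability. The expected value is built from item option probabilities under a nominal response model. The minimal sketch below assumes nominal-response-model slopes and intercepts for each option of each item. catcher's own implementation, input format and options are not reproduced here.

```python
import math


def nrm_probabilities(theta, slopes, intercepts):
    """Nominal response model: P(option k | theta) = softmax(a_k * theta + c_k)."""
    z = [a * theta + c for a, c in zip(slopes, intercepts)]
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def omega_index(copier, source, theta_copier, items):
    """(observed matches - expected matches) / SD, summed over items.

    copier, source: chosen option index per item; items: (slopes, intercepts) per item.
    """
    observed = expected = variance = 0.0
    for c_ans, s_ans, (slopes, intercepts) in zip(copier, source, items):
        # Probability that the copier picks the option the source picked
        p = nrm_probabilities(theta_copier, slopes, intercepts)[s_ans]
        observed += float(c_ans == s_ans)
        expected += p
        variance += p * (1 - p)
    return (observed - expected) / math.sqrt(variance)


# Hypothetical 8-item test with 4 equally attractive options (all parameters zero).
items = [([0.0] * 4, [0.0] * 4)] * 8
responses = [0, 1, 2, 3, 0, 1, 2, 3]
omega = omega_index(responses, responses, 0.0, items)
print(round(omega, 3))  # about 4.899: 8 matches observed vs 2 expected
```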
Hierarchical Segmentation Enhances Diagnostic Imaging
NASA Technical Reports Server (NTRS)
2007-01-01
Bartron Medical Imaging LLC (BMI), of New Haven, Connecticut, gained a nonexclusive license from Goddard Space Flight Center to use the RHSEG software in medical imaging. To manage image data, BMI then licensed two pattern-matching software programs from NASA's Jet Propulsion Laboratory that were used in image analysis and three data-mining and edge-detection programs from Kennedy Space Center. More recently, BMI made NASA history by being the first company to partner with the Space Agency through a Cooperative Research and Development Agreement to develop a 3-D version of RHSEG. With U.S. Food and Drug Administration clearance, BMI will sell its Med-Seg imaging system with the 2-D version of the RHSEG software to analyze medical imagery from CAT and PET scans, MRI, ultrasound, digitized X-rays, digitized mammograms, dental X-rays, soft tissue analyses, moving object analyses, and soft-tissue slides such as Pap smears for the diagnosis and management of diseases. Extending the software's capabilities to three dimensions will eventually enable production of pixel-level views of a tumor or lesion, early identification of plaque build-up in arteries, and identification of density levels of microcalcification in mammograms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fields, C.A.
1996-06-01
The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler), has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0, has now been completed and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryotic gene sequences are also underway. This progress report also contains papers coming out of the project, including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distributed highly variable repeat family; Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.
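Per-position information content is one of the splice-signal features named in the titles above. It can be computed from aligned sites as 2 bits minus the Shannon entropy of each column. The sketch omits the small-sample correction that careful analyses apply. The four aligned donor-like sequences are invented, and the code is not from gm.

```python
import math
from collections import Counter


def information_content(aligned):
    """Per-column information content in bits for aligned DNA sites (maximum 2 bits)."""
    n = len(aligned)
    profile = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        profile.append(2.0 - entropy)  # no small-sample correction (simplification)
    return profile


# Invented donor-like sites: the GT dinucleotide is invariant, position 3 varies A/G.
donor_sites = ["GTAAG", "GTGAG", "GTAAG", "GTGAG"]
print(information_content(donor_sites))  # [2.0, 2.0, 1.0, 2.0, 2.0]
```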
Specification-based software sizing: An empirical investigation of function metrics
NASA Technical Reports Server (NTRS)
Jeffery, Ross; Stathis, John
1993-01-01
For some time the software industry has espoused the need for improved specification-based software size metrics. This paper reports on a study of nineteen recently developed systems in a variety of application domains. The systems were developed by a single software services corporation using a variety of languages. The study investigated several metric characteristics. It shows that: earlier research into inter-item correlation within the overall function count is partially supported; a priori function counts, in themselves, do not explain the majority of the effort variation in software development in the organization studied; documentation quality is critical to accurate function identification; and rater error is substantial in manual function counting. The implications of these findings for organizations using function-based metrics are explored.
Comprehensive Identification of Proteins from MALDI Imaging
Maier, Stefan K.; Hahne, Hannes; Gholami, Amin Moghaddas; Balluff, Benjamin; Meding, Stephan; Schoene, Cédrik; Walch, Axel K.; Kuster, Bernhard
2013-01-01
Matrix-assisted laser desorption/ionization imaging mass spectrometry (MALDI IMS) is a powerful tool for the visualization of proteins in tissues and has demonstrated considerable diagnostic and prognostic value. One main challenge is that the molecular identity of such potential biomarkers mostly remains unknown. We introduce a generic method that addresses this issue by systematically identifying the proteins embedded in the MALDI matrix using a combination of bottom-up and top-down proteomics. The analyses of ten human tissues led to the identification of 1400 abundant and soluble proteins constituting the set of proteins detectable by MALDI IMS, including >90% of all IMS biomarkers reported in the literature. Top-down analysis of the matrix proteome identified 124 mostly N- and C-terminally fragmented proteins, indicating considerable protein processing activity in tissues. All protein identification data from this study as well as the IMS literature have been deposited into MaTisse, a new publicly available database, which we anticipate will become a valuable resource for the IMS community. PMID:23782541
Hu, Zhengyan; Zhao, Liang; Zhang, Hongyan; Zhang, Yi; Wu, Ren'an; Zou, Hanfa
2014-03-21
Proteins interacting with nanoparticles in biological systems form protein coronas on the nanoparticle surface, which can critically affect the biological identity of the nanoparticles and/or lead to physiological and pathological consequences. Enzymatic digestion of the protein corona is the primary step in identifying its protein components with bottom-up proteomic approaches. In this study, tryptic digestion of the protein corona by trypsin immobilized on a magnetic nanoparticle was investigated for the first time. Compared with conventional overnight digestion by free trypsin, which suffers from severe self-digestion, on-bead digestion of the protein corona by immobilized trypsin could be accomplished within 1 h, with significantly reduced trypsin self-digestion and improved reproducibility of protein identification by the mass spectrometry-based proteomic approach. The number of bovine serum (BS) proteins identified on commercial Fe3O4 nanoparticles increased by 13% with 1 h digestion by immobilized trypsin compared with overnight digestion by free trypsin. In addition, on-bead digestion with immobilized trypsin was applied to the identification of the human plasma protein corona on commercial Fe3O4 nanoparticles, leading to efficient digestion of the human plasma proteins and the identification of 149 human plasma proteins corresponding to putative critical pathways and biological processes. Copyright © 2014 Elsevier B.V. All rights reserved.
The Tetracorder user guide: version 4.4
Livo, Keith Eric; Clark, Roger N.
2014-01-01
Imaging spectroscopy mapping software assists in the identification and mapping of materials based on their chemical properties as expressed in spectral measurements of a planet including the solid or liquid surface or atmosphere. Such software can be used to analyze field, aircraft, or spacecraft data; remote sensing datasets; or laboratory spectra. Tetracorder is a set of software algorithms commanded through an expert system to identify materials based on their spectra (Clark and others, 2003). Tetracorder also can be used in traditional remote sensing analyses, because some of the algorithms are a version of a matched filter. Thus, depending on the instructions fed to the Tetracorder system, results can range from simple matched filter output, to spectral feature fitting, to full identification of surface materials (within the limits of the spectral signatures of materials over the spectral range and resolution of the imaging spectroscopy data). A basic understanding of spectroscopy by the user is required for developing an optimum mapping strategy and assessing the results.
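Tetracorder's exact algorithms are described in Clark and others (2003) and are not reproduced here. The code below is a simplified sketch of spectral feature fitting in two steps:
1. It removes a straight-line continuum across an absorption feature.
2. It fits the observed continuum-removed feature as a linear function of a reference feature.

The fitted slope scales the band depth, and the squared correlation serves as a goodness-of-fit. The wavelengths and reflectances are made up, and the linear continuum and plain least-squares fit are assumptions.

```python
def continuum_removed(wavelengths, reflectance, left, right):
    """Divide the spectrum by a straight-line continuum between indices left and right."""
    w0, w1 = wavelengths[left], wavelengths[right]
    r0, r1 = reflectance[left], reflectance[right]
    out = []
    for i in range(left, right + 1):
        continuum = r0 + (r1 - r0) * (wavelengths[i] - w0) / (w1 - w0)
        out.append(reflectance[i] / continuum)
    return out


def feature_fit(observed, reference):
    """Least-squares fit observed = a + b * reference; returns (b, r_squared)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_r = sum(reference) / n
    sxy = sum((r - mean_r) * (o - mean_o) for r, o in zip(reference, observed))
    sxx = sum((r - mean_r) ** 2 for r in reference)
    syy = sum((o - mean_o) ** 2 for o in observed)
    if sxx == 0 or syy == 0:
        raise ValueError("flat feature: fit is undefined")
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Made-up spectra: the field spectrum has a sloped continuum and half the band depth.
wavelengths = [2.10, 2.15, 2.20, 2.25, 2.30]  # micrometres, illustrative
library_spectrum = [0.50, 0.40, 0.30, 0.40, 0.50]
field_spectrum = [0.40, 0.405, 0.40, 0.495, 0.60]

ref_cr = continuum_removed(wavelengths, library_spectrum, 0, 4)
obs_cr = continuum_removed(wavelengths, field_spectrum, 0, 4)
scale, r2 = feature_fit(obs_cr, ref_cr)
band_depth = 1 - min(obs_cr)
print(round(scale, 3), round(r2, 3), round(band_depth, 3))  # 0.5 1.0 0.2
```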
Multi-Agent Diagnosis and Control of an Air Revitalization System for Life Support in Space
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Kowing, Jeffrey; Nieten, Joseph; Graham, Jeffrey S.; Schreckenghost, Debra; Bonasso, Pete; Fleming, Land D.; MacMahon, Matt; Thronesbery, Carroll
2000-01-01
An architecture of interoperating agents has been developed to provide control and fault management for advanced life support systems in space. In this adjustable autonomy architecture, software agents coordinate with human agents and provide support in novel fault management situations. This architecture combines the Livingstone model-based mode identification and reconfiguration (MIR) system with the 3T architecture for autonomous flexible command and control. The MIR software agent performs model-based state identification and diagnosis. MIR identifies novel recovery configurations and the set of commands required for the recovery. The 3T procedural executive and the human operator use the diagnoses and recovery recommendations, and provide command sequencing. User interface extensions have been developed to support human monitoring of both 3T and MIR data and activities. This architecture has been demonstrated performing control and fault management for an oxygen production system for air revitalization in space. The software operates in a dynamic simulation testbed.
Hanson-Smith, Victor; Johnson, Alexander
2016-07-01
The method of phylogenetic ancestral sequence reconstruction is a powerful approach for studying evolutionary relationships among protein sequence, structure, and function. In particular, this approach allows investigators to (1) reconstruct and "resurrect" (that is, synthesize in vivo or in vitro) extinct proteins to study how they differ from modern proteins, (2) identify key amino acid changes that, over evolutionary timescales, have altered the function of the protein, and (3) order historical events in the evolution of protein function. Widespread use of this approach has been slow among molecular biologists, in part because the methods require significant computational expertise. Here we present PhyloBot, a web-based software tool that makes ancestral sequence reconstruction easy. Designed for non-experts, it integrates all the necessary software into a single user interface. Additionally, PhyloBot provides interactive tools to explore evolutionary trajectories between ancestors, enabling the rapid generation of hypotheses that can be tested using genetic or biochemical approaches. Early versions of this software were used in previous studies to discover genetic mechanisms underlying the functions of diverse protein families, including V-ATPase ion pumps, DNA-binding transcription regulators, and serine/threonine protein kinases. PhyloBot runs in a web browser, and is available at the following URL: http://www.phylobot.com. The software is implemented in Python using the Django web framework, and runs on elastic cloud computing resources from Amazon Web Services. Users can create and submit jobs on our free server (at the URL listed above), or use our open-source code to launch their own PhyloBot server.
Hanson-Smith, Victor; Johnson, Alexander
2016-01-01
The method of phylogenetic ancestral sequence reconstruction is a powerful approach for studying evolutionary relationships among protein sequence, structure, and function. In particular, this approach allows investigators to (1) reconstruct and “resurrect” (that is, synthesize in vivo or in vitro) extinct proteins to study how they differ from modern proteins, (2) identify key amino acid changes that, over evolutionary timescales, have altered the function of the protein, and (3) order historical events in the evolution of protein function. Widespread use of this approach has been slow among molecular biologists, in part because the methods require significant computational expertise. Here we present PhyloBot, a web-based software tool that makes ancestral sequence reconstruction easy. Designed for non-experts, it integrates all the necessary software into a single user interface. Additionally, PhyloBot provides interactive tools to explore evolutionary trajectories between ancestors, enabling the rapid generation of hypotheses that can be tested using genetic or biochemical approaches. Early versions of this software were used in previous studies to discover genetic mechanisms underlying the functions of diverse protein families, including V-ATPase ion pumps, DNA-binding transcription regulators, and serine/threonine protein kinases. PhyloBot runs in a web browser, and is available at the following URL: http://www.phylobot.com. The software is implemented in Python using the Django web framework, and runs on elastic cloud computing resources from Amazon Web Services. Users can create and submit jobs on our free server (at the URL listed above), or use our open-source code to launch their own PhyloBot server. PMID:27472806
Braun, Martin; Kirsten, Robert; Rupp, Niels J; Moch, Holger; Fend, Falko; Wernert, Nicolas; Kristiansen, Glen; Perner, Sven
2013-05-01
Quantification of protein expression based on immunohistochemistry (IHC) is an important step for translational research and clinical routine. Several manual ('eyeballing') scoring systems are used to semi-quantify protein expression based on chromogenic intensities and distribution patterns. However, manual scoring systems are time-consuming and subject to significant intra- and interobserver variability. The aim of our study was to explore whether new image analysis software is a sufficient alternative tool for quantifying protein expression. For IHC experiments, one nucleus-specific marker (i.e., ERG antibody), one cytoplasm-specific marker (i.e., SLC45A3 antibody), and one marker expressed in both compartments (i.e., TMPRSS2 antibody) were chosen. Stainings were applied to TMAs containing tumor material from 630 prostate cancer patients. A pathologist visually quantified all IHC stainings in a blinded manner, applying a four-step scoring system. For digital quantification, image analysis software (Tissue Studio v.2.1, Definiens AG, Munich, Germany) was applied to obtain a continuous spectrum of average staining intensity. For each of the three antibodies we found a strong correlation between the manual protein expression score and the score of the image analysis software. Spearman's rank correlation coefficient was 0.94, 0.92, and 0.90 for ERG, SLC45A3, and TMPRSS2, respectively (p<0.01). Our data suggest that the image analysis software Tissue Studio is a powerful tool for quantification of protein expression in IHC stainings. Further, since the digital analysis is precise and reproducible, computer-supported protein quantification might help to overcome intra- and interobserver variability and increase the objectivity of IHC-based protein assessment.
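Spearman's rank correlation, used above to compare the four-step manual score with the continuous Tissue Studio intensity, is the Pearson correlation of the two variables' ranks. A four-step score produces many ties, so tied values should receive the average of the ranks they span. The plain-Python sketch below illustrates that calculation. The function names and the toy scores in the test are hypothetical and are not data from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)
```

Using the Pearson-of-ranks form, rather than the shortcut 1 - 6 * sum(d^2) / (n * (n^2 - 1)), keeps the result exact when ties are present.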
Hynek, Radovan; Kuckova, Stepanka; Hradilova, Janka; Kodicek, Milan
2004-01-01
Identification of materials in color layers of paintings is necessary for correct decisions concerning restoration procedures as well as proving the authenticity of the painting. The proteins are usually important components of the painting layers. In this paper it has been demonstrated that matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOFMS) can be used for fast and reliable identification of proteins in color layers even in old, highly aged matrices. The digestion can be easily performed directly on silica wafers which are routinely used for infrared analysis. The amount of material necessary for such an analysis is extremely small. Peptide mass mapping using digestion with trypsin followed by MALDI-TOFMS and identification of the protein was successfully used for determination of the binder from a painting of the 19th century. Copyright 2004 John Wiley & Sons, Ltd.
Enyaru, John C.; Carr, Steven A.; Pearson, Terry W.
2013-01-01
Control of human African sleeping sickness, caused by subspecies of the protozoan parasite Trypanosoma brucei, is based on preventing transmission by elimination of the tsetse vector and by active diagnostic screening and treatment of infected patients. To identify trypanosome proteins that have potential as biomarkers for detection and monitoring of African sleeping sickness, we have used a 'deep-mining' proteomics approach to identify trypanosome proteins in human plasma. Abundant human plasma proteins were removed by immunodepletion. Depleted plasma samples were then digested to peptides with trypsin, fractionated by basic reversed phase and each fraction analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). This sample processing and analysis method enabled identification of low levels of trypanosome proteins in pooled plasma from late stage sleeping sickness patients infected with Trypanosoma brucei rhodesiense. A total of 254 trypanosome proteins were confidently identified. Many of the parasite proteins identified were of unknown function, although metabolic enzymes, chaperones, proteases and ubiquitin-related/acting proteins were found. This approach to the identification of conserved, soluble trypanosome proteins in human plasma offers a possible route to improved disease diagnosis and monitoring, since these molecules are potential biomarkers for the development of a new generation of antigen-detection assays. The combined immuno-depletion/mass spectrometric approach can be applied to a variety of infectious diseases for unbiased biomarker identification. PMID:23951171
Chavez, Juan D.; Bisson, William H.
2011-01-01
The site-specific identification of α-aminoadipic semialdehyde (AAS) and γ-glutamic semialdehyde (GGS) residues in proteins is reported. Semialdehydic protein modifications result from the metal-catalyzed oxidation of Lys residues (AAS) or of Arg and Pro residues (GGS). Most analytical methods for protein carbonylation measure changes in the global level of carbonylation and fail to provide details regarding protein identity, site, and chemical nature of the carbonylation. In this work, we used a targeted approach, which combines chemical labeling, enrichment, and tandem mass spectrometric analysis, for the site-specific identification of AAS and GGS sites in proteins. The approach is applied to in vitro oxidized glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and an untreated biological sample, namely cardiac mitochondrial proteins. The analysis of GAPDH resulted in the site-specific identification of two AAS and four GGS residues. Computational evaluation of the identified AAS and GGS sites in GAPDH indicated that these sites are located in flexible regions, show high solvent accessibility values, and are in proximity to possible metal ion binding sites. The targeted proteomic analysis of semialdehydic modifications in cardiac mitochondria yielded nine AAS modification sites which were unambiguously assigned to distinct lysine residues in the following proteins: ADP/ATP translocase isoforms 1 and 2, ubiquinol cytochrome-c reductase core protein 2, and ATP synthase α-subunit. PMID:20957471
Protein social behavior makes a stronger signal for partner identification than surface geometry.
Laine, Elodie; Carbone, Alessandra
2017-01-01
Cells are interactive living systems in which protein movements, interactions and regulation are largely free from centralized management. How the physico-chemical and geometrical properties of proteins determine which proteins interact with which remains far from fully understood. We show that characterizing how a protein behaves with many potential interactors in a complete cross-docking study leads to a sharp identification of its cellular/true/native partner(s). We define a sociability index, or S-index, reflecting whether or not a protein tends to pair with other proteins. Formally, we propose a suitable normalization function that accounts for protein sociability and we combine it with a simple interface-based (ranking) score to discriminate partners from non-interactors. We show that sociability is an important factor and that the normalization achieves much higher discriminative power than shape complementarity docking scores. The social effect is also observed with more sophisticated docking algorithms. Docking conformations are evaluated using experimental binding sites, which represent the best-case approximation of binding site predictions; such predictions have reached high accuracy in recent years. This makes our analysis helpful for a global understanding of partner identification and for suggesting discriminating strategies. These results contradict previous findings claiming that the partner identification problem can be solved with geometrical docking alone. Proteins 2016; 85:137-154. © 2016 Wiley Periodicals, Inc. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
Eyford, Brett A; Ahmad, Rushdy; Enyaru, John C; Carr, Steven A; Pearson, Terry W
2013-01-01
Control of human African sleeping sickness, caused by subspecies of the protozoan parasite Trypanosoma brucei, is based on preventing transmission by elimination of the tsetse vector and by active diagnostic screening and treatment of infected patients. To identify trypanosome proteins that have potential as biomarkers for detection and monitoring of African sleeping sickness, we have used a 'deep-mining' proteomics approach to identify trypanosome proteins in human plasma. Abundant human plasma proteins were removed by immunodepletion. Depleted plasma samples were then digested to peptides with trypsin, fractionated by basic reversed phase and each fraction analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). This sample processing and analysis method enabled identification of low levels of trypanosome proteins in pooled plasma from late stage sleeping sickness patients infected with Trypanosoma brucei rhodesiense. A total of 254 trypanosome proteins were confidently identified. Many of the parasite proteins identified were of unknown function, although metabolic enzymes, chaperones, proteases and ubiquitin-related/acting proteins were found. This approach to the identification of conserved, soluble trypanosome proteins in human plasma offers a possible route to improved disease diagnosis and monitoring, since these molecules are potential biomarkers for the development of a new generation of antigen-detection assays. The combined immuno-depletion/mass spectrometric approach can be applied to a variety of infectious diseases for unbiased biomarker identification.
Kel, Ivan; Chang, Zisong; Galluccio, Nadia; Romeo, Margherita; Beretta, Stefano; Diomede, Luisa; Mezzelani, Alessandra; Milanesi, Luciano; Dieterich, Christoph; Merelli, Ivan
2016-10-18
The interpretation of genome-wide association studies is difficult, as it is hard to understand how polymorphisms can affect gene regulation, in particular for trans-regulatory elements located far from the genes they control. Using RNA or protein expression data as phenotypes, it is possible to correlate their variations with specific genotypes. This technique is usually referred to as expression Quantitative Trait Loci (eQTL) analysis, and only a few packages exist for the integration of genotype patterns and expression profiles. In particular, tools are needed for the analysis of next-generation sequencing (NGS) data on a genome-wide scale, which is essential to identify eQTLs able to control a large number of genes (hotspots). Here we present SPIRE (Software for Polymorphism Identification Regulating Expression), a generic, modular and functionally highly flexible pipeline for eQTL processing. SPIRE integrates different univariate and multivariate approaches for eQTL analysis, paying particular attention to the scalability of the procedure in order to support cis- as well as trans-mapping, thus allowing the identification of hotspots in NGS data. In particular, we demonstrated how SPIRE can handle big association study datasets, reproducing published results and improving the identification of trans-eQTLs. Furthermore, we employed the pipeline to analyse novel data concerning the genotypes of two different C. elegans strains (N2 and Hawaii) and related miRNA expression data, obtained using RNA-Seq. A miRNA regulatory hotspot was identified in chromosome 1, overlapping the transcription factor grh-1, known to be involved in the early phases of embryonic development of C. elegans. In a follow-up qPCR experiment we were able to verify most of the predicted eQTLs, as well as to show, for a novel miRNA, a significant difference in the sequences of the two analysed strains of C. elegans.
SPIRE is publicly available as open source software at , together with some example data, a readme file, supplementary material and a short tutorial.
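A univariate eQTL scan of the kind SPIRE integrates can be illustrated as a simple linear regression of each gene's expression on each marker's genotype. In this picture, a hotspot is a marker associated with many genes. The NumPy sketch below computes all SNP-gene correlations with one matrix product and converts them to regression t-statistics via t = r * sqrt((n - 2) / (1 - r^2)). It is an assumption-laden illustration, not SPIRE's pipeline. It ignores covariates and multiple-testing correction, uses arbitrary `threshold` and `min_genes` values, and assumes no marker or gene is constant across samples. The function names are hypothetical.

```python
import numpy as np


def eqtl_t_stats(genotypes, expression):
    """genotypes: (n_snps, n_samples); expression: (n_genes, n_samples).

    Returns an (n_snps, n_genes) matrix of t-statistics for the slope of a
    simple linear regression of each gene's expression on each SNP.
    Assumes no row is constant (a constant row would divide by zero).
    """
    n = genotypes.shape[1]
    # Center and scale each row to unit length so g @ e.T gives Pearson r
    g = genotypes - genotypes.mean(axis=1, keepdims=True)
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    e = expression - expression.mean(axis=1, keepdims=True)
    e /= np.linalg.norm(e, axis=1, keepdims=True)
    r = g @ e.T
    # Keep |r| < 1 so the t conversion stays finite
    r = np.clip(r, -0.999999, 0.999999)
    return r * np.sqrt((n - 2) / (1 - r ** 2))


def hotspots(t_stats, threshold, min_genes):
    """Indices of SNPs whose |t| exceeds `threshold` for at least `min_genes` genes."""
    counts = (np.abs(t_stats) > threshold).sum(axis=1)
    return np.flatnonzero(counts >= min_genes)
```

The single matrix product is what makes an all-against-all (trans) scan tractable. Each pair is still tested independently, which is why a real pipeline adds multivariate models and multiple-testing control on top.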
Identification of Protein-Protein Interactions with Glutathione-S-Transferase (GST) Fusion Proteins.
Einarson, Margret B; Pugacheva, Elena N; Orlinick, Jason R
2007-08-01
INTRODUCTION: Glutathione-S-transferase (GST) fusion proteins have had a wide range of applications since their introduction as tools for synthesis of recombinant proteins in bacteria. GST was originally selected as a fusion moiety because of several desirable properties. First and foremost, when expressed in bacteria alone, or as a fusion, GST is not sequestered in inclusion bodies (in contrast to previous fusion protein systems). Second, GST can be affinity-purified without denaturation because it binds to immobilized glutathione, which provides the basis for simple purification. Consequently, GST fusion proteins are routinely used for antibody generation and purification, protein-protein interaction studies, and biochemical analysis. This article describes the use of GST fusion proteins as probes for the identification of protein-protein interactions.
[Evaluation of mass spectrometry: MALDI-TOF MS for fast and reliable yeast identification].
Relloso, María S; Nievas, Jimena; Fares Taie, Santiago; Farquharson, Victoria; Mujica, María T; Romano, Vanesa; Zarate, Mariela S; Smayevsky, Jorgelina
2015-01-01
The matrix-assisted laser desorption/ionization time-of-flight mass spectrometry technique known as MALDI-TOF MS is a tool used for the identification of clinical pathogens by generating a protein spectrum that is unique to a given species. In this study we assessed the identification of clinical yeast isolates by MALDI-TOF MS in a university hospital in Argentina and compared two procedures for protein extraction: a rapid method and a procedure based on the manufacturer's recommendations. A short protein extraction procedure was applied to 100 isolates, and the rate of correct identification at genus and species level was 98.0%. In addition, we analyzed 201 isolates, previously identified by conventional methods, using the methodology recommended by the manufacturer, and there was 95.38% agreement in identification at species level. MALDI-TOF MS proved to be a fast, simple and reliable tool for yeast identification. Copyright © 2014 Asociación Argentina de Microbiología. Publicado por Elsevier España, S.L.U. All rights reserved.
Calculation of the relative metastabilities of proteins using the CHNOSZ software package
Dick, Jeffrey M
2008-01-01
Background: Proteins of various compositions are required by organisms inhabiting different environments. The energetic demands for protein formation are a function of the compositions of proteins as well as geochemical variables including temperature, pressure, oxygen fugacity and pH. The purpose of this study was to explore the dependence of metastable equilibrium states of protein systems on changes in the geochemical variables. Results: A software package called CHNOSZ implementing the revised Helgeson-Kirkham-Flowers (HKF) equations of state and group additivity for ionized unfolded aqueous proteins was developed. The program can be used to calculate standard molal Gibbs energies and other thermodynamic properties of reactions and to make chemical speciation and predominance diagrams that represent the metastable equilibrium distributions of proteins. The approach takes account of the chemical affinities of reactions in open systems characterized by the chemical potentials of basis species. The thermodynamic database included with the package permits application of the software to mineral and other inorganic systems as well as systems of proteins or other biomolecules. Conclusion: Metastable equilibrium activity diagrams were generated for model cell-surface proteins from archaea and bacteria adapted to growth in environments that differ in temperature and chemical conditions. The predicted metastable equilibrium distributions of the proteins can be compared with the optimal growth temperatures of the organisms and with geochemical variables. The results suggest that a thermodynamic assessment of protein metastability may be useful for integrating bio- and geochemical observations. PMID:18834534
Yates, John R
2015-11-01
Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too has the utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures.
Théron, Laëtitia; Centeno, Delphine; Coudy-Gandilhon, Cécile; Pujos-Guillot, Estelle; Astruc, Thierry; Rémond, Didier; Barthelemy, Jean-Claude; Roche, Frédéric; Feasson, Léonard; Hébraud, Michel; Béchet, Daniel; Chambon, Christophe
2016-10-26
Mass spectrometry imaging (MSI) is a powerful tool to visualize the spatial distribution of molecules on a tissue section. The main limitation of MALDI-MSI of proteins is the lack of direct identification. Therefore, this study focuses on an MSI~LC-MS/MS-LF workflow to link the results from MALDI-MSI with potential peak identification and label-free quantitation, using only one tissue section. First, we studied the impact of matrix deposition and laser ablation on protein extraction from the tissue section. Then, we back-correlated the m/z of the proteins detected by MALDI-MSI with those identified by label-free quantitation. This allowed us to compare the label-free quantitation of proteins obtained in LC-MS/MS with the peak intensities observed in MALDI-MSI. We were able to link identifications to nine peaks observed by MALDI-MSI. The results showed that the MSI~LC-MS/MS-LF workflow (i) allowed us to study a representative muscle proteome compared to a classical bottom-up workflow; and (ii) was only slightly affected by matrix deposition and laser ablation. This workflow, performed as a proof-of-concept, suggests that a single tissue section can be used to perform MALDI-MSI and protein extraction, identification, and relative quantitation.
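The back-correlation step described above amounts to matching each MALDI-MSI peak against the ion m/z values expected for proteins identified by LC-MS/MS. The sketch below assumes that intact-protein masses are known (for example, average masses taken from a sequence database). It considers singly and doubly protonated ions and applies a ppm tolerance. The 500 ppm default, the function name and the protein masses in the test are hypothetical choices, not parameters reported in the study.

```python
PROTON = 1.007276  # proton mass in Da


def match_peaks(peaks_mz, proteins, tol_ppm=500.0, charges=(1, 2)):
    """Return (peak_mz, protein_id, charge) triples for every protein whose
    [M+zH]z+ ion lies within tol_ppm of an observed MALDI-MSI peak.

    proteins: dict mapping a protein identifier to its neutral mass in Da.
    """
    matches = []
    for mz in peaks_mz:
        for pid, mass in proteins.items():
            for z in charges:
                theoretical = (mass + z * PROTON) / z
                if abs(mz - theoretical) / theoretical * 1e6 <= tol_ppm:
                    matches.append((mz, pid, z))
    return matches
```

A wide tolerance reflects the limited mass accuracy typical of intact-protein MALDI imaging. The trade-off is that a single peak can match several candidates, so ambiguous matches still need the quantitative comparison described in the abstract.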
Théron, Laëtitia; Centeno, Delphine; Coudy-Gandilhon, Cécile; Pujos-Guillot, Estelle; Astruc, Thierry; Rémond, Didier; Barthelemy, Jean-Claude; Roche, Frédéric; Feasson, Léonard; Hébraud, Michel; Béchet, Daniel; Chambon, Christophe
2016-01-01
Mass spectrometry imaging (MSI) is a powerful tool to visualize the spatial distribution of molecules on a tissue section. The main limitation of MALDI-MSI of proteins is the lack of direct identification. Therefore, this study focuses on an MSI~LC-MS/MS-LF workflow to link the results from MALDI-MSI with potential peak identification and label-free quantitation, using only one tissue section. First, we studied the impact of matrix deposition and laser ablation on protein extraction from the tissue section. Then, we back-correlated the m/z of the proteins detected by MALDI-MSI with those identified by label-free quantitation. This allowed us to compare the label-free quantitation of proteins obtained in LC-MS/MS with the peak intensities observed in MALDI-MSI. We were able to link identifications to nine peaks observed by MALDI-MSI. The results showed that the MSI~LC-MS/MS-LF workflow (i) allowed us to study a representative muscle proteome compared to a classical bottom-up workflow; and (ii) was only slightly affected by matrix deposition and laser ablation. This workflow, performed as a proof-of-concept, suggests that a single tissue section can be used to perform MALDI-MSI and protein extraction, identification, and relative quantitation. PMID:28248242
Using PATIMDB to Create Bacterial Transposon Insertion Mutant Libraries
Urbach, Jonathan M.; Wei, Tao; Liberati, Nicole; Grenfell-Lee, Daniel; Villanueva, Jacinto; Wu, Gang; Ausubel, Frederick M.
2015-01-01
PATIMDB is a software package for facilitating the generation of transposon mutant insertion libraries. The software has two main functions: process tracking and automated sequence analysis. The process tracking function specifically includes recording the status and fates of multiwell plates and samples in various stages of library construction. Automated sequence analysis refers specifically to the pipeline of sequence analysis starting with ABI files from a sequencing facility and ending with insertion location identifications. The protocols in this unit describe installation and use of PATIMDB software. PMID:19343706
NASA Technical Reports Server (NTRS)
1990-01-01
The present conference on digital avionics discusses vehicle-management systems, spacecraft avionics, special vehicle avionics, communication/navigation/identification systems, software qualification and quality assurance, launch-vehicle avionics, Ada applications, sensor and signal processing, general aviation avionics, automated software development, design-for-testability techniques, and avionics-software engineering. Also discussed are optical technology and systems, modular avionics, fault-tolerant avionics, commercial avionics, space systems, data buses, crew-station technology, embedded processors and operating systems, AI and expert systems, data links, and pilot/vehicle interfaces.
[Measurement of intracranial hematoma volume by personal computer].
DU, Wanping; Tan, Lihua; Zhai, Ning; Zhou, Shunke; Wang, Rui; Xue, Gongshi; Xiao, An
2011-01-01
To explore a method for measuring intracranial hematoma volume on a personal computer. Forty cases of various intracranial hematomas were measured both by computed tomography with quantitative software and on a personal computer with Photoshop CS3 software. The data from the two methods were analyzed and compared. There was no significant difference between the data from computed tomography and from the personal computer (P>0.05). A personal computer with Photoshop CS3 software can measure the volume of various intracranial hematomas precisely, rapidly and simply. The method is recommended for clinical medicolegal identification.
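The abstract does not state how areas were measured in Photoshop CS3. A common approach is planimetry: the hematoma is outlined on each CT slice, its pixels are counted, and the volume is the sum over slices of area times slice thickness. The sketch below assumes square pixels with a known spacing and contiguous slices with no gap between them. The function name and inputs are hypothetical.

```python
def hematoma_volume_ml(pixel_counts, pixel_spacing_mm, slice_thickness_mm):
    """Sum-of-areas (planimetric) volume estimate in millilitres.

    pixel_counts: number of hematoma pixels outlined on each CT slice.
    Assumes square pixels and contiguous slices (no inter-slice gap).
    """
    pixel_area_mm2 = pixel_spacing_mm ** 2
    volume_mm3 = sum(c * pixel_area_mm2 * slice_thickness_mm for c in pixel_counts)
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3
```

If the scanner leaves a gap between slices, the per-slice thickness should be replaced by the slice interval. Otherwise the volume will be underestimated.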
Direct Maximization of Protein Identifications from Tandem Mass Spectra
Spivak, Marina; Weston, Jason; Tomazela, Daniela; MacCoss, Michael J.; Noble, William Stafford
2012-01-01
The goal of many shotgun proteomics experiments is to determine the protein complement of a complex biological mixture. For many mixtures, most methodological approaches fall significantly short of this goal. Existing solutions to this problem typically subdivide the task into two stages: first identifying a collection of peptides with a low false discovery rate and then inferring from the peptides a corresponding set of proteins. In contrast, we formulate the protein identification problem as a single optimization problem, which we solve using machine learning methods. This approach is motivated by the observation that the peptide and protein level tasks are cooperative, and the solution to each can be improved by using information about the solution to the other. The resulting algorithm directly controls the relevant error rate, can incorporate a wide variety of evidence and, for complex samples, provides 18–34% more protein identifications than the current state of the art approaches. PMID:22052992
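The two-stage workflow described above usually controls its error rate with a target-decoy estimate. Identifications are ranked by score, and the false discovery rate at each cutoff is estimated as the number of decoy hits divided by the number of target hits above it. The sketch below computes such q-values. It is the generic target-decoy procedure, shown only to make the notion of a controlled error rate concrete. It is not the machine-learning method of this paper and does not treat tied scores specially. The function name is hypothetical.

```python
def target_decoy_qvalues(scores, is_decoy):
    """Estimate q-values for identifications (higher score = better) using
    FDR = decoys / targets among all hits at or above each score cutoff.
    Returns q-values aligned with the input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: smallest FDR achievable at this cutoff or any more permissive one
    qvalues = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues
```

Accepting all target identifications with a q-value at or below, say, 0.01 gives an estimated 1% FDR at whichever level (peptide or protein) the scores were computed.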
Astrometrica: Astrometric data reduction of CCD images
NASA Astrophysics Data System (ADS)
Raab, Herbert
2012-03-01
Astrometrica is an interactive software tool for scientific-grade astrometric data reduction of CCD images. The current version of the software is for the 32-bit Windows operating system family. Astrometrica reads FITS (8-, 16- and 32-bit integer files) and SBIG image files. The size of the images is limited only by available memory. It also offers automatic image calibration (Dark Frame and Flat Field correction), automatic reference star identification, automatic moving object detection and identification, and access to new-generation star catalogs (PPMXL, UCAC 3 and CMC-14), in addition to online help and other features. Astrometrica is shareware that can be used free of charge for a limited period (100 days); special arrangements can be made for educational projects.
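Astrometric reduction of a CCD image typically maps measured pixel positions to sky coordinates through a plate solution. The sketch below shows the textbook version. Reference-star RA/Dec values are projected onto the tangent plane (standard coordinates xi, eta) about the field centre, and a six-constant linear model, xi = a*x + b*y + c and eta = d*x + e*y + f, is fitted by least squares. This illustrates the general technique and is not a description of Astrometrica's internal algorithm; Astrometrica is Windows shareware and does not expose a Python API. The function names are hypothetical.

```python
import numpy as np


def standard_coordinates(ra, dec, ra0, dec0):
    """Gnomonic (tangent-plane) projection of RA/Dec (radians) about the
    plate centre (ra0, dec0). Returns (xi, eta) in radians."""
    cos_c = np.sin(dec0) * np.sin(dec) + np.cos(dec0) * np.cos(dec) * np.cos(ra - ra0)
    xi = np.cos(dec) * np.sin(ra - ra0) / cos_c
    eta = (np.cos(dec0) * np.sin(dec) - np.sin(dec0) * np.cos(dec) * np.cos(ra - ra0)) / cos_c
    return xi, eta


def fit_plate_constants(x, y, xi, eta):
    """Least-squares linear plate model from reference stars:
    xi = a*x + b*y + c and eta = d*x + e*y + f. Needs at least 3 stars."""
    design = np.column_stack([x, y, np.ones_like(x)])
    xi_coeffs, *_ = np.linalg.lstsq(design, xi, rcond=None)
    eta_coeffs, *_ = np.linalg.lstsq(design, eta, rcond=None)
    return xi_coeffs, eta_coeffs


def apply_plate(xi_coeffs, eta_coeffs, x, y):
    """Convert pixel positions (e.g. of a moving object) to standard coordinates."""
    design = np.column_stack([x, y, np.ones_like(x)])
    return design @ xi_coeffs, design @ eta_coeffs
```

Converting the resulting (xi, eta) back to RA/Dec requires the inverse gnomonic projection, which is omitted here for brevity. Higher-order plate models add quadratic terms when the optics introduce distortion.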
Achievements and Challenges in Computational Protein Design.
Samish, Ilan
2017-01-01
Computational protein design (CPD), a still-evolving field, includes computer-aided engineering for partial or full de novo designs of proteins of interest. Designs are defined by a requested structure, function, or working environment. This chapter describes the birth and maturation of the field by presenting 101 CPD examples in chronological order, emphasizing achievements and pending challenges. Integrating these aspects presents the plethora of CPD approaches with the hope of providing a "CPD 101". These reflect on the broader structural bioinformatics and computational biophysics field and include: (1) integration of knowledge-based and energy-based methods, (2) hierarchical designated approach towards local, regional, and global motifs and the integration of high- and low-resolution design schemes that fit each such region, (3) systematic differential approaches towards different protein regions, (4) identification of key hot-spot residues and the relative effect of remote regions, (5) assessment of shape-complementarity, electrostatics and solvation effects, (6) integration of thermal plasticity and functional dynamics, (7) negative design, (8) systematic integration of experimental approaches, (9) objective cross-assessment of methods, and (10) successful ranking of potential designs. Future challenges also include making CPD software accessible to life-science researchers in general and demonstrating success within an in vivo milieu. CPD increases our understanding of protein structure and function and the relationships between the two, along with the application of such know-how for the benefit of mankind. Applied aspects range from biological drugs, via healthier and tastier food products, to nanotechnology and environmentally friendly enzymes replacing toxic chemicals utilized in industry.
Visual identification system for homeland security and law enforcement support
NASA Astrophysics Data System (ADS)
Samuel, Todd J.; Edwards, Don; Knopf, Michael
2005-05-01
This paper describes the basic configuration for a visual identification system (VIS) for Homeland Security and law enforcement support. Security and law enforcement systems with an integrated VIS will accurately and rapidly provide identification of vehicles or containers that have entered, exited or passed through a specific monitoring location. The VIS stores all images and makes them available for recall for approximately one week. Images of alarming vehicles will be archived indefinitely as part of the alarming vehicle's or cargo container's record. Depending on user needs, the digital imaging information will be provided electronically to the individual inspectors, supervisors, and/or control center at the customer's office. The key components of the VIS are the high-resolution cameras that capture images of vehicles, lights, presence sensors, image cataloging software, and image recognition software. In addition to the cameras, the physical integration and network communications of the VIS components with the balance of the security system and client must be ensured.
Zhou, Zhiwei; Xiong, Xin; Zhu, Zheng-Jiang
2017-07-15
In metabolomics, rigorous structural identification of metabolites presents a challenge for bioinformatics. The use of collision cross-section (CCS) values of metabolites derived from ion mobility-mass spectrometry effectively increases the confidence of metabolite identification, but this technique is limited by the small number of available CCS values. Currently, there is no software available for rapidly generating the metabolites' CCS values. Here, we developed the first web server, namely, MetCCS Predictor, for predicting CCS values. It can predict the CCS values of metabolites using molecular descriptors within a few seconds. Users with a limited background in bioinformatics can benefit from this software and effectively improve metabolite identification in metabolomics. The web server is freely available at: http://www.metabolomics-shanghai.org/MetCCS/ . jiangzhu@sioc.ac.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Shteynberg, David; Mendoza, Luis; Hoopmann, Michael R.; Sun, Zhi; Schmidt, Frank; Deutsch, Eric W.; Moritz, Robert L.
2016-01-01
Most shotgun proteomics data analysis workflows are based on the assumption that each fragment ion spectrum is explained by a single species of peptide ion isolated by the mass spectrometer; however, in reality mass spectrometers often isolate more than one peptide ion within the isolation window, and these co-isolated ions contribute additional peptide fragment peaks to many spectra. We present a new tool called reSpect, implemented in the Trans-Proteomic Pipeline (TPP), that enables an iterative workflow whereby fragment ion peaks explained by a peptide ion identified in one round of sequence searching or spectral library search are attenuated based on the confidence of the identification, and then the altered spectrum is subjected to further rounds of searching. The reSpect tool is not implemented as a search engine, but rather as a post-search-engine processing step where only fragment ion intensities are altered. This enables the application of any search engine combination in the following iterations. Thus, reSpect is compatible with all other protein sequence database search engines as well as peptide spectral library search engines that are supported by the TPP. We show that while some datasets are highly amenable to chimeric spectrum identification and lead to additional peptide identification boosts of over 30% with as many as four different peptide ions identified per spectrum, datasets with narrow precursor ion selection benefit from such processing by only a few percent. We demonstrate a technique that facilitates the determination of the degree to which a dataset would benefit from chimeric spectrum analysis. The reSpect tool is free and open source, provided within the TPP and available at the TPP website. PMID:26419769
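The confidence-weighted attenuation step described above can be illustrated with a short Python sketch. It is not reSpect's implementation and rests on assumptions not stated in the abstract: peaks are (m/z, intensity) pairs, the fragment m/z values explained by the first-round peptide are already known, matching uses a fixed m/z tolerance, and each explained peak is scaled by (1 - probability). The function name `attenuate_explained_peaks` is hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def attenuate_explained_peaks(
    peaks: List[Peak],
    explained_mz: List[float],
    probability: float,
    tol: float = 0.5,
) -> List[Peak]:
    """Scale down peaks matched to an identified peptide's fragment ions.

    Hypothetical rule: intensity is multiplied by (1 - probability), so a
    confident identification removes most of the signal it explains while a
    doubtful one leaves the spectrum largely intact.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    factor = 1.0 - probability
    out = []
    for mz, intensity in peaks:
        # A peak counts as explained if any fragment m/z lies within tol.
        explained = any(abs(mz - f) <= tol for f in explained_mz)
        out.append((mz, intensity * factor if explained else intensity))
    return out


spectrum = [(175.12, 100.0), (262.15, 50.0), (400.20, 80.0)]
# Suppose the first-round peptide explains the first two peaks with
# probability 0.9; the residual spectrum would then be searched again.
residual = attenuate_explained_peaks(spectrum, [175.1, 262.2], 0.9)
print(residual)
```

Unexplained peaks are left untouched, so fragments of a co-isolated peptide dominate the next search round.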
Shteynberg, David; Mendoza, Luis; Hoopmann, Michael R; Sun, Zhi; Schmidt, Frank; Deutsch, Eric W; Moritz, Robert L
2015-11-01
Most shotgun proteomics data analysis workflows are based on the assumption that each fragment ion spectrum is explained by a single species of peptide ion isolated by the mass spectrometer; however, in reality mass spectrometers often isolate more than one peptide ion within the window of isolation that contribute to additional peptide fragment peaks in many spectra. We present a new tool called reSpect, implemented in the Trans-Proteomic Pipeline (TPP), which enables an iterative workflow whereby fragment ion peaks explained by a peptide ion identified in one round of sequence searching or spectral library search are attenuated based on the confidence of the identification, and then the altered spectrum is subjected to further rounds of searching. The reSpect tool is not implemented as a search engine, but rather as a post-search engine processing step where only fragment ion intensities are altered. This enables the application of any search engine combination in the iterations that follow. Thus, reSpect is compatible with all other protein sequence database search engines as well as peptide spectral library search engines that are supported by the TPP. We show that while some datasets are highly amenable to chimeric spectrum identification and lead to additional peptide identification boosts of over 30% with as many as four different peptide ions identified per spectrum, datasets with narrow precursor ion selection only benefit from such processing at the level of a few percent. We demonstrate a technique that facilitates the determination of the degree to which a dataset would benefit from chimeric spectrum analysis. The reSpect tool is free and open source, provided within the TPP and available at the TPP website.
Design and analysis of quantitative differential proteomics investigations using LC-MS technology.
Bukhman, Yury V; Dharsee, Moyez; Ewing, Rob; Chu, Peter; Topaloglou, Thodoros; Le Bihan, Thierry; Goh, Theo; Duewel, Henry; Stewart, Ian I; Wisniewski, Jacek R; Ng, Nancy F
2008-02-01
Liquid chromatography-mass spectrometry (LC-MS)-based proteomics is becoming an increasingly important tool for characterizing the abundance of proteins in biological samples of various types and across conditions. Effects of disease or drug treatments on protein abundance are of particular interest for the characterization of biological processes and the identification of biomarkers. Although state-of-the-art instrumentation is available to make high-quality measurements and commercial software exists to process the data, the complexity of the technology and data presents challenges for bioinformaticians and statisticians. Here, we describe a pipeline for the analysis of quantitative LC-MS data. Key components of this pipeline include experimental design (sample pooling, blocking, and randomization) as well as deconvolution and alignment of mass chromatograms to generate a matrix of molecular abundance profiles. An important challenge in LC-MS-based quantitation is to accurately identify and assign abundance measurements to members of protein families. To address this issue, we implement a novel statistical method for inferring the relative abundance of related members of protein families from tryptic peptide intensities. This pipeline has been used to analyze quantitative LC-MS data from multiple biomarker discovery projects. We illustrate our pipeline here with examples from two of these studies, and show that the pipeline constitutes a complete workable framework for LC-MS-based differential quantitation. Supplementary material is available at http://iec01.mie.utoronto.ca/~thodoros/Bukhman/.
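The blocking and randomization mentioned in the experimental design step can be sketched as a randomized complete block design. This is an illustration, not the pipeline's own code: it assumes each block (for example an LC-MS batch or day) receives exactly one sample per condition and that run order within a block is shuffled. The function name and sample labels are hypothetical, and sample pooling is not shown.

```python
import random
from typing import Dict, List


def randomized_blocks(
    samples_by_condition: Dict[str, List[str]], seed: int = 0
) -> List[List[str]]:
    """Randomized complete block design: each block holds one sample per condition.

    Run order inside each block is shuffled so that condition is not
    confounded with instrument drift within a batch.
    """
    rng = random.Random(seed)
    conditions = sorted(samples_by_condition)
    n_blocks = len(samples_by_condition[conditions[0]])
    if any(len(samples_by_condition[c]) != n_blocks for c in conditions):
        raise ValueError("each condition needs the same number of samples")
    # Randomly decide which replicate of each condition goes into which block.
    pools = {c: rng.sample(samples_by_condition[c], n_blocks) for c in conditions}
    blocks = []
    for b in range(n_blocks):
        block = [pools[c][b] for c in conditions]
        rng.shuffle(block)  # randomize run order within the block
        blocks.append(block)
    return blocks


design = randomized_blocks(
    {"control": ["C1", "C2", "C3"], "treated": ["T1", "T2", "T3"]}, seed=42
)
for i, block in enumerate(design, start=1):
    print(f"block {i}: {block}")
```

Because every block contains both conditions, a batch effect shifts control and treated samples alike and can be modelled as a block term rather than masquerading as a treatment effect.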
Zhang, Huaizhong; Fan, Jun; Perkins, Simon; Pisconti, Addolorata; Simpson, Deborah M.; Bessant, Conrad; Hubbard, Simon; Jones, Andrew R.
2015-01-01
The mzQuantML standard has been developed by the Proteomics Standards Initiative for capturing, archiving and exchanging quantitative proteomic data derived from mass spectrometry. It is a rich XML-based format, capable of representing data about two-dimensional features from LC-MS data, and peptides, proteins or groups of proteins that have been quantified from multiple samples. In this article we report the development of an open source Java-based library of routines for mzQuantML, called the mzqLibrary, and associated software for visualising data called the mzqViewer. The mzqLibrary contains routines for mapping (peptide) identifications onto quantified features, inference of protein (group)-level quantification values from peptide-level values, normalisation and basic statistics for differential expression. These routines can be accessed via the command line, via a Java application programming interface or via a basic graphical user interface. The mzqLibrary also contains several file format converters, including import converters (to mzQuantML) from OpenMS, Progenesis LC-MS and MaxQuant, and exporters (from mzQuantML) to other standards or useful formats (mzTab, HTML, csv). The mzqViewer contains in-built routines for viewing the tables of data (about features, peptides or proteins), and connects to the R statistical library for more advanced plotting options. The mzqLibrary and mzqViewer packages are available from https://code.google.com/p/mzq-lib/. PMID:26037908
Qi, Da; Zhang, Huaizhong; Fan, Jun; Perkins, Simon; Pisconti, Addolorata; Simpson, Deborah M; Bessant, Conrad; Hubbard, Simon; Jones, Andrew R
2015-09-01
The mzQuantML standard has been developed by the Proteomics Standards Initiative for capturing, archiving and exchanging quantitative proteomic data derived from mass spectrometry. It is a rich XML-based format, capable of representing data about two-dimensional features from LC-MS data, and peptides, proteins or groups of proteins that have been quantified from multiple samples. In this article we report the development of an open source Java-based library of routines for mzQuantML, called the mzqLibrary, and associated software for visualising data called the mzqViewer. The mzqLibrary contains routines for mapping (peptide) identifications onto quantified features, inference of protein (group)-level quantification values from peptide-level values, normalisation and basic statistics for differential expression. These routines can be accessed via the command line, via a Java application programming interface or via a basic graphical user interface. The mzqLibrary also contains several file format converters, including import converters (to mzQuantML) from OpenMS, Progenesis LC-MS and MaxQuant, and exporters (from mzQuantML) to other standards or useful formats (mzTab, HTML, csv). The mzqViewer contains in-built routines for viewing the tables of data (about features, peptides or proteins), and connects to the R statistical library for more advanced plotting options. The mzqLibrary and mzqViewer packages are available from https://code.google.com/p/mzq-lib/. © 2015 The Authors. PROTEOMICS Published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
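Two of the mzqLibrary operations named above, normalisation and inference of protein-level values from peptide-level values, can be sketched in a few lines. This is not the mzqLibrary API (which is Java) and the specific rules are assumptions: per-sample median normalisation of log2 peptide intensities, then a protein rollup that takes the median of its peptides. The peptide names, protein accessions and intensities are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def median_normalise(log_values: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Shift each sample's log2 peptide intensities so all sample medians agree.

    Input maps peptide -> list of log2 intensities, one entry per sample.
    """
    n_samples = len(next(iter(log_values.values())))
    sample_medians = [median(v[s] for v in log_values.values()) for s in range(n_samples)]
    target = median(sample_medians)
    return {
        pep: [x - sample_medians[s] + target for s, x in enumerate(vals)]
        for pep, vals in log_values.items()
    }


def protein_rollup(
    log_values: Dict[str, List[float]], peptide_to_protein: Dict[str, str]
) -> Dict[str, List[float]]:
    """Summarise each protein, per sample, as the median of its peptides."""
    grouped: Dict[str, List[List[float]]] = {}
    for pep, vals in log_values.items():
        grouped.setdefault(peptide_to_protein[pep], []).append(vals)
    return {prot: [median(col) for col in zip(*rows)] for prot, rows in grouped.items()}


# Hypothetical data: sample 2 was loaded at twice the amount of sample 1.
raw = {"PEPTIDE_A": [1000.0, 2000.0], "PEPTIDE_B": [4000.0, 8000.0], "PEPTIDE_C": [500.0, 1000.0]}
logs = {p: [math.log2(x) for x in v] for p, v in raw.items()}
norm = median_normalise(logs)
proteins = protein_rollup(norm, {"PEPTIDE_A": "P1", "PEPTIDE_B": "P1", "PEPTIDE_C": "P2"})
print(proteins)
```

The median is used in both steps because it is robust to a few outlier peptides. After normalisation the global twofold loading difference disappears, so both proteins come out unchanged between samples.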
Target identification by image analysis.
Fetz, V; Prochnow, H; Brönstrup, M; Sasse, F
2016-05-04
Covering: 1997 to the end of 2015. Each biologically active compound induces phenotypic changes in target cells that are characteristic of its mode of action. These phenotypic alterations can be directly observed under the microscope or made visible by labelling structural elements or selected proteins of the cells with dyes. Comparing the cellular phenotype induced by a compound of interest with the phenotypes of reference compounds with known cellular targets allows its mode of action to be predicted. While this approach has been successfully applied to the characterization of natural products based on visual inspection of images, recent studies have used automated microscopy and analysis software to increase speed and to reduce subjective interpretation. In this review, we give a general outline of the workflow for manual and automated image analysis, and we highlight natural products whose bacterial and eukaryotic targets could be identified through such approaches.
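The reference-comparison idea described above can be sketched as nearest-neighbour matching of phenotypic profiles. This is one simple variant among many, assumed here for illustration: each phenotype is a vector of four z-scored image features, similarity is Pearson correlation, and the query inherits the target of the best-matching reference. Compound names, targets and feature values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def predict_mode_of_action(
    query: List[float], references: Dict[str, Tuple[str, List[float]]]
) -> Tuple[str, str, float]:
    """Return (reference compound, its known target, r) for the most similar profile."""
    scored = [
        (name, target, pearson(query, profile))
        for name, (target, profile) in references.items()
    ]
    return max(scored, key=lambda item: item[2])


# Hypothetical z-scored profiles over four image features.
references = {
    "reference_A": ("tubulin", [2.0, -1.0, 0.5, -0.5]),
    "reference_B": ("actin", [-0.5, 0.2, 2.5, -1.0]),
}
query = [1.8, -0.9, 0.3, -0.4]
print(predict_mode_of_action(query, references))
```

Correlation compares the shape of a profile rather than its magnitude, so a weaker dose of a compound with the same mode of action still matches its reference.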
Chang, Cheng; Xu, Kaikun; Guo, Chaoping; Wang, Jinxia; Yan, Qi; Zhang, Jian; He, Fuchu; Zhu, Yunping
2018-05-22
Compared with the numerous software tools developed for identification and quantification of -omics data, there remains a lack of suitable tools for both downstream analysis and data visualization. To help researchers better understand the biological meaning of their -omics data, we present an easy-to-use tool, named PANDA-view, for both statistical analysis and visualization of quantitative proteomics data and other -omics data. PANDA-view contains various analysis methods, such as normalization, missing value imputation, statistical tests, clustering and principal component analysis, as well as the most commonly used data visualization methods, including an interactive volcano plot. Additionally, it provides user-friendly interfaces for protein-peptide-spectrum representation of quantitative proteomics data. PANDA-view is freely available at https://sourceforge.net/projects/panda-view/. Contact: 1987ccpacer@163.com and zhuyunping@gmail.com. Supplementary data are available at Bioinformatics online.
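A volcano plot places each protein by its log2 fold change and its -log10 p-value. The sketch below computes one such point in plain Python. The abstract does not say which test PANDA-view applies, so an exact two-sided permutation test on mean log2 intensities is used as a stand-in; the function name and intensities are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean
from typing import List, Tuple


def volcano_point(group_a: List[float], group_b: List[float]) -> Tuple[float, float]:
    """Return (log2 fold change B vs A, -log10 p) for one protein.

    Inputs are positive raw intensities. The p-value comes from an exact
    two-sided permutation test on the difference of mean log2 intensities.
    """
    la = [math.log2(x) for x in group_a]
    lb = [math.log2(x) for x in group_b]
    observed = mean(lb) - mean(la)
    pooled = la + lb
    extreme = total = 0
    # Enumerate every way of relabelling len(lb) of the pooled values as group B.
    for idx in combinations(range(len(pooled)), len(lb)):
        chosen = set(idx)
        perm_b = [pooled[i] for i in chosen]
        perm_a = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(perm_b) - mean(perm_a)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, -math.log10(extreme / total)


fc, score = volcano_point([100.0, 110.0, 90.0], [400.0, 420.0, 380.0])
print(f"log2FC = {fc:.3f}, -log10 p = {score:.3f}")
```

With three replicates per group there are only 20 distinct relabellings, so the smallest attainable two-sided p-value is 0.1. The permutation test avoids distributional assumptions but is coarse at small n, which is why parametric or moderated tests are common in practice.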
Faure, Guilhem; Revy, Patrick; Schertzer, Michael; Londono-Vallejo, Arturo; Callebaut, Isabelle
2014-06-01
Several studies have recently shown that germline mutations in RTEL1, an essential DNA helicase involved in telomere regulation and DNA repair, cause Hoyeraal-Hreidarsson syndrome (HHS), a severe form of dyskeratosis congenita. Using new software that facilitates the delineation of the different domains of the protein and the identification of remote relationships for orphan domains, we report here that the C-terminal extension of RTEL1, downstream of its catalytic domain and including several HHS-associated mutations, contains a previously unidentified tandem of harmonin-N-like domains, which may serve as a hub for partner interaction. This finding highlights the potentially critical role of this region in RTEL1 function and gives insight into the impact that the identified mutations would have on the structure and function of these domains. © 2013 Wiley Periodicals, Inc.
PrAS: Prediction of amidation sites using multiple feature extraction.
Wang, Tong; Zheng, Wei; Wuyun, Qiqige; Wu, Zhenfeng; Ruan, Jishou; Hu, Gang; Gao, Jianzhao
2017-02-01
Amidation plays an important role in a variety of pathological processes and serious diseases such as neural dysfunction and hypertension. However, identifying protein amidation sites with traditional experimental methods is time-consuming and expensive. In this paper, we propose a novel predictor, PrAS (Prediction of Amidation Sites), which is the first software package for academic users. The method incorporates four representative feature types: position-based features, physicochemical and biochemical property features, predicted structure-based features and evolutionary information features. A novel feature selection method, positive contribution feature selection, was proposed to optimize features. PrAS achieved an AUC of 0.96, accuracy of 92.1%, sensitivity of 81.2%, specificity of 94.9% and MCC of 0.76 on the independent test set. PrAS is freely available at https://sourceforge.net/p/praspkg. Copyright © 2016 Elsevier Ltd. All rights reserved.
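The evaluation metrics reported for PrAS follow standard definitions: accuracy = (TP + TN) / N, sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and MCC = (TP·TN - FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). AUC can be computed as the probability that a randomly chosen positive scores above a randomly chosen negative. The sketch below implements these formulas. The counts and scores are hypothetical and are not the PrAS independent test set.

```python
import math
from typing import Dict, List


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-dependent metrics from a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall on true amidation sites
        "specificity": tn / (tn + fp),  # recall on non-sites
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def auc(scores_pos: List[float], scores_neg: List[float]) -> float:
    """ROC AUC as P(positive score > negative score), counting ties as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and scores, for illustration only.
m = classification_metrics(tp=80, fp=10, tn=90, fn=20)
print({k: round(v, 3) for k, v in m.items()})
print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

MCC uses all four cells of the confusion matrix, which makes it more informative than accuracy when, as is typical for site prediction, negatives far outnumber positives.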
Bhatia, Vivek N.; Perlman, David H.; Costello, Catherine E.; McComb, Mark E.
2009-01-01
In order that biological meaning may be derived and testable hypotheses may be built from proteomics experiments, assignments of proteins identified by mass spectrometry or other techniques must be supplemented with additional annotation, such as information on known protein functions, protein-protein interactions, or biological pathway associations. Collecting, organizing, and interpreting these data often requires the input of experts in the biological field of study, in addition to the time-consuming search for and compilation of information from online protein databases. Furthermore, visualizing this bulk of information can be challenging due to the limited availability of easy-to-use and freely available tools for this process. In response to these constraints, we have undertaken the design of software to automate annotation and visualization of proteomics data in order to accelerate the pace of research. Here we present the Software Tool for Researching Annotations of Proteins (STRAP), a user-friendly, open-source C# application. STRAP automatically obtains gene ontology (GO) terms associated with proteins in a proteomics results ID list using the freely accessible UniProtKB and EBI GOA databases. Summarized in an easy-to-navigate tabular format, STRAP includes meta-information on the protein in addition to complementary GO terminology. Additionally, this information can be edited by the user so that in-house expertise on particular proteins may be integrated into the larger dataset. STRAP provides a sortable tabular view for all terms, as well as graphical representations of GO-term association data in pie charts (biological process, cellular component and molecular function) and bar charts (cross comparison of sample sets) to aid in the interpretation of large datasets and differential analysis experiments.
Furthermore, proteins of interest may be exported as a unique FASTA-formatted file to allow for customizable re-searching of mass spectrometry data, and gene names corresponding to the proteins in the lists may be encoded in the Gaggle microformat for further characterization, including pathway analysis. STRAP, a tutorial, and the C# source code are freely available from http://cpctools.sourceforge.net. PMID:19839595
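The per-category GO summaries that STRAP charts can be sketched as a simple tally. This assumes the annotations have already been retrieved (STRAP itself queries UniProtKB and EBI GOA) and are held as (GO ID, aspect) pairs using the P/F/C aspect codes of GO annotation files. The protein identifiers are hypothetical and the code is not STRAP's.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Aspect codes as used in GO annotation (GAF) files.
ASPECTS = {"P": "biological process", "F": "molecular function", "C": "cellular component"}


def go_summary(annotations: Dict[str, List[Tuple[str, str]]]) -> Dict[str, Counter]:
    """For each GO aspect, count how many proteins carry each GO term."""
    summary: Dict[str, Counter] = {name: Counter() for name in ASPECTS.values()}
    for terms in annotations.values():
        # Deduplicate so a term annotated twice to one protein (for example
        # with different evidence codes) is counted once for that protein.
        for go_id, aspect in set(terms):
            summary[ASPECTS[aspect]][go_id] += 1
    return summary


annotations = {
    "PROTEIN_1": [("GO:0006412", "P"), ("GO:0005737", "C"), ("GO:0003735", "F")],
    "PROTEIN_2": [("GO:0006412", "P"), ("GO:0005634", "C"), ("GO:0005634", "C")],
}
for aspect, counts in go_summary(annotations).items():
    print(aspect, dict(counts))
```

Each aspect's counter maps directly onto one pie chart; running the function per sample set gives the values for a cross-comparison bar chart.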
Chang, J; Kim, Y; Kwon, H J
2016-05-04
Covering: up to February 2016. Identification of the target proteins of natural products is pivotal to understanding their mechanisms of action and to developing natural products for use as molecular probes and potential therapeutic drugs. Affinity chromatography of immobilized natural products has conventionally been used to identify target proteins, and has yielded good results. However, this method has limitations, in that labeling or tagging for immobilization and affinity purification often reduces or alters the activity of the natural product. New strategies have recently been developed and applied to identify the target proteins of natural products and synthetic small molecules without chemical modification of the natural product. These direct and indirect methods for target identification of label-free natural products include drug affinity responsive target stability (DARTS), stability of proteins from rates of oxidation (SPROX), cellular thermal shift assay (CETSA), thermal proteome profiling (TPP), and bioinformatics-based analysis of connectivity. This review focuses on and reports case studies of the latest advances in target protein identification methods for label-free natural products. The integration of newly developed technologies will provide new insights and highlight the value of natural products for use as biological probes and new drug candidates.
Recombinant blood group proteins for use in antibody screening and identification tests.
Seltsam, Axel; Blasczyk, Rainer
2009-11-01
The present review elucidates the potential of recombinant blood group proteins (BGPs) for red blood cell (RBC) antibody detection and identification in pretransfusion testing, and the achievements in this field so far. Many BGPs have been expressed in eukaryotic and prokaryotic systems in sufficient quantity and quality for RBC antibody testing. Recombinant BGPs can be incorporated in soluble protein reagents or solid-phase assays such as ELISA, color-coded microsphere and protein microarray chip-based techniques. Because novel recombinant protein-based assays use single antigens, a positive reaction of a serum with the recombinant protein directly indicates the presence and specificity of the target antibody. Conversely, conventional RBC-based assays use panels of human RBCs carrying a huge number of blood group antigens at the same time and require negative reactions of samples with antigen-negative cells for indirect determination of antibody specificity. Because of their capacity for single-step, direct RBC antibody determination, recombinant protein-based assays may greatly facilitate and accelerate the identification of common and rare RBC antibodies.
Luzio, J P; Brake, B; Banting, G; Howell, K E; Braghetta, P; Stanley, K K
1990-01-01
Organelle-specific integral membrane proteins were identified by a novel strategy which gives rise to monospecific antibodies to these proteins as well as to the cDNA clones encoding them. A cDNA expression library was screened with a polyclonal antiserum raised against Triton X-114-extracted organelle proteins, and clones were then grouped using antibodies affinity-purified on individual fusion proteins. We describe the identification, molecular cloning and sequencing of a type 1 membrane protein (TGN38) which is located specifically in the trans-Golgi network. PMID:2204342
NASA Astrophysics Data System (ADS)
Setou, M.; Hayasaka, T.; Shimma, S.; Sugiura, Y.; Matsumoto, M.
2008-12-01
Molecular identification using high-sensitivity tandem mass spectrometry is essential for protein analysis on the tissue surface. Here we report an improved digestion protocol for protein identification directly on the tissue surface using mass spectrometry. Using a denaturation step and a detergent-supplemented trypsin solution, we successfully detected and identified many molecules, such as tubulin, neurofilament, and synaptosomal-associated 25 kDa protein, directly from a mouse cerebellum section.
Protein social behavior makes a stronger signal for partner identification than surface geometry
Laine, Elodie
2016-01-01
Cells are interactive living systems where protein movements, interactions and regulation are substantially free from centralized management. How protein physico‐chemical and geometrical properties determine who interacts with whom remains far from fully understood. We show that characterizing how a protein behaves with many potential interactors in a complete cross‐docking study leads to a sharp identification of its cellular/true/native partner(s). We define a sociability index, or S‐index, reflecting whether or not a protein tends to pair with other proteins. Formally, we propose a suitable normalization function that accounts for protein sociability, and we combine it with a simple interface‐based (ranking) score to discriminate partners from non‐interactors. We show that sociability is an important factor and that the normalization achieves a much higher discriminative power than shape complementarity docking scores. The social effect is also observed with more sophisticated docking algorithms. Docking conformations are evaluated using experimental binding sites, which represent the best possible approximation of binding site predictions, whose accuracy has become high in recent years. This makes our analysis helpful for a global understanding of partner identification and for suggesting discriminating strategies. These results contradict previous findings claiming that the partner identification problem is solvable solely with geometrical docking. Proteins 2016; 85:137–154. © 2016 Wiley Periodicals, Inc. PMID:27802579
Wang, Jigang; Zhang, Chong-Jing; Zhang, Jianbin; He, Yingke; Lee, Yew Mun; Chen, Songbi; Lim, Teck Kwang; Ng, Shukie; Shen, Han-Ming; Lin, Qingsong
2015-01-01
Target identification and understanding of mechanism of action (MOA) are challenging for the development of small-molecule probes and their application in biology and drug discovery. For example, although aspirin has been widely used for more than 100 years, its molecular targets have not been fully characterized. To cope with this challenge, we developed a novel technique called quantitative acid-cleavable activity-based protein profiling (QA-ABPP) that combines two parts: (i) activity-based protein profiling (ABPP) and iTRAQ™ quantitative proteomics for identification of target proteins and (ii) acid-cleavable linker-based ABPP for identification of peptides with specific binding sites. It is known that reaction of aspirin with its target proteins leads to acetylation. We thus applied the above technique using aspirin-based probes in human cancer HCT116 cells. We identified 1110 target proteins and 2775 peptides with exact acetylation sites. By correlating these two sets of data, 523 proteins were identified as targets of aspirin. We used various biological assays to validate the effects of aspirin on inhibition of protein synthesis and induction of autophagy, which emerged from pathway analysis of the aspirin target profile. This technique is widely applicable for target identification in drug discovery and biology, especially for covalent drugs. PMID:25600173
Isolation and identification of peanut leaf proteins regulated by water stress.
Akkasaeng, Chutipong; Tantisuwichwong, Napaporn; Chairam, Issariya; Prakrongrak, Narumon; Jogloy, Sanun; Pathanothai, Aran
2007-05-15
Water deficits trigger signaling cascades leading to modulation of protein expression in plant tissues. Identification of peanut leaf proteins regulated by water stress provides some insight into the cellular and molecular responses of peanut plants to drought stress. Peanut variety Khon Kaen 4, a water-stress-sensitive variety, was grown in a growth chamber under a controlled environment. Water stress was imposed on day 30 after seedling emergence by withholding water for 6 days, while control plants were adequately supplied with water. Total protein was prepared from a leaflet of a fully expanded leaf on the main stem. Proteins were separated on duplicate gels by two-dimensional gel electrophoresis and visualized by silver nitrate staining. Image analysis was performed using ImageMaster 2D Platinum 5.0 to determine proteins regulated by water stress. The molecular mass and isoelectric point of each regulated protein were used in database queries for protein identification. One protein was induced under water stress, and its homologous protein was identified as Serine/threonine-protein phosphatase PP1. Five proteins were down-regulated by water deficit. The homologous proteins were chaperone protein DNAJ, auxin-responsive protein IAA29, peroxidase 43, caffeoyl-CoA O-methyltransferase and SNF1-related protein kinase regulatory subunit beta-2. The down-regulated proteins may be associated with the sensitivity of this peanut variety to water stress.
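The database query by molecular mass and isoelectric point described above can be illustrated as a tolerance-window search over candidate entries. The sketch is hypothetical throughout: the 20% relative mass tolerance, the ±0.25 pI window, the ranking rule and the candidate entries are assumptions, not the settings or database used in the study.

```python
from typing import List, Tuple

Entry = Tuple[str, float, float]  # (accession, molecular mass in kDa, pI)


def match_by_mass_and_pi(
    mass_kda: float,
    pi: float,
    database: List[Entry],
    mass_tol: float = 0.2,  # relative mass tolerance (20%), an assumption
    pi_tol: float = 0.25,   # absolute pI tolerance, an assumption
) -> List[Entry]:
    """Candidates whose mass and pI both fall within tolerance, closest first."""
    hits = [
        entry
        for entry in database
        if abs(entry[1] - mass_kda) <= mass_tol * mass_kda and abs(entry[2] - pi) <= pi_tol
    ]
    # Rank by a simple combined distance: relative mass error plus pI error.
    return sorted(hits, key=lambda e: abs(e[1] - mass_kda) / mass_kda + abs(e[2] - pi))


database = [
    ("CAND_1", 35.0, 5.2),
    ("CAND_2", 36.5, 5.4),
    ("CAND_3", 60.0, 5.3),
    ("CAND_4", 34.0, 8.1),
]
print([e[0] for e in match_by_mass_and_pi(35.5, 5.3, database)])
```

Mass and pI read from a 2-D gel are coarse, so several candidates usually pass the window. This is why such matches are reported as homologous proteins rather than definitive identifications.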
Why are they missing? : Bioinformatics characterization of missing human proteins.
Elguoshy, Amr; Magdeldin, Sameh; Xu, Bo; Hirao, Yoshitoshi; Zhang, Ying; Kinoshita, Naohiko; Takisawa, Yusuke; Nameta, Masaaki; Yamamoto, Keiko; El-Refy, Ali; El-Fiky, Fawzy; Yamamoto, Tadashi
2016-10-21
NeXtProt is a web-based protein knowledge platform that supports research on human proteins. NeXtProt (release 2015-04-28) lists 20,060 proteins; among them, 3373 canonical proteins (16.8%) lack credible experimental evidence at the protein level (PE2:PE5). Therefore, they are considered "missing proteins". A comprehensive bioinformatic workflow has been proposed to analyze these "missing" proteins. The aims of the current study were to analyze physicochemical properties and the existence and distribution of tryptic cleavage sites, and to pinpoint the signature peptides of the missing proteins. Our findings showed that 23.7% of missing proteins were hydrophobic proteins possessing transmembrane domains (TMD). Also, forty missing entries generated tryptic peptides that were either outside the mass detection range (>30 aa) or mapped to multiple proteins (<9 aa). Additionally, 21% of missing entries did not generate any unique tryptic peptides. An in silico endopeptidase combination strategy increased the possibility of identifying missing proteins. Accordingly, using both a mature protein database and a signal peptidome database could be a promising option to identify some missing proteins by targeting their unique N-terminal tryptic peptides in the mature protein database and/or C-terminal tryptic peptides in the signal peptidome database. In conclusion, identification of missing proteins requires additional consideration during sample preparation, extraction, digestion and data analysis to increase the likelihood of identification. Copyright © 2016. Published by Elsevier B.V.
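The tryptic-peptide analysis described above can be sketched as an in silico digest followed by a uniqueness and length filter. This sketch makes several assumptions: it uses the simple trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages), takes the abstract's 9 to 30 residue window as the observable range, and uses toy sequences that are not neXtProt entries.

```python
import re
from typing import Dict, List, Set


def tryptic_peptides(sequence: str) -> List[str]:
    """Fully cleaved tryptic peptides: cut after K or R unless the next residue is P.

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def unique_observable_peptides(
    proteome: Dict[str, str], min_len: int = 9, max_len: int = 30
) -> Dict[str, Set[str]]:
    """Per protein, tryptic peptides unique to it and within the length window."""
    owners: Dict[str, Set[str]] = {}
    for accession, sequence in proteome.items():
        for peptide in set(tryptic_peptides(sequence)):
            owners.setdefault(peptide, set()).add(accession)
    result: Dict[str, Set[str]] = {accession: set() for accession in proteome}
    for peptide, accessions in owners.items():
        if len(accessions) == 1 and min_len <= len(peptide) <= max_len:
            result[next(iter(accessions))].add(peptide)
    return result


# Hypothetical toy sequences.
proteome = {
    "P1": "A" * 9 + "K" + "L" * 10 + "R" + "GGGK",
    "P2": "A" * 9 + "K" + "V" * 11 + "K",
    "P3": "A" * 9 + "K" + "GGGGK",
}
for accession, peptides in unique_observable_peptides(proteome).items():
    print(accession, sorted(peptides))
```

Here P3 ends up with no usable peptide: its only tryptic peptides are either shared or too short. This mirrors the abstract's finding that some missing entries yield no unique tryptic peptides, and it is the situation where an alternative endopeptidase could help.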
MASPECTRAS: a platform for management and analysis of proteomics LC-MS/MS data
Hartler, Jürgen; Thallinger, Gerhard G; Stocker, Gernot; Sturn, Alexander; Burkard, Thomas R; Körner, Erik; Rader, Robert; Schmidt, Andreas; Mechtler, Karl; Trajanoski, Zlatko
2007-01-01
Background: The advancements of proteomics technologies have led to a rapid increase in the number, size and rate at which datasets are generated. Managing and extracting valuable information from such datasets requires the use of data management platforms and computational approaches. Results: We have developed the MAss SPECTRometry Analysis System (MASPECTRAS), a platform for management and analysis of proteomics LC-MS/MS data. MASPECTRAS is based on the Proteome Experimental Data Repository (PEDRo) relational database schema and follows the guidelines of the Proteomics Standards Initiative (PSI). Analysis modules include: 1) import and parsing of the results from the search engines SEQUEST, Mascot, Spectrum Mill, X! Tandem, and OMSSA; 2) peptide validation; 3) clustering of proteins based on Markov Clustering and multiple alignments; and 4) quantification using the Automated Statistical Analysis of Protein Abundance Ratios algorithm (ASAPRatio). The system provides customizable data retrieval and visualization tools, as well as export to the PRoteomics IDEntifications public repository (PRIDE). MASPECTRAS is freely available at Conclusion: Given the unique features and the flexibility due to the use of standard software technology, our platform represents a significant advance and could be of great interest to the proteomics community. PMID:17567892
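The Markov Clustering (MCL) step named among the analysis modules can be sketched with the standard algorithm: alternate expansion (squaring the column-stochastic matrix) with inflation (raising entries to a power and renormalising) until the matrix stops changing. This is not MASPECTRAS code. The toy similarity matrix, the inflation value of 2.0 and the attractor-row cluster extraction are assumptions, and turning multiple alignments into similarities is not shown.

```python
from typing import List, Set

Matrix = List[List[float]]


def _normalise_columns(m: Matrix) -> Matrix:
    """Make every column sum to 1 (column-stochastic); empty columns stay zero."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(
    adjacency: Matrix, inflation: float = 2.0, max_iter: int = 100, tol: float = 1e-6
) -> List[Set[int]]:
    """Markov Clustering (MCL) of a symmetric similarity matrix."""
    n = len(adjacency)
    # Add self-loops, as is customary in MCL, then make the matrix column-stochastic.
    m = _normalise_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: flow along two-step random walks
        # Inflation: strengthen strong flows, weaken weak ones, then renormalise.
        inflated = _normalise_columns([[x ** inflation for x in row] for row in expanded])
        converged = all(abs(inflated[i][j] - m[i][j]) < tol for i in range(n) for j in range(n))
        m = inflated
        if converged:
            break
    # Each non-empty row of the converged matrix (an attractor) defines a cluster.
    clusters: List[Set[int]] = []
    for row in m:
        members = {j for j, x in enumerate(row) if x > tol}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Hypothetical similarities: proteins 0-2 are mutually similar, as are 3-4.
adjacency = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(markov_cluster(adjacency))
```

Raising the inflation parameter yields more, smaller clusters. That parameter is the main knob when grouping related protein sequences.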
SSMART: Sequence-structure motif identification for RNA-binding proteins.
Munteanu, Alina; Mukherjee, Neelanjan; Ohler, Uwe
2018-06-11
RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3'UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully applied SSMART to high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/. Supplementary data are available at Bioinformatics online.
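The idea of a consensus string that constrains both sequence and structure can be illustrated by scanning for one given motif. The sketch is not SSMART and does not perform motif discovery. It pairs a standard IUPAC nucleotide code at each position with a structure code from a hypothetical three-letter alphabet ("u" unpaired, "p" paired, "*" either) and reads the target's secondary structure from a dot-bracket string. The sequence, structure and motifs are made up.

```python
from typing import List

# Standard IUPAC nucleotide codes, written for RNA (U instead of T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}
# Hypothetical structure codes mapped to allowed dot-bracket characters.
# This is NOT SSMART's actual alphabet.
STRUCT = {"u": ".", "p": "()", "*": ".()"}


def find_motif(seq: str, dot_bracket: str, seq_motif: str, struct_motif: str) -> List[int]:
    """Start positions where every motif position matches in sequence and structure."""
    k = len(seq_motif)
    hits = []
    for start in range(len(seq) - k + 1):
        if all(
            seq[start + i] in IUPAC[seq_motif[i]]
            and dot_bracket[start + i] in STRUCT[struct_motif[i]]
            for i in range(k)
        ):
            hits.append(start)
    return hits


# A UUUU run in a stem (paired) and another in a single-stranded tail.
seq = "UUUU" + "GCG" + "AAAA" + "CUUUU"
dot = "((((" + "..." + "))))" + "....."
print(find_motif(seq, dot, "UUUU", "uuuu"))  # unpaired occurrence only
print(find_motif(seq, dot, "UUUU", "pppp"))  # paired occurrence only
```

A sequence-only motif would report both UUUU runs. The structure codes are what separate the stem occurrence from the single-stranded one.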
Storch, Tatiane Timm; Pegoraro, Camila; Finatto, Taciane; Quecini, Vera; Rombaldi, Cesar Valmor; Girardi, César Luis
2015-01-01
Reverse transcription quantitative PCR (RT-qPCR) is one of the most important techniques for gene expression profiling due to its high sensitivity and reproducibility. However, the reliability of the results is highly dependent on data normalization, performed by comparing the expression profiles of the genes of interest against those of constitutively expressed reference genes. Although the technique is widely used in fruit postharvest experiments, the transcriptional stability of reference genes has not been thoroughly investigated under these experimental conditions. Thus, we have determined the transcriptional profile, under these conditions, of three genes commonly used as references, ACTIN (MdACT), PROTEIN DISULPHIDE ISOMERASE (MdPDI) and UBIQUITIN-CONJUGATING ENZYME E2 (MdUBC), along with two novel candidates, HISTONE 1 (MdH1) and NUCLEOSOME ASSEMBLY 1 PROTEIN (MdNAP1). The expression profiles of the genes were investigated across five experiments: three encompassing the postharvest period and two covering developmental and spatial phases. Transcriptional stability was comparatively investigated using four distinct software packages: BestKeeper, NormFinder, geNorm and DataAssist. Gene rankings for transcriptional stability were similar across the software packages, with the exception of BestKeeper. The classic reference gene MdUBC ranked among the most stably transcribed in all investigated experimental conditions. Transcript accumulation profiles for the novel reference candidate gene MdH1 were stable throughout the tested conditions, especially in experiments encompassing the postharvest period. Thus, our results present a novel reference gene for postharvest experiments in apple and reinforce the importance of checking the transcription profile of reference genes under the experimental conditions of interest. PMID:25774904
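One of the stability measures compared above, geNorm's M value, has a compact definition. For gene j, take the log2 expression ratio to each other candidate gene k in every sample, compute the standard deviation of that ratio across samples, and average over all k. Lower M indicates a more stable gene. The sketch below assumes linear-scale relative quantities and uses the sample standard deviation. Gene names and values are hypothetical, not the apple data.

```python
import math
from statistics import mean, stdev
from typing import Dict, List


def genorm_m(expression: Dict[str, List[float]]) -> Dict[str, float]:
    """geNorm-style stability value M for each candidate reference gene.

    expression maps gene -> relative quantities (linear scale), one per sample.
    M_j is the mean, over every other gene k, of the standard deviation of
    log2(expr_j / expr_k) across samples; lower M means more stable.
    """
    m_values = {}
    for j in expression:
        variations = []
        for k in expression:
            if k == j:
                continue
            ratios = [math.log2(a / b) for a, b in zip(expression[j], expression[k])]
            variations.append(stdev(ratios))
        m_values[j] = mean(variations)
    return m_values


expr = {
    "GENE_A": [1.0, 2.0, 4.0, 8.0],
    "GENE_B": [2.0, 4.0, 8.0, 16.0],  # exactly co-regulated with GENE_A
    "GENE_C": [1.0, 1.0, 1.0, 1.0],
}
print(genorm_m(expr))
```

In this toy case the constant GENE_C receives the worst M, because A and B track each other perfectly. Co-regulated candidates can therefore look stable under a purely pairwise measure, which is one reason to cross-check rankings with several packages, as the study does.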
Cytoscape: a software environment for integrated models of biomolecular interaction networks.
Shannon, Paul; Markiel, Andrew; Ozier, Owen; Baliga, Nitin S; Wang, Jonathan T; Ramage, Daniel; Amin, Nada; Schwikowski, Benno; Ideker, Trey
2003-11-01
Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to lay out and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery from DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.
Comparison of Depletion Strategies for the Enrichment of Low-Abundance Proteins in Urine.
Filip, Szymon; Vougas, Konstantinos; Zoidakis, Jerome; Latosinska, Agnieszka; Mullen, William; Spasovski, Goce; Mischak, Harald; Vlahou, Antonia; Jankowski, Joachim
2015-01-01
Proteome analysis of complex biological samples for biomarker identification remains challenging, partly because of the wide range of protein concentrations. High-abundance proteins in plasma and urine, such as albumin or IgG, may interfere with the detection of potential disease biomarkers. Several options are currently available for depleting abundant proteins from plasma; however, the applicability of these methods to urine has not been thoroughly investigated. In this study, we compared different commercially available immunodepletion and ion-exchange-based approaches on urine samples from both healthy subjects and chronic kidney disease (CKD) patients, assessing their reproducibility and efficiency in protein depletion. A starting urine volume of 500 μL was used to simulate the conditions of a multi-institutional biomarker discovery study. All depletion approaches showed satisfactory reproducibility (n=5) in both protein identification and protein abundance. Comparing depletion efficiency between the unfractionated and fractionated samples and across the different strategies showed efficient depletion in all cases except the ion-exchange kit. Depletion efficiency was slightly higher in normal than in CKD samples, and normal samples yielded more protein identifications than CKD samples in both the initial and the corresponding depleted fractions. Consistent with this, the amounts of albumin and of the other targeted proteins, where applicable, decreased after depletion. Nevertheless, these depletion strategies did not yield more identifications in urine from either normal subjects or CKD patients. Collectively, when analyzing urine in the context of CKD biomarker identification, depletion strategies add no observable value, and analysis of unfractionated starting urine appears to be preferable. PMID:26208298
Bailey, Ulla-Maja; Schulz, Benjamin L
2013-04-01
Post-translational modification of proteins by glycosylation is of key importance in many biological systems in eukaryotes, influencing fundamental biological processes and regulating protein function. Changes in glycosylation are therefore of interest for understanding these processes and are also useful as clinical biomarkers of disease. The presence of glycosylation can also inhibit protease digestion and lower the quality and confidence of protein identification by mass spectrometry. Although deglycosylation can improve the efficiency of subsequent protease digestion and increase protein coverage, this step is often excluded from proteomic workflows. Here, we performed a systematic analysis showing that deglycosylation with peptide-N-glycosidase F (PNGase F) prior to digestion with AspN or trypsin improved the quality of identification of the yeast cell wall proteome. The improvement in the confidence of glycoprotein identification following PNGase F deglycosylation correlated with a higher density of glycosylation sites. Optimal identification across the proteome was achieved with PNGase F deglycosylation and complementary proteolysis with either AspN or trypsin. We used this combination of deglycosylation and complementary protease digestion to identify changes in the yeast cell wall proteome caused by lack of the Alg3p protein, a key component of the biosynthetic pathway of protein N-glycosylation. The cell wall of yeast lacking Alg3p showed specifically increased levels of Cis3p, a protein important for cell wall integrity. Our results showed that deglycosylation prior to protease digestion improved the quality of proteomic analyses even when protein glycosylation is not of direct relevance to the study at hand. Copyright © 2013 Elsevier B.V. All rights reserved.
McDougall, Carmel; Woodcroft, Ben J.
2016-01-01
In nature, numerous mechanisms have evolved by which organisms fabricate biological structures with an impressive array of physical characteristics. Examples of metazoan biological materials include the highly elastic byssal threads by which bivalves attach themselves to rocks, the biomineralized structures that form the skeletons of various animals, and spider silks, which are renowned for their exceptional strength and elasticity. The remarkable properties of silks, perhaps the best-studied biological materials, result from the highly repetitive, modular sequences and biased amino acid compositions of the proteins that compose them. Interestingly, similar levels of modularity and repetitiveness, and a similar bias in amino acid composition, have been reported in proteins that are components of structural materials in other organisms; however, the exact nature and extent of this similarity, and its functional and evolutionary relevance, are unknown. Here, we investigate this similarity and use sequence features common to silks and other known structural proteins to develop a bioinformatics-based method for identifying similar proteins in large-scale transcriptome and whole-genome datasets. We show that a large number of proteins identified using this method have roles in biological material formation throughout the animal kingdom. Despite their similar sequence characteristics, most of the silk-like structural proteins (SLSPs) identified in this study appear to have evolved independently and are restricted to a particular animal lineage. Although the exact function of many of these SLSPs is unknown, the apparent independent evolution of proteins with similar sequence characteristics in divergent lineages suggests that these features are important for the assembly of biological materials. The identification of these characteristics enables the generation of testable hypotheses regarding the mechanisms by which these proteins assemble and direct the construction of biological materials with diverse morphologies. The SilkSlider predictor software developed here is available at https://github.com/wwood/SilkSlider. PMID:27415783
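The abstract does not spell out SilkSlider's actual features or model. As a hypothetical illustration of the two sequence properties it highlights, compositional bias and repetitiveness, the sketch below computes (i) the fraction of residues contributed by the few most frequent amino acids and (ii) the fraction of overlapping k-mers that recur within the sequence. The function names and the `top_n` and `k` parameters are assumptions chosen for illustration; the SilkSlider repository is the authority on the real method.

```python
from collections import Counter


def composition_bias(seq: str, top_n: int = 3) -> float:
    """Fraction of residues contributed by the top_n most frequent amino acids."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return sum(n for _, n in counts.most_common(top_n)) / len(seq)


def repetitiveness(seq: str, k: int = 4) -> float:
    """Fraction of overlapping k-mers that occur more than once in the sequence."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] > 1) / len(kmers)
```

A silk-like repeat such as `GAGAGAGAGA` scores 1.0 on both measures, whereas a ten-residue sequence of distinct amino acids such as `MKTWLPQERS` scores 0.3 and 0.0.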
Identification of immunodominant proteins of the microalgae Prototheca by proteomic analysis
Irrgang, A.; Weise, C.; Murugaiyan, J.; Roesler, U.
2014-01-01
Prototheca zopfii, which is associated with bovine mastitis and human protothecosis, exists as two genotypes, of which genotype 1 is considered non-infectious and genotype 2 infectious. The mechanism of infection has not yet been described. The present study aimed to identify genotype 2-specific immunodominant proteins. Prototheca proteins were separated by two-dimensional gel electrophoresis, and subsequent western blotting with rabbit hyperimmune serum revealed 28 protein spots. Matrix-assisted laser desorption ionization time-of-flight mass spectrometry analysis identified 15 proteins, including malate dehydrogenase, elongation factor 1-alpha, heat shock protein 70, and 14-3-3 protein, which have previously been described as immunogenic proteins of other eukaryotic pathogens. PMID:25755891
APBSmem: A Graphical Interface for Electrostatic Calculations at the Membrane
Callenberg, Keith M.; Choudhary, Om P.; de Forest, Gabriel L.; Gohara, David W.; Baker, Nathan A.; Grabe, Michael
2010-09-29
Electrostatic forces are among the primary determinants of molecular interactions. They help guide the folding of proteins, increase the binding of one protein to another, and facilitate protein-DNA and protein-ligand binding. A popular method for computing the electrostatic properties of biological systems is to solve the Poisson-Boltzmann (PB) equation numerically, and several easy-to-use software packages are available that solve the PB equation for soluble proteins. Here we present a freely available program, called APBSmem, for carrying out these calculations in the presence of a membrane. The Adaptive Poisson-Boltzmann Solver (APBS) is used as a back-end for solving the PB equation, and a Java-based graphical user interface (GUI) coordinates a set of routines that introduce the influence of the membrane, determine its placement relative to the protein, and set the membrane potential. The software Jmol is embedded in the GUI to visualize the protein inserted in the membrane before the calculation and the electrostatic potential after the computation is complete. We expect that the ease with which the GUI allows one to carry out these calculations will make this software a useful resource for experimenters and computational researchers alike. Three examples of membrane protein electrostatic calculations are carried out to illustrate how to use APBSmem and to highlight the different quantities of interest that can be calculated. PMID:20949122
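For reference, the nonlinear Poisson-Boltzmann equation that solvers such as APBS handle numerically can be written in SI form as below, where ε(r) is the position-dependent dielectric coefficient, ψ the electrostatic potential, ρ_f the fixed (solute) charge density, c_i^∞ and z_i the bulk concentration and valence of mobile ion species i, q the elementary charge, k_B T the thermal energy, and λ(r) an ion-accessibility factor that is 0 where ions are excluded. This is the textbook form, not APBS's internal dimensionless formulation; a membrane would plausibly enter through ε(r) and λ(r), but exactly how APBSmem encodes it is an assumption not stated in the abstract. The second line is the reduction for a symmetric 1:1 electrolyte of bulk concentration c^∞.

$$
\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \psi(\mathbf{r}) \right]
= -\rho_f(\mathbf{r}) - \lambda(\mathbf{r}) \sum_i c_i^{\infty} z_i q \exp\!\left( -\frac{z_i q \psi(\mathbf{r})}{k_B T} \right)
$$

$$
\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \psi(\mathbf{r}) \right]
= -\rho_f(\mathbf{r}) + 2 \lambda(\mathbf{r}) c^{\infty} q \sinh\!\left( \frac{q \psi(\mathbf{r})}{k_B T} \right)
$$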
Meereis, Florian; Kaufmann, Michael
2004-10-15
The rapidly increasing number of completely sequenced genomes led to the establishment of the COG database, which assigns similar proteins from different organisms to clusters of orthologous groups (COGs) on the basis of sequence homology. Several bioinformatic studies have used this database to determine (hyper)thermophile-specific proteins by searching for COGs containing (almost) exclusively proteins from (hyper)thermophilic genomes. However, no public software is available for performing individually definable group-specific searches. The tool described here fills this gap. The software is accessible at http://www.uni-wh.de/pcogr and is linked to the COG database. The user defines two groups of organisms by assigning each of the (currently) 66 organisms to groupA, to the reference groupB, or to neither, in which case the organism is ignored by the algorithm. A specificity index with respect to groupA is then calculated for every COG: high-scoring COGs contain proteins from most groupA organisms but lack proteins from most groupB organisms. In addition to ranking all COGs according to the user-defined specificity criteria, a graphical visualization shows the distribution of all COGs by displaying their abundance as a function of their specificity indexes. This software allows the detection of COGs specific to a predefined group of organisms. All COGs are ranked in order of their specificity, and the graphical visualization reveals (i) the presence and abundance of such COGs and (ii) the phylogenetic relationship between groupA and groupB organisms. The software also allows the detection of putative protein-protein interactions, novel enzymes involved in only partially known biochemical pathways, and alternative enzymes that arose through convergent evolution.
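The abstract describes the specificity index only qualitatively. One hypothetical score with the stated behaviour is the fraction of groupA organisms represented in a COG minus the fraction of groupB organisms represented, so organisms assigned to neither group drop out automatically. The sketch below assumes that formula and a mapping from COG identifiers to the set of organisms with a member protein; it is not the tool's published index.

```python
def specificity_index(members: set[str], group_a: set[str], group_b: set[str]) -> float:
    """Hypothetical score in [-1, 1]: high when most groupA organisms have a
    protein in the COG and most groupB organisms do not."""
    frac_a = len(members & group_a) / len(group_a)
    frac_b = len(members & group_b) / len(group_b)
    return frac_a - frac_b


def rank_cogs(cogs: dict[str, set[str]], group_a: set[str],
              group_b: set[str]) -> list[tuple[str, float]]:
    """Return (COG id, score) pairs sorted from most to least groupA-specific."""
    scores = {cog: specificity_index(members, group_a, group_b)
              for cog, members in cogs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Ignored organisms (like `"X"` in a member set) never intersect either group, matching the "ignored by the algorithm" option described above.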
Zhou, Yanting; Gao, Jing; Zhu, Hongwen; Xu, Jingjing; He, Han; Gu, Lei; Wang, Hui; Chen, Jie; Ma, Danjun; Zhou, Hu; Zheng, Jing
2018-02-20
Membrane proteins may act as transporters, receptors, enzymes, and adhesion anchors, and account for nearly 70% of pharmaceutical drug targets. Their efficient enrichment, extraction, and solubilization remain difficult because of their relatively low abundance and poor solubility. In this study, we developed a simplified membrane protein extraction approach, with user-friendly sample processing, good repeatability and high effectiveness, to enhance the enrichment and identification of membrane proteins. Combining centrifugation and detergent with LC-MS/MS, this approach identified higher proportions of membrane proteins, integral proteins and transmembrane proteins in the membrane fraction (76.6%, 48.1%, and 40.6%, respectively) than in the total cell lysate (41.6%, 16.4%, and 13.5%, respectively). Moreover, the method tended to capture membrane proteins with a high degree of hydrophobicity and multiple transmembrane domains (TMs): in the membrane fraction, 486 of 2106 proteins (23.0%) had GRAVY > 0 and 488 of 2106 (23.1%) had TMs ≥ 2. It also improved the identification of membrane proteins: more than 60.6% of the membrane proteins identified in both cell samples were better identified, with higher sequence coverage, in the membrane fraction. Data are available via ProteomeXchange with identifier PXD008456.
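GRAVY (grand average of hydropathy) is commonly computed as the mean Kyte-Doolittle hydropathy value over a protein's residues, so GRAVY > 0 marks a net hydrophobic sequence. The abstract does not say which tool produced the values; the sketch assumes the standard Kyte-Doolittle scale and skips non-standard residue letters, which is an assumption (tools differ in how they treat them).

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy over the standard residues of a sequence."""
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

Counting the proteins with `gravy(seq) > 0` in each fraction would give the kind of proportion reported above.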
Computational approaches to protein inference in shotgun proteomics
2012-01-01
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. Because of its high-throughput nature, shotgun proteomics faces challenges in the analysis and interpretation of experimental data. Among these challenges, identifying the proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping the assigned peptides to proteins and quantifying the confidence of the identified proteins. Protein identification is fundamentally a statistical inference problem, and a number of methods have been proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area. PMID:23176300
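As an example of the rule-based (parsimony) category, the sketch below selects a small set of proteins that explains all identified peptides by greedy set cover: it repeatedly picks the protein that accounts for the most still-unexplained peptides. Finding an exact minimum set cover is NP-hard, which is one reason greedy heuristics or integer programming are used. The function name and input layout are assumptions, and real inference tools also report groups of indistinguishable proteins instead of silently picking one.

```python
def greedy_parsimony(protein_peptides: dict[str, set[str]]) -> list[str]:
    """Greedy set cover: pick proteins until every identified peptide is explained."""
    uncovered = set().union(*protein_peptides.values())
    selected = []
    while uncovered:
        # protein explaining the most unexplained peptides; iterating over sorted
        # names makes ties resolve to the alphabetically first protein
        best = max(sorted(protein_peptides),
                   key=lambda p: len(protein_peptides[p] & uncovered))
        selected.append(best)
        uncovered -= protein_peptides[best]
    return selected
```

For proteins P1 = {a, b, c}, P2 = {a} and P3 = {c, d}, the result is P1 then P3: P2's only peptide is shared with P1, so parsimony does not require it.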
Mass spectrometry compatible surfactant for optimized in-gel protein digestion.
Saveliev, Sergei V; Woodroofe, Carolyn C; Sabat, Grzegorz; Adams, Christopher M; Klaubert, Dieter; Wood, Keith; Urh, Marjeta
2013-01-15
Identification of proteins resolved by SDS-PAGE depends on robust in-gel protein digestion and efficient peptide extraction, requirements that are often difficult to achieve. The lengthy and laborious procedure is an additional challenge for protein identification in gel. We show here that the mass spectrometry compatible surfactant sodium 3-((1-(furan-2-yl)undecyloxy)carbonylamino)propane-1-sulfonate effectively addresses the challenges of in-gel protein digestion. Peptide quantitation based on stable isotope labeling showed that the surfactant induced a 1.5- to 2-fold increase in peptide recovery. Consequently, protein sequence coverage increased by 20-30% on average, and the number of identified proteins increased substantially. The surfactant also accelerated the digestion process: maximal in-gel digestion was achieved in as little as one hour, depending on incubation temperature, and peptides were readily recovered from the gel, eliminating the need for postdigestion extraction. This study shows that the surfactant provides an efficient means of improving protein identification in gel and streamlining the in-gel digestion procedure, without requiring extra handling steps or special equipment.
Pooled protein immunization for identification of cell surface antigens in Streptococcus sanguinis.
Ge, Xiuchun; Kitten, Todd; Munro, Cindy L; Conrad, Daniel H; Xu, Ping
2010-07-26
Available bacterial genomes provide opportunities for screening vaccines by reverse vaccinology. Efficient identification of surface antigens is required to reduce the time and animal costs of this technology. We developed an approach to rapidly identify surface antigens in Streptococcus sanguinis, a common causative species of infective endocarditis. We applied bioinformatics for antigen prediction and pooled antigens for immunization. Forty-seven surface-exposed proteins, including 28 lipoproteins and 19 cell wall-anchored proteins, were chosen based on computer algorithms and comparative genomic analyses. Eight proteins among these candidates and 2 other proteins were pooled together to immunize rabbits. The antiserum reacted strongly with each protein and with S. sanguinis whole cells. Affinity chromatography was used to purify the antibodies to 9 of the antigen pool components. Competitive ELISA and FACS results indicated that these 9 proteins were exposed on S. sanguinis cell surfaces. The purified antibodies had demonstrable opsonic activity. The results indicate that immunization with pooled proteins, combined with affinity purification and comprehensive immunological assays, may facilitate cell surface antigen identification to combat infectious diseases. PMID:20668678
Mukherjee, Somaditya; Jagadeeshaprasad, Mashanipalya G; Banerjee, Tanima; Ghosh, Sudip K; Biswas, Monodeep; Dutta, Santanu; Kulkarni, Mahesh J; Pattari, Sanjib; Bandyopadhyay, Arun
2014-01-01
Rheumatic fever in childhood is the most common cause of Mitral Stenosis in developing countries. The disease is characterized by damaged and deformed mitral valves that are predisposed to scarring and narrowing (stenosis), resulting in left atrial hypertrophy followed by heart failure. Presently, echocardiography is the main imaging technique used to diagnose Mitral Stenosis. Despite the high prevalence and increased morbidity, no biochemical indicators are available for the prediction, diagnosis and management of the disease. A proteomic approach to Rheumatic Mitral Stenosis may therefore shed some light on this problem. In our study, we undertook plasma proteomics of human subjects with Rheumatic Mitral Stenosis (n = 6) and control subjects (n = 6). Six plasma samples, three each from the control and patient groups, were pooled and subjected to low-abundance protein enrichment. The pooled plasma samples (crude and equalized) were then subjected separately to in-solution trypsin digestion. Digests were analyzed using nano LC-MS(E). Data were acquired with the ProteinLynx Global Server v2.5.2 software and searched against the reviewed Homo sapiens database (UniProtKB) for protein identification. Label-free protein quantification was performed in crude plasma only. A total of 130 proteins spanning 9-192 kDa were identified. Of these, 83 proteins were common to both groups and 34 were differentially regulated. Functional annotation of overlapping and differential proteins revealed that more than 50% of the proteins are involved in inflammation and immune response. This was corroborated by findings from pathway analysis and histopathological studies on excised tissue sections of stenotic mitral valves. Verification of selected protein candidates by immunotechniques in crude plasma corroborated our findings from label-free protein quantification.
We propose that this protein profile of blood plasma, or any of the individual proteins, could serve as a focal point for future mechanistic studies on Mitral Stenosis. In addition, some of the proteins associated with this disorder may be candidate biomarkers for disease diagnosis and prognosis. Our findings might help to enrich existing knowledge on the molecular mechanisms involved in Mitral Stenosis and improve the current diagnostic tools in the long run.
NASA Astrophysics Data System (ADS)
Sarsby, Joscelyn; Martin, Nicholas J.; Lalor, Patricia F.; Bunch, Josephine; Cooper, Helen J.
2014-09-01
Liquid extraction surface analysis mass spectrometry (LESA MS) has the potential to become a useful tool for the spatially resolved profiling of proteins in substrates. Here, the approach has been applied to the analysis of thin tissue sections from human liver. The aim was to determine whether LESA MS is a suitable approach for detecting protein biomarkers of nonalcoholic liver disease (nonalcoholic steatohepatitis, NASH), with a view to the eventual development of LESA MS for imaging NASH pathology. Two approaches were considered. In the first (bottom-up approach), endogenous proteins were extracted from liver tissue sections by LESA and subjected to automated trypsin digestion, and the resulting peptide mixture was analyzed by liquid chromatography tandem mass spectrometry (LC-MS/MS). In the second (top-down approach), endogenous proteins were extracted by LESA and analyzed intact. Selected protein ions were subjected to collision-induced dissociation (CID) and/or electron transfer dissociation (ETD) mass spectrometry. The bottom-up approach resulted in the identification of over 500 proteins; however, identification of key protein biomarkers, liver fatty acid binding protein (FABP1) and its variant (Thr→Ala, position 94), was unreliable and irreproducible. Top-down LESA MS analysis of healthy and diseased liver tissue revealed peaks corresponding to multiple (~15-25) proteins. MS/MS of four of these proteins identified them as FABP1, its variant, α-hemoglobin, and 10 kDa heat shock protein. The reliable identification of FABP1 and its variant by top-down LESA MS suggests that the approach may be suitable for imaging NASH pathology in sections from liver biopsies.
Park, Gun Wook; Hwang, Heeyoun; Kim, Kwang Hoe; Lee, Ju Yeon; Lee, Hyun Kyoung; Park, Ji Yeong; Ji, Eun Sun; Park, Sung-Kyu Robin; Yates, John R; Kwon, Kyung-Hoon; Park, Young Mok; Lee, Hyoung-Joo; Paik, Young-Ki; Kim, Jin Young; Yoo, Jong Shin
2016-11-04
In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification by peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies using large-scale proteomic profiling based on liquid chromatography and mass spectrometry. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four steps. First, individual proteomic searches were performed against the neXtProt database using three different search engines: SEQUEST, MASCOT, and MS-GF+. Second, the PSM search results were combined using statistical evaluation tools, including DTASelect and Percolator. Third, the peptide search scores were converted into normalized E-scores using an in-house program. Last, ProteinInferencer was used to filter for proteins containing two or more peptides, with a controlled FDR of 1.0% at the protein level. We then compared the performance of the IPP with that of a conventional proteomic pipeline (CPP) for protein identification, using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP), including 477 alternative splicing variants (vs 182 using the CPP), were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral patterns from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).
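The last pipeline step combines a minimum-peptide rule with protein-level FDR control. The abstract does not describe ProteinInferencer's internals, so the sketch below shows a generic target-decoy version under stated assumptions: each protein hit carries a single confidence score (higher is better), decoy hits come from a reversed or shuffled database, and the FDR of a score-ranked list is estimated as decoys / targets. The `ProteinHit` type and `filter_proteins` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    accession: str
    score: float      # higher = more confident (hypothetical protein-level score)
    is_decoy: bool    # True if the hit comes from the decoy database
    n_peptides: int   # number of distinct peptides supporting the protein


def filter_proteins(hits: list[ProteinHit], fdr_threshold: float = 0.01,
                    min_peptides: int = 2) -> list[ProteinHit]:
    """Keep target proteins with >= min_peptides peptides at estimated FDR <= threshold."""
    candidates = sorted((h for h in hits if h.n_peptides >= min_peptides),
                        key=lambda h: h.score, reverse=True)
    targets = decoys = 0
    cutoff = 0  # length of the longest top-ranked list whose estimated FDR passes
    for rank, hit in enumerate(candidates, start=1):
        if hit.is_decoy:
            decoys += 1
        else:
            targets += 1
        # decoy/target ratio estimates the FDR of the list truncated at this rank
        fdr = decoys / targets if targets else 1.0
        if fdr <= fdr_threshold:
            cutoff = rank
    return [h for h in candidates[:cutoff] if not h.is_decoy]
```

Accepting the longest passing prefix, rather than stopping at the first rank that fails, mirrors q-value thresholding; other estimators (for example 2 × decoys / total) are also in use.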
Sarsby, Joscelyn; Martin, Nicholas J; Lalor, Patricia F; Bunch, Josephine; Cooper, Helen J
2014-11-01
Liquid extraction surface analysis mass spectrometry (LESA MS) has the potential to become a useful tool in the spatially-resolved profiling of proteins in substrates. Here, the approach has been applied to the analysis of thin tissue sections from human liver. The aim was to determine whether LESA MS was a suitable approach for the detection of protein biomarkers of nonalcoholic liver disease (nonalcoholic steatohepatitis, NASH), with a view to the eventual development of LESA MS for imaging NASH pathology. Two approaches were considered. In the first (bottom-up approach), endogenous proteins were extracted from liver tissue sections by LESA and subjected to automated trypsin digestion, and the resulting peptide mixture was analyzed by liquid chromatography tandem mass spectrometry (LC-MS/MS). In the second (top-down approach), endogenous proteins were extracted by LESA and analyzed intact. Selected protein ions were subjected to collision-induced dissociation (CID) and/or electron transfer dissociation (ETD) mass spectrometry. The bottom-up approach resulted in the identification of over 500 proteins; however, identification of the key protein biomarkers, liver fatty acid binding protein (FABP1) and its variant (Thr→Ala, position 94), was unreliable and irreproducible. Top-down LESA MS analysis of healthy and diseased liver tissue revealed peaks corresponding to multiple (~15-25) proteins. MS/MS of four of these proteins identified them as FABP1, its variant, α-hemoglobin, and 10 kDa heat shock protein. The reliable identification of FABP1 and its variant by top-down LESA MS suggests that the approach may be suitable for imaging NASH pathology in sections from liver biopsies.
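In the bottom-up approach, proteins are identified through their tryptic peptides. A minimal in-silico digest, assuming the common textbook trypsin rule (cleave after K or R, but not before P) and ignoring missed cleavages, shows why a single substitution such as Thr→Ala affects only the one tryptic peptide that spans it. The function name and `min_length` parameter are illustrative.

```python
import re


def tryptic_digest(sequence, min_length=1):
    """Cleave after K or R unless the next residue is P (simple trypsin rule)."""
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    # A trailing K/R yields an empty final piece; drop it and any short peptides.
    return [p for p in pieces if len(p) >= min_length]
```

For example, `tryptic_digest("MAKRPGKLSR")` gives `["MAK", "RPGK", "LSR"]`; the R followed by P is not cleaved.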
NASA Astrophysics Data System (ADS)
Diaz, K. S.; Kim, E. H.; Jones, R. M.; de Leon, K. C.; Woodcroft, B. J.; Tyson, G. W.; Rich, V. I.
2014-12-01
The growing field of metaproteomics links microbial communities to their expressed functions by using mass spectrometry methods to characterize community proteins. Comparing mass spectrometry protein search algorithms and their biases is crucial for maximizing the quality and number of protein identifications from mass spectral data. Available algorithms employ different approaches when mapping mass spectra to peptides in a database. We compared mass spectra from four microbial proteomes derived from high-organic-content soils searched with two search algorithms: 1) Sequest HT as packaged within Proteome Discoverer (v.1.4) and 2) X!Tandem as packaged in the Trans-Proteomic Pipeline (v.4.7.1). Searches used matched metagenomes, and results were filtered to allow identification of high-probability proteins. There was little overlap in the proteins identified by the two algorithms, on average just ~24% of the total. However, when adjusted for spectral abundance, the overlap improved to ~70%. Proteome Discoverer generally outperformed X!Tandem, identifying an average of 12.5% more proteins than X!Tandem, with X!Tandem identifying more proteins only in the first two proteomes. For spectrally adjusted results, the algorithms were similar, with X!Tandem marginally outperforming Proteome Discoverer by an average of ~4%. We then assessed differences in heat shock protein (HSP) identification between the two algorithms by BLASTing identified proteins against the Heat Shock Protein Information Resource, because, owing to the extraction protocols, HSP hits typically account for the majority of the signal in proteomes. HSPs accounted for ~15%, ~11%, ~17%, and ~19% of identifications in the four proteomes, respectively, and for ~14% overall once redundancies were removed. Of the ~15% average of proteins from the four proteomes identified as HSPs, ~10% of proteins and spectra were identified by both algorithms. On average, Proteome Discoverer identified ~9% more HSPs than X!Tandem.
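The abstract does not define how overlap was computed, so the sketch below assumes two plausible definitions: shared proteins as a fraction of all proteins found by either algorithm, and a spectral-abundance-weighted version where each protein counts by its spectra. It illustrates how a small protein-level overlap can coexist with a large spectrum-weighted overlap when the shared proteins are the abundant ones.

```python
def protein_overlap(ids_a, ids_b):
    """Fraction of all identified proteins found by both searches."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def spectral_overlap(counts_a, counts_b):
    """Fraction of all spectra assigned to proteins found by both searches.

    counts_a, counts_b: dict mapping protein ID -> spectral count.
    """
    shared = counts_a.keys() & counts_b.keys()
    total = sum(counts_a.values()) + sum(counts_b.values())
    shared_spectra = sum(counts_a[p] + counts_b[p] for p in shared)
    return shared_spectra / total if total else 0.0
```

With three proteins from one search and two from the other, sharing one, `protein_overlap` is 0.25, yet if that shared protein carries most spectra the weighted overlap can approach 1.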
Welker, F
2018-02-20
The study of ancient protein sequences is increasingly focused on the analysis of older samples, including those of ancient hominins. The analysis of such ancient proteomes thereby potentially suffers from "cross-species proteomic effects": the loss of peptide and protein identifications at increased evolutionary distances due to a larger number of protein sequence differences between the database sequence and the analyzed organism. Error-tolerant proteomic search algorithms should theoretically overcome this problem at both the peptide and protein levels; however, this has not been demonstrated. If error-tolerant searches do not overcome the cross-species proteomic issue, then there might be inherent biases in the identified proteomes. Here, a bioinformatics experiment is performed to test this using a set of modern human bone proteomes and three independent searches against sequence databases at increasing evolutionary distances: the human (0 Ma), chimpanzee (6-8 Ma) and orangutan (16-17 Ma) reference proteomes, respectively. Incorrectly suggested amino acid substitutions are absent when employing adequate filtering criteria for mutable Peptide Spectrum Matches (PSMs), but roughly half of the mutable PSMs were not recovered. As a result, peptide and protein identification rates are higher in error-tolerant mode than in non-error-tolerant searches, but protein identifications were not completely recovered. The data indicate that peptide length and the number of mutations between the target and database sequences are the main factors influencing mutable PSM identification. The error-tolerant results suggest that the cross-species proteomics problem is not overcome at increasing evolutionary distances, even at the protein level. Peptide and protein loss has the potential to significantly impact divergence dating and proteome comparisons when using ancient samples, as there is a bias towards the identification of conserved sequences and proteins.
Effects are minimized between moderately divergent proteomes, as indicated by almost complete recovery of informative positions in the search against the chimpanzee proteome (≈90%, 6-8 Ma). This provides a bioinformatic background to future phylogenetic and proteomic analysis of ancient hominin proteomes, including the future description of novel hominin amino acid sequences, but also has negative implications for the study of fast-evolving proteins in hominins, non-hominin animals, and ancient bacterial proteins in evolutionary contexts.
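The role of "the number of mutations between the target and database sequences" can be illustrated at the sequence level. Real error-tolerant search engines work on spectra and masses; the sketch below only shows the underlying idea of accepting a database peptide that differs from the observed peptide by at most `max_subs` single-residue substitutions (equal length, no insertions or deletions, and no special handling of isobaric residues such as I/L). Function names are hypothetical.

```python
def substitutions(peptide, reference):
    """List (position, reference_residue, observed_residue) for each mismatch."""
    if len(peptide) != len(reference):
        raise ValueError("sequences must have equal length")
    return [(i, r, p) for i, (p, r) in enumerate(zip(peptide, reference)) if p != r]


def error_tolerant_matches(peptide, database_peptides, max_subs=1):
    """Return database peptides within max_subs single-residue substitutions."""
    hits = []
    for ref in database_peptides:
        if len(ref) != len(peptide):
            continue  # substitutions only; indels are not considered here
        subs = substitutions(peptide, ref)
        if len(subs) <= max_subs:
            hits.append((ref, subs))
    return hits
```

Each additional substitution between the analyzed organism and the database pushes more peptides past `max_subs`, which mirrors the loss of identifications with increasing evolutionary distance.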
Wojdyla, Justyna Aleksandra; Panepucci, Ezequiel; Martiel, Isabelle; Ebner, Simon; Huang, Chia-Ying; Caffrey, Martin; Bunk, Oliver; Wang, Meitian
2016-01-01
A fast continuous grid scan protocol has been incorporated into the Swiss Light Source (SLS) data acquisition and analysis software suite on the macromolecular crystallography (MX) beamlines. Its combination with fast-readout single-photon counting hybrid pixel array detectors (PILATUS and EIGER) allows diffraction-based identification of crystal diffraction hotspots and the location and centering of membrane protein microcrystals in the lipid cubic phase (LCP) within in meso in situ serial crystallography plates and on silicon nitride supports. Diffraction-based continuous grid scans with both still and oscillation images are supported. Examples are presented, including a grid scan of a large (50 nl) LCP bolus and analysis of the resulting diffraction images. Scanning transmission X-ray microscopy (STXM) complements and benefits from fast grid scanning. STXM has been demonstrated at the SLS beamline X06SA for near-zero-dose detection of protein crystals mounted on different types of sample supports at room and cryogenic temperatures. Flash-cooled crystals in nylon loops were successfully identified in differential and integrated phase images. Crystals of just 10 µm thickness were visible in integrated phase images using data collected with the EIGER detector. STXM offers a truly low-dose method for locating crystals on solid supports prior to diffraction data collection at both synchrotron microfocusing and free-electron laser X-ray facilities. PMID:27275141
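Diffraction-based hotspot location amounts to mapping one diffraction image per grid cell back onto the grid and ranking the cells. The sketch below assumes a serpentine (back-and-forth) raster path and per-image Bragg spot counts computed upstream; neither detail is stated in the abstract, and the function names are hypothetical.

```python
def serpentine_order(n_rows, n_cols):
    """Grid cells in the order a back-and-forth (serpentine) raster visits them."""
    order = []
    for r in range(n_rows):
        cols = range(n_cols) if r % 2 == 0 else range(n_cols - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def find_hotspots(spot_counts, n_rows, n_cols, min_spots):
    """Map per-image spot counts onto the grid and rank cells with >= min_spots."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for (r, c), count in zip(serpentine_order(n_rows, n_cols), spot_counts):
        grid[r][c] = count
    hits = [(r, c) for r in range(n_rows) for c in range(n_cols) if grid[r][c] >= min_spots]
    # Strongest diffraction first.
    return sorted(hits, key=lambda rc: grid[rc[0]][rc[1]], reverse=True)
```

Getting the acquisition order right matters: on a serpentine path, every other row is recorded in reverse, so a naive row-major mapping would misplace hotspots.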
Saxena, Shalini; Abdullah, Maaged; Sriram, Dharmarajan; Guruprasad, Lalitha
2017-10-17
MurG (Rv2153c) is a key player in the biosynthesis of the peptidoglycan layer in Mycobacterium tuberculosis (Mtb). To highlight the structural and functional relationships of Mtb MurG, a three-dimensional (3D) structure of the protein was constructed by homology modelling using Discovery Studio 3.5 software. The quality and consistency of the generated model were assessed with PROCHECK, ProSA and ERRAT. The model was then optimized by molecular dynamics (MD) simulations, and the optimized model in complex with the substrate uridine diphosphate N-acetylglucosamine (UD1) allowed us to apply a structure-based virtual screening approach, using energy-optimized pharmacophore modelling (e-pharmacophore), to obtain new hits from the Asinex database. The pharmacophore model was validated using enrichment calculations, and the validated model was then employed for high-throughput virtual screening and molecular docking to identify novel Mtb MurG inhibitors. This study led to the identification of 10 potential compounds with good fitness and docking scores that make important interactions with the protein active site. MD simulations (25 ns) of three potential lead compounds in complex with the protein confirmed that the structures were stable and that the compounds make several non-bonding interactions with amino acids such as Leu290, Met310 and Asn167. Hence, we concluded that the identified compounds may act as new leads for the design of Mtb MurG inhibitors.
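The abstract does not say which enrichment metric was used to validate the pharmacophore model. A common choice is the enrichment factor (EF): the hit rate among the top-ranked fraction of a screened list divided by the hit rate in the whole list. The sketch below assumes that metric and a list of known actives and decoys already ranked by pharmacophore fitness.

```python
def enrichment_factor(ranked_is_active, fraction):
    """EF at a given fraction of a ranked screening list.

    EF = (actives in top fraction / compounds in top fraction)
         / (all actives / all compounds)
    """
    n_total = len(ranked_is_active)
    total_actives = sum(ranked_is_active)
    if n_total == 0 or total_actives == 0:
        raise ValueError("need at least one active in a non-empty list")
    n_sampled = max(1, round(n_total * fraction))
    hits_sampled = sum(ranked_is_active[:n_sampled])
    return (hits_sampled / n_sampled) / (total_actives / n_total)
```

For 100 ranked compounds containing 10 actives, 5 of which land in the top 10, EF at 10% is (5/10)/(10/100) = 5, meaning the model retrieves actives five times better than random selection.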
Microcystin-LR Detected in a Low Molecular Weight Fraction from a Crude Extract of Zoanthus sociatus
Domínguez-Pérez, Dany; Alexei Rodríguez, Armando; Osorio, Hugo; Azevedo, Joana; Castañeda, Olga; Vasconcelos, Vítor; Antunes, Agostinho
2017-01-01
Cnidarians constitute a great source of bioactive compounds. However, research involving peptides from organisms belonging to the order Zoanthidea has received very little attention, in contrast to the numerous studies of the order Actiniaria, from which hundreds of toxic peptides and proteins have been reported. In this work, we performed a mass spectrometry analysis of a low molecular weight (LMW) fraction previously reported as lethal to mice. The LMW fraction was obtained by gel filtration of a Zoanthus sociatus (order Zoanthidea) crude extract on Sephadex G-50 and then analyzed by matrix-assisted laser desorption/ionization time-of-flight/time-of-flight (MALDI-TOF/TOF) mass spectrometry (MS) in positive ion reflector mode from m/z 700 to m/z 4000. Afterwards, some of the most intense and representative MS ions were fragmented by MS/MS, but no significant results were obtained with the Protein Pilot protein identification software or the Mascot search algorithm. However, microcystin masses were detected by mass matching against libraries of the non-ribosomal peptide database (NORINE). Subsequent reversed-phase C18 HPLC (in isocratic elution mode) and mass spectrometry analyses corroborated the presence of the cyanotoxin Microcystin-LR (MC-LR). To the best of our knowledge, this finding constitutes the first report of MC-LR in Z. sociatus and one of the few reports of this cyanotoxin in cnidarians. PMID:28257074
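Mass matching against a compound library reduces to comparing each observed m/z with each theoretical m/z within a tolerance expressed in parts per million. The sketch below assumes a library already converted to the expected ion m/z (adduct and charge handling omitted) and a 10 ppm default tolerance; the library entries in the usage example are hypothetical placeholders, not NORINE records.

```python
def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_masses(observed_mz, library, tol_ppm=10.0):
    """Return (observed m/z, library entry, ppm error) for every match in tolerance.

    library: dict mapping entry name -> theoretical m/z of the ion.
    """
    matches = []
    for mz in observed_mz:
        for name, theoretical in library.items():
            err = ppm_error(mz, theoretical)
            if abs(err) <= tol_ppm:
                matches.append((mz, name, err))
    return matches
```

For instance, an observed m/z of 1000.005 matches a library ion at 1000.0 with a +5 ppm error. A mass match alone is only a candidate, which is why orthogonal evidence such as the HPLC analysis is needed.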
Rose, Annkatrin; Manikantan, Sankaraganesh; Schraegle, Shannon J.; Maloy, Michael A.; Stahlberg, Eric A.; Meier, Iris
2004-01-01
Increasing evidence demonstrates the importance of long coiled-coil proteins for the spatial organization of cellular processes. Although several protein classes with long coiled-coil domains have been studied in animals and yeast, our knowledge about plant long coiled-coil proteins is very limited. The repetitive nature of the coiled-coil sequence motif often prevents the simple identification of homologs of animal coiled-coil proteins by generic sequence similarity searches. As a consequence, plant counterparts of many animal proteins with long coiled-coil domains, like lamins, golgins, or microtubule organization center components, have not yet been identified. Here, all Arabidopsis proteins predicted to contain long stretches of coiled-coil domains were identified by applying the algorithm MultiCoil in a genome-wide screen. A searchable protein database, ARABI-COIL (http://www.coiled-coil.org/arabidopsis), was established that integrates information on number, size, and position of predicted coiled-coil domains with subcellular localization signals, transmembrane domains, and available functional annotations. ARABI-COIL serves as a tool to sort and browse Arabidopsis long coiled-coil proteins to facilitate the identification and selection of candidate proteins of potential interest for specific research areas. Using the database, candidate proteins were identified for Arabidopsis membrane-bound, nuclear, and organellar long coiled-coil proteins. PMID:15020757
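Recording the number, size, and position of predicted coiled-coil domains implies turning per-residue predictions into contiguous segments. The sketch below assumes per-residue coiled-coil probabilities of the kind a predictor such as MultiCoil reports; the `threshold` and `min_length` defaults are hypothetical, not the values used for ARABI-COIL.

```python
def coiled_coil_domains(probabilities, threshold=0.5, min_length=28):
    """Return 1-based inclusive (start, end) runs with probability >= threshold.

    Runs shorter than min_length residues are discarded.
    """
    domains = []
    start = None
    for i, p in enumerate(probabilities):
        if p >= threshold and start is None:
            start = i  # a candidate domain begins
        elif p < threshold and start is not None:
            if i - start >= min_length:
                domains.append((start + 1, i))
            start = None
    # Close a domain that runs to the C-terminus.
    if start is not None and len(probabilities) - start >= min_length:
        domains.append((start + 1, len(probabilities)))
    return domains
```

The number of domains is then `len(domains)`, and each domain's size is `end - start + 1`.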
Abele-Horn, Marianne; Hommers, Leif; Trabold, René; Frosch, Matthias
2006-01-01
We evaluated the ability of the new VITEK 2 version 4.01 software to identify and detect glycopeptide-resistant enterococci, compared with the reference broth microdilution method, and to classify them into the vanA, vanB, vanC1, and vanC2 genotypes. Moreover, the accuracy of antimicrobial susceptibility testing with agents with improved potencies against glycopeptide-resistant enterococci was determined. A total of 121 enterococci were investigated. The new VITEK 2 software was able to identify 114 (94.2%) enterococcal strains correctly to the species level and to classify 119 (98.3%) enterococci correctly to the glycopeptide resistance genotype level. One Enterococcus casseliflavus strain and six Enterococcus faecium vanA strains with low-level resistance to vancomycin were identified with low discrimination, requiring additional tests. One of the vanA strains was misclassified as the vanB type, and one glycopeptide-susceptible E. faecium wild type was misclassified as the vanA type. The overall essential agreements for antimicrobial susceptibility testing results were 94.2% for vancomycin, 95.9% for teicoplanin, 100% for quinupristin-dalfopristin and moxifloxacin, and 97.5% for linezolid. The rates of minor errors were 9% for teicoplanin and 5% for the other antibiotic agents. The identification and susceptibility data were produced within 4 h to 6 h 30 min and within 8 h 15 min to 12 h 15 min, respectively. In conclusion, use of VITEK 2 version 4.01 software appears to be a reliable method for the identification and detection of glycopeptide-resistant enterococci as well as an improvement over the former VITEK 2 database. However, a significant reduction in the detection time would be desirable. PMID:16390951
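The abstract does not define "essential agreement". It is commonly taken to mean that the test MIC lies within one doubling dilution of the reference MIC, and the sketch below assumes that definition. The reported identification rates can also be checked directly: 114/121 ≈ 94.2% and 119/121 ≈ 98.3%.

```python
import math


def within_one_dilution(test_mic, reference_mic):
    """True if the two MICs differ by at most one doubling dilution (log2 step)."""
    # Small tolerance guards against floating-point noise in log2.
    return abs(math.log2(test_mic) - math.log2(reference_mic)) <= 1 + 1e-9


def essential_agreement(pairs):
    """Percentage of (test MIC, reference MIC) pairs in essential agreement."""
    agreeing = sum(within_one_dilution(t, r) for t, r in pairs)
    return 100.0 * agreeing / len(pairs)
```

For example, test MICs of 1, 2, 4 and 0.5 against a reference MIC of 1 give 75% essential agreement, because only the value 4 is two dilutions away.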
Can dead man tooth do tell tales? Tooth prints in forensic identification
Christopher, Vineetha; Murthy, Sarvani; Ashwinirani, S. R.; Prasad, Kulkarni; Girish, Suragimath; Vinit, Shashikanth Patil
2017-01-01
Background: We know that teeth trouble us a lot when we are alive, but they last for thousands of years after we are dead. Because teeth are the strongest and most resistant structures, they are the most significant tool in forensic investigations. Patterns of enamel rod ends on the tooth surface are known as tooth prints. Aim: This study aimed to determine whether tooth prints, like fingerprints, can serve as a forensic tool for personal identification. Settings and Design: In the present in vivo study, the acetate peel technique was used to obtain replicas of enamel rod end patterns. Materials and Methods: Tooth prints of upper first premolars were recorded from 80 individuals after acid etching, using cellulose acetate strips. Digital images of the tooth prints obtained at two different intervals were then subjected to biometric conversion using Verifinger Standard Software Development Kit version 6.5, followed by comparison of the tooth prints with Automated Fingerprint Identification System (AFIS) software. Each individual's fingerprints were also recorded and processed with the same software. Statistical Analysis: AFIS scores obtained from the images were statistically analyzed using Cronbach's test. Results: Two tooth prints taken from the same individual at two intervals were similar in many cases, with the wavy pattern being the predominant tooth print type. However, the same prints were dissimilar when compared with those of other individuals. We also found that most individuals with a whorl fingerprint pattern showed a wavy tooth print pattern, and a few with loop-type fingerprints showed a linear tooth print pattern. Conclusions: Further experiments on both tooth prints and fingerprints are required for establishing an individual's identity. PMID:28584483
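The abstract names "Cronbach's test" without details; the usual statistic is Cronbach's alpha, a measure of consistency across repeated measurements of the same subjects. The sketch below assumes that statistic applied to, for example, AFIS scores recorded at the two intervals; the data layout is hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items (e.g. repeated scores) on the same subjects.

    items: list of k equal-length lists; items[i][j] is item i for subject j.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(subject totals))
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n_subjects = len(items[0])
    totals = [sum(item[j] for item in items) for j in range(n_subjects)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

Identical scores at both intervals give alpha = 1; values drop toward 0 as the repeated measurements agree less.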
Filho, Herton Luiz Alves Sales; da Mata Sousa, Luiz Claudio Demes; von Glehn, Cristina de Queiroz Carrascosa; da Silva, Adalberto Socorro; dos Santos Neto, Pedro de Alcântara; do Nascimento, Ferraz; de Castro, Adail Fonseca; do Nascimento, Liliane Machado; Kneib, Carolina; Bianchi Cazarote, Helena; Mayumi Kitamura, Daniele; Torres, Juliane Roberta Dias; da Cruz Lopes, Laiane; Barros, Aryela Loureiro; da Silva Edlin, Evelin Nildiane; de Moura, Fernanda Sá Leal; Watanabe, Janine Midori Figueiredo; do Monte, Semiramis Jamil Hadad
2012-06-01
The HLAMatchmaker algorithm, which allows the identification of "safe" acceptable mismatches (AMMs) for recipients of solid organ and cell allografts, is rarely used, in part because of the difficulty of using it in its current Excel format. The automation of this algorithm may universalize its use to benefit the allocation of allografts. Recently, we developed new software called EpHLA, the first computer program to automate the use of the HLAMatchmaker algorithm. Herein, we present the experimental validation of the EpHLA program by showing its time efficiency and quality of operation. The same data, obtained by a single antigen bead assay with sera from 10 sensitized patients waiting for kidney transplants, were analyzed either by the conventional HLAMatchmaker method or by the automated EpHLA method. Users testing these two methods were asked to record: (i) time required for completion of the analysis (in minutes); (ii) number of eplets obtained for class I and class II HLA molecules; (iii) categorization of eplets as reactive or non-reactive based on the MFI cutoff value; and (iv) determination of AMMs based on eplets' reactivities. We showed that although both methods had similar accuracy, the automated EpHLA method was over 8 times faster than the conventional HLAMatchmaker method. In particular, the EpHLA software was faster and more reliable than the conventional method, and equally accurate, in defining AMMs for allografts. The EpHLA software is an accurate and quick method for the identification of AMMs and thus may be a very useful tool in the decision-making process of organ allocation for highly sensitized patients, as well as in many other applications.
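Steps (iii) and (iv) can be sketched with set operations. This is a simplified illustration, not EpHLA's code or the full HLAMatchmaker rule set: it assumes an eplet is reactive if it occurs on at least one bead above the MFI cutoff and on no bead below it, and that an allele is an acceptable mismatch if it carries no reactive eplet. Bead, allele and eplet identifiers in the usage example are hypothetical.

```python
def reactive_eplets(bead_mfi, bead_eplets, cutoff):
    """Eplets on at least one positive bead and on no negative bead (simplified)."""
    positive, negative = set(), set()
    for bead, mfi in bead_mfi.items():
        # Sort each bead's eplets by whether the bead reacted with the serum.
        (positive if mfi >= cutoff else negative).update(bead_eplets[bead])
    return positive - negative


def acceptable_mismatches(candidate_alleles, allele_eplets, reactive):
    """Candidate alleles that carry none of the reactive eplets."""
    return [a for a in candidate_alleles if not (allele_eplets[a] & reactive)]
```

With beads A1 (MFI 5000), A2 (200) and B7 (3000) and a cutoff of 1000, an eplet shared by A1 and the negative bead A2 is treated as non-reactive, so A2, and any untested allele carrying only non-reactive eplets, would be reported as acceptable.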
HIPPI: highly accurate protein family classification with ensembles of HMMs.
Nguyen, Nam-Phuong; Nute, Michael; Mirarab, Siavash; Warnow, Tandy
2016-11-11
Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp .
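One simple way to use an ensemble of profile HMMs for classification is to score the query against every HMM in every family's ensemble and assign the family whose best member scores highest. The sketch below assumes that max-score rule and scores precomputed elsewhere (for example, bit scores from HMMER); it is not taken from the HIPPI code, and the family names are hypothetical.

```python
def classify_query(ensemble_scores, min_score=None):
    """Assign a query to the family whose best ensemble member scores highest.

    ensemble_scores: dict family -> list of scores, one per profile HMM in
    that family's ensemble.
    """
    best_family, best_score = None, float("-inf")
    for family, scores in ensemble_scores.items():
        if scores and max(scores) > best_score:
            best_family, best_score = family, max(scores)
    if min_score is not None and best_score < min_score:
        return None, best_score  # no family is a confident match
    return best_family, best_score
```

Because a fragmentary or divergent query only has to match one sub-alignment well, the best member of an ensemble can score higher than a single HMM built from the whole family alignment.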
Bietz, Stefan; Inhester, Therese; Lauck, Florian; Sommer, Kai; von Behren, Mathias M; Fährrolfes, Rainer; Flachsenberg, Florian; Meyder, Agnes; Nittinger, Eva; Otto, Thomas; Hilbig, Matthias; Schomburg, Karen T; Volkamer, Andrea; Rarey, Matthias
2017-11-10
Nowadays, computational approaches are an integral part of life science research. Problems related to the interpretation of experimental results, data analysis, or visualization tasks benefit greatly from the achievements of the digital era. Simulation methods facilitate predictions of physicochemical properties and can assist in understanding macromolecular phenomena. Here, we give an overview of the methods developed in our group that aim at supporting researchers from all life science areas. Based on state-of-the-art approaches from structural bioinformatics and cheminformatics, we provide software covering a wide range of research questions. Our all-in-one web service platform ProteinsPlus (http://proteins.plus) offers solutions for pocket and druggability prediction, hydrogen placement, structure quality assessment, ensemble generation, protein-protein interaction classification, and 2D-interaction visualization. Additionally, we provide a software package that contains tools targeting cheminformatics problems like file format conversion, molecule data set processing, SMARTS editing, fragment space enumeration, and ligand-based virtual screening. Furthermore, it also includes structural bioinformatics solutions for inverse screening, binding site alignment, and searching interaction patterns across structure libraries. The software package is available at http://software.zbh.uni-hamburg.de. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
The Metaproteome of "Park Grass" soil - a reference for EU soil science
NASA Astrophysics Data System (ADS)
Quinn, Gerry; Dudley, Ed; Doerr, Stefan; Matthews, Peter; Halen, Ingrid; Walley, Richard; Ashton, Rhys; Delmont, Tom; Francis, Lewis; Gazze, Salvatore Andrea; Van Keulen, Geertje
2016-04-01
Soil metaproteomics, the systematic extraction and identification of proteins from a soil, is key to understanding the biological and physical processes that occur within the soil at a molecular level. Until recently, direct extraction of proteins from complex soils has yielded only dozens of protein identifications because of interfering substances, such as humic acids and clay, which co-extract with and/or strongly adsorb proteins, often causing problems in downstream processing, e.g. mass spectrometry. Furthermore, the current most successful direct proteomic extraction protocol favours larger molecular weight and/or heat-stable proteins. We have now developed a novel, faster, direct soil protein extraction protocol that also addresses the problem of interfering substances while requiring less than 1 gram of material per extraction. We extracted protein from the 'Genomic Observatory' Park Grass at Rothamsted Research (UK), an ideally suited site because it is the longest (>150 years) continually studied experiment on ungrazed permanent grassland in the world, for which a rich history of environmental/ecological data has been collected, including high-quality publicly available metagenome DNA sequences. Using this improved methodology, in conjunction with the creation of high-quality, curated metagenomic sequence databases, we significantly improved protein identification from one soil: we extracted a similar number of proteins as the best current direct protocol, but >90% of them were different. This optimised metaproteomics protocol has now enabled the identification of thousands of proteins from one soil, providing deeper insight into soil system processes at the molecular scale.
Proteomic analysis of human aqueous humor using multidimensional protein identification technology
Richardson, Matthew R.; Price, Marianne O.; Price, Francis W.; Pardo, Jennifer C.; Grandin, Juan C.; You, Jinsam; Wang, Mu
2009-01-01
Aqueous humor (AH) supports avascular tissues in the anterior segment of the eye, maintains intraocular pressure, and potentially influences the pathogenesis of ocular diseases. Nevertheless, the AH proteome is still poorly defined despite several previous efforts, which were hindered by interfering high-abundance proteins, inadequate animal models, and limited proteomic technologies. To facilitate future investigations into AH function, the AH proteome was extensively characterized using an advanced proteomic approach. Samples from patients undergoing cataract surgery were pooled, depleted of interfering abundant proteins, and thereby divided into two fractions: albumin-bound and albumin-depleted. Multidimensional Protein Identification Technology (MudPIT), which incorporates strong cation exchange chromatography to reduce sample complexity before reversed-phase liquid chromatography and tandem mass spectrometric analysis, was applied to each fraction. Twelve proteins had multi-peptide, high-confidence identifications in the albumin-bound fraction, and 50 proteins had multi-peptide, high-confidence identifications in the albumin-depleted fraction. Gene ontological analyses were performed to determine which cellular components and functions were enriched. Many proteins had previously been identified in the AH, and for several of them a potential role in the AH has been investigated; however, the majority of the identified proteins were novel, and only speculative roles can be suggested. The AH was abundant in anti-oxidant and immunoregulatory proteins as well as anti-angiogenic proteins, which may be involved in maintaining the avascular tissues. This is the first known report to extensively characterize and describe the human AH proteome and lays the foundation for future work regarding its function in homeostatic and pathologic states. PMID:20019884
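The abstract does not state which test was used in the gene ontology enrichment analyses. A common choice is the one-sided hypergeometric test, which asks how likely it is to see at least `k` proteins with a given annotation among `n` identified proteins if they were drawn at random from a background of `N` proteins, `K` of which carry the annotation. The sketch below assumes that test and omits multiple-testing correction.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    k: identified proteins carrying the annotation
    n: identified proteins in total
    K: background proteins carrying the annotation
    N: background proteins in total
    """
    upper = min(n, K)
    # math.comb returns 0 when asked for more items than are available.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

For a toy background of 10 proteins, 5 of them annotated, drawing 5 proteins that are all annotated has probability 1/252.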
The Use of Variable Q1 Isolation Windows Improves Selectivity in LC-SWATH-MS Acquisition.
Zhang, Ying; Bilbao, Aivett; Bruderer, Tobias; Luban, Jeremy; Strambio-De-Castillia, Caterina; Lisacek, Frédérique; Hopfgartner, Gérard; Varesio, Emmanuel
2015-10-02
As tryptic peptides and metabolites are not equally distributed along the mass range, the probability of cross fragment ion interference is higher in certain windows when fixed Q1 SWATH windows are applied. We evaluated the benefits of utilizing variable Q1 SWATH windows with regard to selectivity improvement. Variable windows based on equalizing the distribution of either the precursor ion population (PIP) or the total ion current (TIC) within each window were generated by in-house software, swathTUNER. These two variable Q1 SWATH window strategies outperformed the basic approach using a fixed window width (FIX), with respect to both quantification and identification, for proteomic profiling of human monocyte-derived dendritic cells (MDDCs). Specifically, 13.8% and 8.4% additional peptide precursors, which resulted in 13.1% and 10.0% more proteins, were confidently identified by SWATH using the PIP and TIC strategies, respectively, in the MDDC proteomic sample. On the basis of the spectral library purity score, some improvement afforded by variable Q1 windows was also observed, albeit to a lesser extent, in the metabolomic profiling of human urine. We show that the novel concept of "scheduled SWATH" proposed here, which incorporates (i) variable isolation windows and (ii) precursor retention time segmentation, further improves both peptide and metabolite identifications.
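The PIP idea (windows holding roughly equal numbers of precursors) can be sketched by placing window boundaries at quantiles of the sorted precursor m/z values. This is an illustration under simplifying assumptions, not swathTUNER's algorithm: it ignores window overlap margins, instrument limits on minimum width, and TIC weighting.

```python
def equal_population_windows(precursor_mz, n_windows):
    """Split the m/z range into windows holding roughly equal numbers of precursors."""
    mz = sorted(precursor_mz)
    if not 1 <= n_windows <= len(mz):
        raise ValueError("n_windows must be between 1 and the number of precursors")
    bounds = [mz[0]]
    for w in range(1, n_windows):
        idx = round(w * len(mz) / n_windows)
        # Place the boundary midway between neighbouring precursors.
        bounds.append((mz[idx - 1] + mz[idx]) / 2)
    bounds.append(mz[-1])
    return list(zip(bounds[:-1], bounds[1:]))
```

Where precursors are dense, windows become narrow; where they are sparse, windows widen. This is what reduces cross fragment ion interference compared with fixed-width windows.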
A Construct for Describing Software Development Risks
1994-07-01
consequences during any risk identification process. It is more important and expedient to capture the conditions since the basic statement of conse ...chain of causal events. The contrast between the exploration of conditions rather than consequences in risk identification is similar to that of...extent that a general condition has a multiplicity of individual characteristics. In the CTC representation, these distinct risks can share common
Investigations into the Properties, Conditions, and Effects of the Ionosphere
1990-01-15
ionogram database to be used in testing trace-identification algorithms; d. Development of automatic trace-identification algorithms and autoscaling ...Scaler (ARTIST) and improvement of the ARTIST software; g. Maintenance and upgrade of the digital ionosondes at Argentia, Newfoundland, and Goose Bay...provided by the contractor; j. Upgrade of the ARTIST computer at the Danish Meteorological Institute/GL Qaanaaq site to provide digisonde tape-playback
Computational Methods for Identification, Optimization and Control of PDE Systems
2010-04-30
focused on the development of numerical methods and software specifically for the purpose of solving control, design, and optimization problems where...that provide the foundations of simulation software must play an important role in any research of this type, the demands placed on numerical methods...y sus Aplicaciones, Ciudad de Cordoba - Argentina, October 2007. 3. Inverse Problems in Deployable Space Structures, Fourth Conference on Inverse
NASA Technical Reports Server (NTRS)
Nez, G. (Principal Investigator); Mutter, D.
1977-01-01
The author has identified the following significant results. New LANDSAT analysis software and linkages with other computer mapping software were developed. Significant results were also achieved in training, communication, and identification of needs for developing the LANDSAT/computer mapping technologies into operational tools for use by decision makers.
Parallels in Computer-Aided Design Framework and Software Development Environment Efforts.
1992-05-01
design kits, and tool and design management frameworks. Also, books about software engineering environments [Long 91] and electronic design...tool integration [Zarrella 90], and agreement upon a universal design automation framework, such as the CAD Framework Initiative (CFI) [Malasky 91...ments: identification, control, status accounting, and audit and review. The paper by Dart extracts 15 CM concepts from existing SDEs and tools
Radio frequency tags systems to initiate system processing
NASA Astrophysics Data System (ADS)
Madsen, Harold O.; Madsen, David W.
1994-09-01
This paper describes the automatic identification technology installed at the Applied Magnetic Corp. MR fab. World-class manufacturing requires the exploitation of technology. This system combines (1) FluoroTrac cassette and operator tracking, (2) CELLworks cell controller software tools, and (3) Auto-Soft Inc. software integration services. The combined system eliminates operator keystrokes and errors during normal processing within a semiconductor fab. The methods and benefits of this system are described.
Use of Facial Recognition Software to Identify Disaster Victims With Facial Injuries.
Broach, John; Yong, Rothsovann; Manuell, Mary-Elise; Nichols, Constance
2017-10-01
After large-scale disasters, victim identification frequently presents a challenge and is a priority for responders attempting to reunite families and ensure proper identification of deceased persons. The purpose of this investigation was to determine whether currently available commercial facial recognition software can successfully identify disaster victims with facial injuries. Photos of 106 people were taken before and after application of moulage designed to simulate traumatic facial injuries. These photos, as well as photos from the volunteers' personal photo collections, were analyzed with facial recognition software to determine whether this technology could accurately identify a person with facial injuries. The results suggest that a responder could expect a correct match between submitted photos and photos of injured patients 39% to 45% of the time, and a much higher rate of correct returns when the submitted photos were of optimal quality, exceeding 90% in most situations. The present results suggest that using this software would provide a significant benefit to responders. Although a correct result was returned only 40% of the time, this would still likely benefit a responder trying to identify hundreds or thousands of victims. (Disaster Med Public Health Preparedness. 2017;11:568-572).
Validation of DNA-based identification software by computation of pedigree likelihood ratios.
Slooten, K
2011-08-01
Disaster victim identification (DVI) can be aided by DNA evidence, by comparing the DNA profiles of unidentified individuals with those of surviving relatives. The DNA evidence is used optimally when such a comparison is done by calculating the appropriate likelihood ratios. Though conceptually simple, the calculations can be quite involved, especially with large pedigrees, precise mutation models and similar complications. In this article we describe a series of test cases designed to check whether software intended to calculate such likelihood ratios computes them correctly. The cases include both simple and more complicated pedigrees, including inbred ones. We show how to calculate the likelihood ratio numerically and algebraically, including a general mutation model and the possibility of allelic dropout. In Appendix A we show how to derive such algebraic expressions mathematically. We have set up these cases to validate new software, called Bonaparte, which performs pedigree likelihood ratio calculations in a DVI context. Bonaparte has been developed by SNN Nijmegen (The Netherlands) for the Netherlands Forensic Institute (NFI). It is available free of charge for non-commercial purposes (see www.dnadvi.nl for details). Commercial licenses can also be obtained. The software uses Bayesian networks and the junction tree algorithm to perform its calculations. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
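The kind of hand-checkable test case described above can be illustrated with the simplest pedigree: one parent-child pair at a single autosomal locus, comparing "the reference is a parent of the unidentified person" against "they are unrelated". This is a minimal sketch assuming Hardy-Weinberg equilibrium, no mutation, no allelic dropout and independent loci; it is not Bonaparte's Bayesian-network method, and the allele labels and frequencies are hypothetical. For a child with genotype ab and a parent with genotype ac, the numerical result should match the closed form LR = 1/(4 p_a).

```python
from fractions import Fraction

# Simplified parent-child LR: HWE, no mutation, no dropout. Not Bonaparte's method.


def genotype_prob(genotype, freqs):
    """Population frequency of an unordered genotype under Hardy-Weinberg."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def prob_given_transmitted(child, allele, freqs):
    """P(child genotype | parent passed `allele`; other allele drawn from the population)."""
    a, b = child
    if a == b:
        return freqs[a] if allele == a else 0
    if allele == a:
        return freqs[b]
    if allele == b:
        return freqs[a]
    return 0


def parent_child_lr(child, parent, freqs):
    """LR = P(child | related as parent-child) / P(child | unrelated) at one locus."""
    # Each of the parent's two alleles is transmitted with probability 1/2.
    numerator = sum(Fraction(1, 2) * prob_given_transmitted(child, t, freqs) for t in parent)
    return numerator / genotype_prob(child, freqs)


def combined_lr(loci):
    """Multiply per-locus LRs, assuming the loci are independent."""
    result = Fraction(1)
    for child, parent, freqs in loci:
        result *= parent_child_lr(child, parent, freqs)
    return result


if __name__ == "__main__":
    freqs = {"a": Fraction(1, 10), "b": Fraction(1, 5), "c": Fraction(3, 10)}  # hypothetical
    lr = parent_child_lr(("a", "b"), ("a", "c"), freqs)
    assert lr == 1 / (4 * freqs["a"])  # numerical result agrees with the algebraic form
    print(lr)  # 5/2
```

Exact `Fraction` arithmetic makes the numerical and algebraic answers comparable with `==`; an LR of 0 indicates an exclusion under the no-mutation assumption, which is exactly where a mutation model would change the result.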
Ohue, Masahito; Shimoda, Takehiro; Suzuki, Shuji; Matsuzaki, Yuri; Ishida, Takashi; Akiyama, Yutaka
2014-11-15
The application of protein-protein docking to large-scale interactome analysis is a major challenge in structural bioinformatics and requires huge computing resources. In this work, we present MEGADOCK 4.0, an FFT-based docking software that makes extensive use of recent heterogeneous supercomputers and shows powerful, scalable performance, with >97% strong-scaling efficiency. MEGADOCK 4.0 is written in C++ with OpenMPI and NVIDIA CUDA 5.0 (or later) and is freely available to all academic and non-profit users at: http://www.bi.cs.titech.ac.jp/megadock. akiyama@cs.titech.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
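FFT-based rigid docking rests on the observation that scoring every translation of the ligand against the receptor is a circular cross-correlation of two grids, which the convolution theorem turns into a few FFTs instead of an explicit loop over shifts. The pure-Python sketch below shows this on a one-dimensional toy grid with a classic Katchalski-Katzir-style shape score (receptor surface cells = 1, interior cells strongly negative, ligand cells = 1). MEGADOCK works on 3-D grids, also samples rotations and uses its own scoring function, so the grid values, function names and the power-of-two length requirement here are illustrative assumptions.

```python
import cmath

# 1-D toy of FFT translational scanning; grid values are hypothetical, not MEGADOCK's score.


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugation trick: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def fft_scores(receptor, ligand):
    """score[t] = sum_x receptor[x] * ligand[(x - t) % n], for every shift t at once."""
    rec, lig = fft(receptor), fft(ligand)
    return [v.real for v in ifft([r * l.conjugate() for r, l in zip(rec, lig)])]


def brute_scores(receptor, ligand):
    """The same correlation computed shift by shift in O(n^2), for checking."""
    n = len(receptor)
    return [sum(receptor[x] * ligand[(x - t) % n] for x in range(n)) for t in range(n)]


if __name__ == "__main__":
    receptor = [-15] * 4 + [1] * 2 + [0] * 10  # interior, surface layer, empty space
    ligand = [1, 1] + [0] * 14                  # a two-cell ligand
    scores = fft_scores(receptor, ligand)
    best = max(range(len(scores)), key=scores.__getitem__)
    print(best)  # 4: ligand covers the surface layer without entering the interior
```

The FFT route costs O(n log n) per rotation instead of O(n^2), which is the reason docking codes of this family spend their effort on fast, parallel FFTs.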
webPIPSA: a web server for the comparison of protein interaction properties
Richter, Stefan; Wenzel, Anne; Stein, Matthias; Gabdoulline, Razif R.; Wade, Rebecca C.
2008-01-01
Protein molecular interaction fields are key determinants of protein function. PIPSA (Protein Interaction Property Similarity Analysis) is a procedure to compare and analyze protein molecular interaction fields, such as the electrostatic potential. PIPSA may assist in protein functional assignment, protein classification, the comparison of binding properties and the estimation of enzyme kinetic parameters. webPIPSA is a web server that enables the use of PIPSA to compare and analyze protein electrostatic potentials. While PIPSA can be run with downloadable software (see http://projects.eml.org/mcm/software/pipsa), webPIPSA extends and simplifies a PIPSA run, allowing non-expert users to apply PIPSA to their own protein datasets. Given protein coordinates as input, the superposition of the protein structures and the computation and analysis of electrostatic potentials are automated. The results are provided as electrostatic similarity matrices from an all-pairwise comparison of the proteins, which can be clustered and visualized as epograms (tree-like diagrams showing electrostatic potential differences) or heat maps. webPIPSA is freely available at: http://pipsa.eml.org. PMID:18420653
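A PIPSA-style comparison reduces each pair of electrostatic potentials, sampled at the same grid points around superimposed proteins, to a single number. A minimal sketch, assuming the Hodgkin similarity index SI = 2(phi_a . phi_b) / (phi_a . phi_a + phi_b . phi_b) and the derived distance D = sqrt(2 - 2*SI) commonly associated with PIPSA; the function names and toy potential values are hypothetical, and real runs restrict the comparison to a region around the proteins.

```python
import math

# Assumed Hodgkin-index comparison; names and potential values are hypothetical.


def hodgkin_index(phi_a, phi_b):
    """Hodgkin similarity of two potentials sampled at the same points (range -1 to 1)."""
    dot_ab = sum(x * y for x, y in zip(phi_a, phi_b))
    norm = sum(x * x for x in phi_a) + sum(y * y for y in phi_b)
    return 2 * dot_ab / norm  # undefined if both potentials are identically zero


def distance_matrix(potentials):
    """All-pairwise distances D = sqrt(2 - 2*SI): 0 = identical, 2 = opposite sign."""
    n = len(potentials)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            si = hodgkin_index(potentials[i], potentials[j])
            dist[i][j] = dist[j][i] = math.sqrt(max(0.0, 2 - 2 * si))  # clamp rounding
    return dist


if __name__ == "__main__":
    # Hypothetical potentials at four shared grid points for three proteins.
    phis = [[1.0, 2.0, -0.5, 0.3], [1.1, 1.9, -0.4, 0.2], [-1.0, -2.0, 0.5, -0.3]]
    for row in distance_matrix(phis):
        print([round(d, 3) for d in row])
```

Unlike a correlation coefficient, this index also penalizes differences in magnitude (potentials [1, 2] and [2, 4] give SI = 0.8, not 1), and a distance matrix of this kind is what a hierarchical clustering step would turn into a tree-like diagram.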
Interactive visualization of multi-data-set Rietveld analyses using Cinema:Debye-Scherrer.
Vogel, Sven C; Biwer, Chris M; Rogers, David H; Ahrens, James P; Hackenberg, Robert E; Onken, Drew; Zhang, Jianzhong
2018-06-01
A tool named Cinema:Debye-Scherrer to visualize the results of a series of Rietveld analyses is presented. The multi-axis visualization of the high-dimensional data sets resulting from powder diffraction analyses allows identification of analysis problems, prediction of suitable starting values, identification of gaps in the experimental parameter space and acceleration of scientific insight from the experimental data. The tool is demonstrated with analysis results from 59 U-Nb alloy samples with different compositions, annealing times and annealing temperatures as well as with a high-temperature study of the crystal structure of CsPbBr3. A script to extract parameters from a series of Rietveld analyses employing the widely used GSAS Rietveld software is also described. Both software tools are available for download.
Interactive visualization of multi-data-set Rietveld analyses using Cinema:Debye-Scherrer
Biwer, Chris M.; Rogers, David H.; Ahrens, James P.; Hackenberg, Robert E.; Onken, Drew; Zhang, Jianzhong
2018-01-01
A tool named Cinema:Debye-Scherrer to visualize the results of a series of Rietveld analyses is presented. The multi-axis visualization of the high-dimensional data sets resulting from powder diffraction analyses allows identification of analysis problems, prediction of suitable starting values, identification of gaps in the experimental parameter space and acceleration of scientific insight from the experimental data. The tool is demonstrated with analysis results from 59 U–Nb alloy samples with different compositions, annealing times and annealing temperatures as well as with a high-temperature study of the crystal structure of CsPbBr3. A script to extract parameters from a series of Rietveld analyses employing the widely used GSAS Rietveld software is also described. Both software tools are available for download. PMID:29896062
Detection of protein-protein interactions by ribosome display and protein in situ immobilisation.
He, Mingyue; Liu, Hong; Turner, Martin; Taussig, Michael J
2009-12-31
We describe a method for identification of protein-protein interactions by combining two cell-free protein technologies, namely ribosome display and protein in situ immobilisation. The method requires only PCR fragments as the starting material, the target proteins being made through cell-free protein synthesis, either associated with their encoding mRNA as ribosome complexes or immobilised on a solid surface. The use of ribosome complexes allows identification of interacting protein partners from their attached coding mRNA. To demonstrate the procedures, we have employed the lymphocyte signalling proteins Vav1 and Grb2 and confirmed the interaction between Grb2 and the N-terminal SH3 domain of Vav1. The method has promise for library screening of pairwise protein interactions, down to the analytical level of individual domain or motif mapping.
Zhang, Baixia; He, Shuaibing; Lv, Chenyang; Zhang, Yanling; Wang, Yun
2018-01-01
The identification of bioactive components in traditional Chinese medicine (TCM) is an important part of research into the material basis of TCM. Molecular docking has recently been used extensively to identify TCM bioactive components. However, the target proteins used in docking may not be the actual targets of the TCM, so bioactive components may be missed or misidentified. To address this problem, this study proposed the GEPSI method, which identifies the target proteins of a TCM based on the similarity of gene expression profiles. The similarity between the gene expression profiles induced by the TCM and by small-molecule drugs was calculated. The pharmacological action of the TCM may resemble that of small-molecule drugs with high similarity scores, so the targets of those drugs can be considered TCM targets. We then identified the bioactive components of a TCM by molecular docking against these targets and verified the reliability of the method through a literature investigation. Using the proteins that the TCM actually affects as docking targets made the identification of bioactive components more accurate. This study provides a fast and effective method for identifying TCM bioactive components.
Zhang, Baixia; He, Shuaibing; Lv, Chenyang; Zhang, Yanling
2018-01-01
The identification of bioactive components in traditional Chinese medicine (TCM) is an important part of research into the material basis of TCM. Molecular docking has recently been used extensively to identify TCM bioactive components. However, the target proteins used in docking may not be the actual targets of the TCM, so bioactive components may be missed or misidentified. To address this problem, this study proposed the GEPSI method, which identifies the target proteins of a TCM based on the similarity of gene expression profiles. The similarity between the gene expression profiles induced by the TCM and by small-molecule drugs was calculated. The pharmacological action of the TCM may resemble that of small-molecule drugs with high similarity scores, so the targets of those drugs can be considered TCM targets. We then identified the bioactive components of a TCM by molecular docking against these targets and verified the reliability of the method through a literature investigation. Using the proteins that the TCM actually affects as docking targets made the identification of bioactive components more accurate. This study provides a fast and effective method for identifying TCM bioactive components. PMID:29692857
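The ranking step behind the GEPSI idea (score reference drugs by how closely their expression profiles resemble the one induced by a TCM, then borrow the targets of the best matches) can be sketched as follows. The abstract does not state the similarity measure, so this sketch assumes Pearson correlation over a shared gene list; the drug names, target names, profile values and the top_k cut-off are all hypothetical.

```python
import math

# Assumed Pearson-based ranking; all names and values below are hypothetical.


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def candidate_targets(tcm_profile, drug_profiles, drug_targets, top_k=1):
    """Rank drugs by profile similarity to the TCM and pool the targets of the top_k drugs."""
    ranked = sorted(drug_profiles,
                    key=lambda d: pearson(tcm_profile, drug_profiles[d]),
                    reverse=True)
    targets = set()
    for drug in ranked[:top_k]:
        targets |= drug_targets[drug]
    return ranked, targets


if __name__ == "__main__":
    # Hypothetical log fold changes over the same four genes.
    tcm = [2.0, -1.0, 0.5, 1.5]
    drugs = {
        "drug_A": [1.8, -0.9, 0.4, 1.6],
        "drug_B": [-2.0, 1.0, -0.5, -1.5],
        "drug_C": [0.1, 0.2, 0.0, 0.1],
    }
    targets = {"drug_A": {"T1", "T2"}, "drug_B": {"T3"}, "drug_C": {"T4"}}
    ranking, predicted = candidate_targets(tcm, drugs, targets)
    print(ranking, sorted(predicted))  # ['drug_A', 'drug_C', 'drug_B'] ['T1', 'T2']
```

The pooled targets would then serve as docking receptors for the TCM's candidate components; how many top drugs to keep is a tunable choice in this sketch, not a value taken from the study.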
Analysis of secreted proteins from Aspergillus flavus.
Medina, Martha L; Haynes, Paul A; Breci, Linda; Francisco, Wilson A
2005-08-01
MS/MS techniques in proteomics make it possible to identify proteins from organisms with little or no available genome sequence information. Peptide sequences are obtained from tandem mass spectra by matching peptide mass and fragmentation information to protein sequence information from related organisms, including unannotated genome sequence data. These peptide identifications can then be grouped and reconstructed into protein identifications. In this study, we used this approach to study protein secretion by Aspergillus flavus, a filamentous fungus for which very little genome sequence information is available. A. flavus can degrade the flavonoid rutin (quercetin 3-O-glycoside) via an extracellular enzyme system and use it as its only carbon source. In this continuing study, proteomic analysis was used to identify proteins secreted by A. flavus grown on rutin. Glucose and potato dextrose media were used for comparison to identify differentially expressed secreted proteins. The secreted proteins were analyzed by 1-DE and 2-DE and MS/MS. A total of 51 unique secreted A. flavus proteins were identified across the three growth conditions. Ten proteins were unique to rutin-grown, five to glucose-grown and one to potato dextrose-grown A. flavus. Sixteen secreted proteins were common to all three media. Fourteen identifications were hypothetical proteins or proteins of unknown function. To our knowledge, this is the first extensive proteomic study to identify the proteins secreted by a filamentous fungus.
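Counts such as "unique to rutin" and "common to all three media" follow from simple set operations over the per-condition identification lists. A minimal sketch with hypothetical protein accessions (not the study's data). In the abstract's figures, 51 - (10 + 5 + 1) - 16 = 19 proteins fall outside both categories and are presumably shared by exactly two media; the sketch reports that group as "partial".

```python
# Set-based comparison of identifications across growth conditions (hypothetical data).


def partition_by_condition(ids_by_condition):
    """Group identifications as unique to one condition, common to all, or partially shared."""
    sets = {c: set(ids) for c, ids in ids_by_condition.items()}
    every = set().union(*sets.values())
    common = set.intersection(*sets.values())
    # A protein is unique to c if no other condition identified it.
    unique = {
        c: s - set().union(*(o for d, o in sets.items() if d != c))
        for c, s in sets.items()
    }
    partial = every - common - set().union(*unique.values())
    return {"total": every, "common": common, "unique": unique, "partial": partial}


if __name__ == "__main__":
    # Hypothetical accessions, not the study's identifications.
    result = partition_by_condition({
        "rutin": {"P1", "P2", "P3", "P4"},
        "glucose": {"P1", "P2", "P5"},
        "potato dextrose": {"P1", "P3", "P6"},
    })
    print(len(result["total"]), sorted(result["common"]), sorted(result["partial"]))
    # 6 ['P1'] ['P2', 'P3']
```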